CN107808146A - A multi-modal emotion recognition and classification method - Google Patents

A multi-modal emotion recognition and classification method

Info

Publication number: CN107808146A
Authority: CN (China)
Prior art keywords: space, time, probability matrix, facial image, image space
Application number: CN201711144196.1A
Other languages: Chinese (zh)
Other versions: CN107808146B
Inventors: 孙波 (Sun Bo), 何珺 (He Jun), 余乐军 (Yu Lejun), 曹斯铭 (Cao Siming)
Original and current assignee: Beijing Normal University
Application CN201711144196.1A filed by Beijing Normal University; published as CN107808146A and, after grant, as CN107808146B
Legal status: Active (granted)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; face representation
    • G06V40/172 - Classification, e.g. identification
    • G06V40/174 - Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a multi-modal emotion recognition and classification method. The method processes a video containing the face to be detected and a temporally corresponding video containing body actions, converts each video into a time series of image frames, and extracts the temporal and spatial features of those image sequences. Based on the multi-layer deep spatio-temporal features thus obtained, the method performs feature-level fusion of the various features and decision-level fusion of the classification results, so that the emotion type of the video under test is identified from multiple modalities. The method makes full use of the effective information present in each modality and improves the recognition rate of emotion recognition.

Description

A multi-modal emotion recognition and classification method
Technical field
The present invention relates to the field of computer processing technology, and more particularly to a multi-modal emotion recognition and classification method.
Background technology
Emotion recognition is an emerging research field at the intersection of computer science, cognitive science, psychology, brain science and neuroscience. Its purpose is to teach computers to understand human emotional expression, so that they can ultimately recognize and understand emotion as humans do. As a challenging interdisciplinary subject, emotion recognition has therefore become a research hotspot in pattern recognition, computer vision, big-data mining and artificial intelligence, both at home and abroad, with important research value and application prospects.
Existing research on emotion recognition shows two clear trends. On the one hand, the data have expanded from still images to dynamic image sequences; on the other hand, recognition has moved from a single modality to multiple modalities. Emotion recognition from still images has already produced a solid body of results, but methods based on static pictures ignore the temporal dynamics of human expression, and, compared with image-based recognition, the accuracy of video-based analysis still requires further study. Moreover, psychological studies show that emotion recognition is essentially a multi-modal problem: judging affective state jointly from body posture and facial expression works better than using single-modality information, and fusing multi-modal information yields more accurate and reliable recognition. Multi-modal information fusion has therefore also become a research hotspot in the field of emotion recognition.
In the prior art, methods for fusing the facial-expression and body-posture modalities use only a single fusion mode, choosing either feature-level fusion or decision-level fusion according to some strategy. On the one hand, the prior art cannot extract effective spatio-temporal features from video data for emotion recognition; on the other hand, whether early (feature-level) or late (decision-level) fusion is used, such fusion methods are model-independent, do not make full use of the effective information present in each modality, and generally suffer from low fusion efficiency.
Content of the invention
To solve the problems in the prior art that effective spatio-temporal features cannot be extracted from video data for emotion recognition, and that, whether early or late fusion is used, existing fusion methods are model-independent, fail to make full use of the effective information present in each modality and generally have low fusion efficiency, a multi-modal emotion recognition and classification method is provided.
According to one aspect of the present invention, a multi-modal emotion recognition and classification method includes:
S1: receiving the data to be tested, the data comprising a video containing a face and a temporally corresponding video containing body actions, and pre-processing the video containing the face and the corresponding video containing body actions to obtain a facial image time series containing the face and a body image time series containing the body actions;
S2: inputting the facial image time series successively into a convolutional neural network based on AlexNet and a recurrent neural network based on BLSTM (bidirectional LSTM) and taking the output data as the first facial image spatio-temporal feature; inputting the body image time series successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as the first body image spatio-temporal feature;
S3: concatenating the first facial image spatio-temporal feature and the first body image spatio-temporal feature and inputting them into a fully connected neural network, obtaining the probability matrix, over the different emotion types, of the fused first facial and first body image spatio-temporal features, and labeling it the first probability matrix; at the same time, inputting the concatenated first facial image spatio-temporal feature and first body image spatio-temporal feature into a support vector machine, obtaining the probability matrix, over the different emotion types, of the concatenated features, and labeling it the second probability matrix;
S4: inputting the first facial image spatio-temporal feature into a support vector machine, obtaining the probability matrix of this feature over the different emotion types, and labeling it the third probability matrix; inputting the first body image spatio-temporal feature into a support vector machine, obtaining its probability matrix over the different emotion types, and labeling it the fourth probability matrix; performing decision fusion on the first, second, third and fourth probability matrices to obtain the first fused probability matrix; and taking the emotion type with the highest probability in the first fused probability matrix as the emotion recognition result.
Wherein, before step S1 the method further includes: training the AlexNet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
Wherein, the pre-processing in step S1 of the video containing the face and the corresponding video containing body actions specifically includes:
performing face detection and alignment on each frame of the video containing the face, and arranging the processed frames in temporal order to obtain the facial image time series;
normalizing each frame of the video containing body actions, and arranging the processed frames in temporal order to obtain the body image time series.
Wherein, step S1 further comprises:
reading the label of each frame in the video containing the face, extracting the frames labeled onset, apex and offset, and forming the facial image time series;
reading the label of each frame in the video containing body actions, extracting the frames labeled onset, apex and offset, and forming the body image time series;
wherein the frame labels include neutral, onset, apex and offset.
Wherein, step S2 specifically includes:
S21: inputting the facial image time series into the AlexNet-based convolutional neural network, taking the data of the first two of its three fully connected layers as the initial facial spatial features, and applying principal component analysis to these features to achieve space conversion and dimensionality reduction, obtaining the first facial image spatial features; inputting the body image time series into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the initial body spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body image spatial features;
S22: inputting the first facial image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as the initial facial spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first facial image spatio-temporal feature; inputting the first body image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the initial body spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body image spatio-temporal feature.
Wherein, step S1 also includes:
cutting the facial image time series and the body image time series with a sliding window of preset length, obtaining a facial image time-subsequence group composed of multiple facial image time-series fragments and a body image time-subsequence group composed of multiple body image time-series fragments.
Wherein, step S2 further comprises:
inputting the multiple facial image time-series fragments of the facial image time-subsequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second facial image spatio-temporal feature;
inputting the multiple body image time-series fragments of the body image time-subsequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second body image spatio-temporal feature.
Wherein, step S2 also includes:
inputting the multiple facial image time-series fragments of the facial image time-subsequence group into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the second initial facial spatial features, and applying principal component analysis to these features to achieve space conversion and dimensionality reduction, obtaining the second facial image spatial features; inputting the multiple body image time-series fragments of the body image time-subsequence group into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the second initial body spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body image spatial features;
inputting the second facial image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the second initial facial spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second facial image spatio-temporal feature; inputting the second body image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the second initial body spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body image spatio-temporal feature.
Wherein, step S3 further comprises:
concatenating the second facial image spatio-temporal feature and the second body image spatio-temporal feature and inputting them into the fully connected neural network, inputting the output into the support vector machine, obtaining the probability matrix, over the different emotion types, of the fused second facial and second body image spatio-temporal features, and labeling it the fifth probability matrix; at the same time, inputting the concatenated second facial image spatio-temporal feature and second body image spatio-temporal feature into the support vector machine, obtaining the probability matrix, over the different emotion types, of the concatenated features, and labeling it the sixth probability matrix.
Wherein, step S4 further comprises:
inputting the second facial image spatio-temporal feature into the support vector machine, obtaining the probability matrix of this feature over the different emotion types, and labeling it the seventh probability matrix; inputting the second body image spatio-temporal feature into the support vector machine, obtaining its probability matrix over the different emotion types, and labeling it the eighth probability matrix; performing decision fusion on the fifth, sixth, seventh and eighth probability matrices to obtain the second fused probability matrix;
performing decision fusion on the first fused probability matrix and the second fused probability matrix to obtain the third fused probability matrix, and taking the emotion type with the highest probability in the third fused probability matrix as the emotion recognition result.
The method provided by the invention adopts multi-modal joint emotion recognition, makes full use of the effective information of every modality in the video under test, improves fusion efficiency and, at the same time, improves the accuracy of emotion recognition.
Brief description of the drawings
Fig. 1 is a flowchart of a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 2 compares the emotion recognition rates of different fusion strategies based on the full time series in a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the neural network structure for spatio-temporal feature extraction in a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of cutting a time series with a sliding window in a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 5 compares the emotion recognition rates of different fusion strategies based on time-series fragments in a multi-modal emotion recognition and classification method provided by an embodiment of the invention;
Fig. 6 compares the emotion recognition rates obtained by fusing the full time series and the time-series fragments in a multi-modal emotion recognition and classification method provided by an embodiment of the invention.
Embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention, not to limit its scope.
With reference to Fig. 1, which is a flowchart of a multi-modal emotion recognition and classification method provided by an embodiment of the invention, the method includes:
S1: receiving the data to be tested, the data comprising a video containing a face and a temporally corresponding video containing body actions, and pre-processing the two videos to obtain a facial image time series and a body image time series.
Specifically, a video containing a person's facial expression and a video containing body actions over the same time span are received. After pre-processing, the face video and the body-action video are each arranged by image frame, yielding a facial image time series and a body image time series composed of the frames of the videos.
In this way, the video data are converted into sequences of image frames, which makes the data easier to operate on and facilitates subsequent processing.
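A minimal sketch of this conversion, assuming OpenCV for video decoding; the file names and frame size are illustrative, not taken from the patent:

```python
import cv2

def video_to_frames(path, size=(224, 224)):
    """Decode a video into an ordered list of resized RGB frames."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame, size))
    cap.release()
    return frames  # temporal order of the video is preserved

face_seq = video_to_frames("face.avi")   # facial image time series
body_seq = video_to_frames("body.avi")   # body image time series
```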
S2: inputting the facial image time series successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as the first facial image spatio-temporal feature; inputting the body image time series successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as the first body image spatio-temporal feature.
Specifically, the facial image time series and the body image time series obtained in S1 are input into the trained AlexNet-based convolutional neural network and the trained BLSTM-based recurrent neural network. The AlexNet-based convolutional network extracts the spatial features of the image sequence from each time series, and the recurrent network then extracts, from those spatial features, the spatio-temporal features of the image sequence. In this embodiment, inputting the facial image time series and the body image time series respectively into the trained networks yields the spatio-temporal feature of the facial image sequence, i.e. the first facial image spatio-temporal feature, and the spatio-temporal feature of the body image sequence, i.e. the first body image spatio-temporal feature.
In this way, a deep network combining an AlexNet-based convolutional neural network with a BLSTM-based recurrent neural network is built to extract local and global spatio-temporal features, so that the facial image time series and the body image time series can be classified according to the multi-layer deep spatio-temporal features obtained.
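The patent names no deep-learning framework; the PyTorch sketch below is one illustrative way to chain an AlexNet-style convolutional stack, the three fully connected layers of 1024/512/10 dimensions described later, and a bidirectional LSTM over the frame axis. The BLSTM hidden size, the input resolution, and the direct 512-dimensional hand-off (in place of the 1536-dimensional concatenation plus PCA detailed further below) are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

class SpatioTemporalExtractor(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        base = alexnet(weights=None)
        self.conv = nn.Sequential(base.features, base.avgpool)  # AlexNet conv stack
        self.fc = nn.Sequential(               # the three fully connected layers
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),       # 10-d layer, unused for features
        )
        self.blstm = nn.LSTM(input_size=512, hidden_size=256,
                             bidirectional=True, batch_first=True)

    def forward(self, seq):                    # seq: (batch, time, 3, 224, 224)
        b, t = seq.shape[:2]
        x = self.conv(seq.flatten(0, 1))       # per-frame convolutional features
        spatial = self.fc[:4](x)               # stop after the 512-d FC layer
        out, _ = self.blstm(spatial.view(b, t, -1))
        return out                             # (batch, time, 512) spatio-temporal

features = SpatioTemporalExtractor()(torch.randn(2, 15, 3, 224, 224))
```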
S3: concatenating the first facial image spatio-temporal feature and the first body image spatio-temporal feature and inputting them into the fully connected neural network, inputting the output into the support vector machine, obtaining the probability matrix, over the different emotion types, of the fused features, and labeling it the first probability matrix; at the same time, inputting the concatenated first facial image spatio-temporal feature and first body image spatio-temporal feature into the support vector machine, obtaining the probability matrix, over the different emotion types, of the concatenated features, and labeling it the second probability matrix.
Specifically, the first facial image spatio-temporal feature and the first body image spatio-temporal feature are concatenated and input into the trained fully connected neural network, and its output is input into the trained support vector machine. From the combination of the two modalities, the probability that the combined feature belongs to each emotion category is obtained, and the first classification probability matrix is built.
Within the output of the fully connected neural network, it is preferable to take the data of the penultimate fully connected layer, reduce its dimensionality by principal component analysis, and then input the processed data into the trained support vector machine, so as to obtain a more precise probabilistic classification result.
On the other hand, the concatenated first facial image spatio-temporal feature and first body image spatio-temporal feature are input directly into the trained support vector machine, which yields the probability that the combined feature belongs to each emotion category; from this the second classification probability matrix is built.
In the concatenation of the first facial image spatio-temporal feature and the first body image spatio-temporal feature, the concatenated feature may first be reduced by principal component analysis and then input into the trained support vector machine to obtain the probability output. In this way, feature-level fusion combines the facial features with the body-action features under two different fusion strategies, a neural-network fusion strategy and a feature-concatenation fusion strategy, each of which yields a probability matrix of the video data over the emotion categories.
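As an illustration of the feature-concatenation path, the sketch below concatenates the two modality features, reduces them with PCA and obtains a class-probability matrix from an SVM. scikit-learn's SVC, the PCA dimension, and fitting and scoring on the same features (for brevity) are assumptions; the neural-network path would instead pass the concatenation through the trained fully connected network and feed its penultimate layer to the SVM:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def concat_fusion_probabilities(face_feat, body_feat, labels, dim=128):
    """Concatenation (serial) fusion of two modality features -> probability matrix."""
    fused = np.hstack([face_feat, body_feat])       # feature-level concatenation
    fused = PCA(n_components=dim).fit_transform(fused)  # space conversion + reduction
    svm = SVC(probability=True).fit(fused, labels)  # trained SVM classifier
    return svm.predict_proba(fused)                 # e.g. the second probability matrix
```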
S4: inputting the first facial image spatio-temporal feature into the support vector machine, obtaining the probability matrix of this feature over the different emotion types, and labeling it the third probability matrix; inputting the first body image spatio-temporal feature into the support vector machine, obtaining its probability matrix over the different emotion types, and labeling it the fourth probability matrix; performing decision fusion on the first, second, third and fourth probability matrices to obtain the first fused probability matrix; and taking the emotion type with the highest probability in the first fused probability matrix as the emotion recognition result.
Specifically, the first facial image spatio-temporal feature alone is input into the trained support vector machine, which yields the probability matrix of this feature over the emotion categories; from it the third probability matrix is built. On the other hand, the first body image spatio-temporal feature alone is input into the trained support vector machine, which yields its probability matrix over the emotion categories; from it the fourth probability matrix is built.
With reference to Fig. 2, which compares the emotion recognition rates of different fusion strategies based on the full time series, the four probability matrices obtained above are decision-fused into a new fused probability matrix. This matrix contains the set of probabilities that the data under test belong to each emotion category, and the category with the highest probability in this set is selected as the final recognition result.
In this way, a person's facial expression and the body actions of the same period are combined: deep neural networks extract the spatio-temporal features of the data under test, support vector machines classify those features under different fusion strategies, and multi-modal emotion recognition is finally achieved, making full use of the effective information in each modality and raising the accuracy of emotion recognition.
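The patent does not fix the decision-fusion rule; the sketch below assumes an unweighted mean over the classifier probability matrices, followed by an argmax over the emotion types:

```python
import numpy as np

def decision_fusion(prob_matrices):
    """Average the per-classifier probability matrices and pick the argmax."""
    fused = np.mean(prob_matrices, axis=0)   # e.g. the first fused probability matrix
    return fused, fused.argmax(axis=1)       # fused matrix, predicted emotion indices

# toy usage: four classifiers, 3 samples, 10 emotion classes
mats = [np.random.dirichlet(np.ones(10), size=3) for _ in range(4)]
fused, pred = decision_fusion(mats)
```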
On the basis of the above embodiment, before step S1 the method also includes: training the AlexNet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
Specifically, 127 videos from the FABO database are used to train the AlexNet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
The AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network are trained on image sequences in which the face and body change, and the network parameters are tuned to obtain the feature-extraction models. The spatio-temporal features of the different facial activities and the spatio-temporal features of the body postures are then input into the support vector machines to obtain the emotion classification models.
On the basis of the above embodiment, the pre-processing in step S1 of the video containing the face and the corresponding video containing body actions specifically includes: performing face detection and alignment on each frame of the video containing the face, and arranging the processed frames in temporal order to obtain the facial image time series; normalizing each frame of the video containing body actions, and arranging the processed frames in temporal order to obtain the body image time series.
Specifically, each frame of the video containing the face undergoes face detection and alignment, and the processed frames are arranged in temporal order to obtain the facial image time series. At the same time, the frames of the video containing body actions are normalized so that every frame has a consistent format, and the processed frames are arranged in temporal order to form the body image time series.
In this way, every frame of the facial image time series and of the body image time series has the same format, which facilitates subsequent operations such as feature extraction.
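A rough sketch of the per-frame pre-processing, with OpenCV's Haar cascade as a stand-in detector; the patent names no specific face detector, and the alignment step (e.g. by eye landmarks) is omitted here:

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(frame, size=(224, 224)):
    """Detect the face in a frame and crop/resize it; returns None if no face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, 1.1, 5)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]                     # crop the detected face region
    return cv2.resize(frame[y:y + h, x:x + w], size)

def preprocess_body(frame, size=(224, 224)):
    return cv2.resize(frame, size)            # size normalization only
```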
On the basis of the above embodiment, step S1 further comprises: reading the label of each frame in the video containing the face, extracting the frames labeled onset, apex and offset, and forming the facial image time series; reading the label of each frame in the video containing body actions, extracting the frames labeled onset, apex and offset, and forming the body image time series. The frame labels include neutral, onset, apex and offset.
Specifically, every frame of the videos in the test database is labeled: all frames of the phase in which an expression or action begins are labeled onset; the period in which the expression or action reaches its maximum is labeled apex; all frames of the period in which the expression is released are labeled offset; and the remaining frames, in which no expression is shown, are labeled neutral.
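A minimal sketch of the frame screening evaluated in the following paragraphs, assuming `frames` and `labels` are parallel lists with labels drawn from the four marks just named:

```python
KEEP = {"onset", "apex", "offset"}

def screen(frames, labels):
    """Keep only the onset, apex and offset frames, in their original order."""
    return [f for f, lab in zip(frames, labels) if lab in KEEP]
```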
When the facial image time series and the body image time series are used for emotion recognition, the time series may be composed of all frames of the video, or only of the frames of the period in which the expression or action reaches its maximum. Preferably, the frames before the expression starts and after the expression has finished are discarded, and only the frames between the start and the end of the expression are classified: the frames labeled onset, apex and offset are extracted to compose the time series, which raises the overall recognition precision. Table 1 shows the results of emotion recognition from the face videos under the different frame-extraction methods, and Table 2 shows the corresponding results for the body actions.
Table 1 (face videos)

Time-series screening method     MAA (%)   ACC (%)
Apex sequence                    55.90     56.84
Onset-apex-offset sequence       57.56     61.11
All frames of the whole cycle    51.67     53.85

Table 2 (body actions)

Time-series screening method     MAA (%)   ACC (%)
Apex sequence                    45.88     50.60
Onset-apex-offset sequence       48.98     51.70
All frames of the whole cycle    44.50     49.77
It can be seen from Tables 1 and 2 that composing the time series from the frames labeled onset, apex and offset gives a higher recognition rate than the other schemes. Here MAA denotes the macro average accuracy and ACC the overall accuracy (the proportion of all samples classified correctly), computed as:

P_i = TP_i / (TP_i + FP_i)

MAA = (1/s) · Σ_{i=1}^{s} P_i

where s is the number of emotion categories, P_i is the precision of the i-th emotion class, TP_i is the number of samples correctly classified into the i-th class, and FP_i is the number of samples wrongly classified into the i-th class.
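A short sketch of these two metrics, computed directly from ground-truth and predicted label vectors (NumPy is an assumed convenience, not named by the patent):

```python
import numpy as np

def maa_acc(y_true, y_pred, s):
    """Macro average accuracy (MAA) and overall accuracy (ACC)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions = []
    for i in range(s):
        predicted_i = (y_pred == i)                 # TP_i + FP_i
        tp = np.sum(predicted_i & (y_true == i))    # TP_i
        precisions.append(tp / predicted_i.sum() if predicted_i.any() else 0.0)
    maa = np.mean(precisions)                       # (1/s) * sum of P_i
    acc = np.mean(y_true == y_pred)                 # overall accuracy
    return maa, acc
```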
On the basis of the above embodiment, step S2 specifically includes:
S21: inputting the facial image time series into the AlexNet-based convolutional neural network, taking the data of the first two of its three fully connected layers as the initial facial spatial features, and applying principal component analysis to these features to achieve space conversion and dimensionality reduction, obtaining the first facial image spatial features; inputting the body image time series into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the initial body spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body image spatial features;
S22: inputting the first facial image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as the initial facial spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first facial image spatio-temporal feature; inputting the first body image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the initial body spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the first body image spatio-temporal feature.
Specifically, with reference to Fig. 3, obtaining multi-layer deep spatio-temporal features from the facial and body image time series requires a convolutional neural network for the feature extraction in image space and a recurrent neural network for the further extraction of the temporal information of the image sequences. In this embodiment, an AlexNet-based convolutional neural network extracts the spatial features of the facial image time series and of the body image time series. Preferably, the last three layers of the AlexNet-based network are fully connected layers whose output feature dimensions are 1024, 512 and 10; the output data of the first two of these three fully connected layers are taken as the initial spatial features, so the initial feature dimension totals 1536. Principal component analysis is applied to this 1536-dimensional feature to achieve space conversion and dimensionality reduction, so that the dimensionality meets the input requirement of the BLSTM-based recurrent neural network. The outputs of the first two of the last three fully connected layers of the recurrent network are then extracted as the initial spatio-temporal feature, which is likewise 1536-dimensional, and principal component analysis is applied again to achieve space conversion and dimensionality reduction, finally yielding the spatio-temporal feature. In this step, the facial image time series is input successively into the trained AlexNet-based convolutional network and the trained BLSTM-based recurrent network to obtain the facial image spatio-temporal feature, and the body image time series is processed in the same way to obtain the body image spatio-temporal feature; these are labeled the first facial image spatio-temporal feature and the first body image spatio-temporal feature.
In this way, both the spatial features and the temporal features of the image time series are extracted.
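A small sketch of the 1536-dimensional hand-off described above: the outputs of the first two fully connected layers are concatenated per frame and reduced by PCA. The arrays here are random stand-ins and the target dimension is an assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

# fc1 (1024-d) and fc2 (512-d) stand for the per-frame outputs of the first two
# fully connected layers; here they are random placeholders for 100 frames.
fc1, fc2 = np.random.randn(100, 1024), np.random.randn(100, 512)
initial = np.hstack([fc1, fc2])                        # 1536-d initial features
reduced = PCA(n_components=64).fit_transform(initial)  # space conversion + reduction
```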
On the basis of the above embodiments, step S1 also includes: cutting the facial image time series and the body image time series with a sliding window of preset length, obtaining a facial image time-subsequence group composed of multiple facial image time-series fragments and a body image time-subsequence group composed of multiple body image time-series fragments.
Specifically, after the facial image time series and the body image time series are obtained, they are cut with a sliding window of preset length. As shown in Fig. 4, a facial image time series of length 15 containing 5 frames labeled onset, 5 labeled apex and 5 labeled offset, cut with a sliding window of length 6 and stride 1, yields 10 facial image time-series fragments of length 6, which form the facial image time-subsequence group. The window length should be chosen, as far as possible, so that each fragment contains frames of at least two of the three types onset, apex and offset. The body image time series is cut in the same way, and the resulting body image time-series fragments form the body image time-subsequence group.
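A minimal sketch of the sliding-window cutting with the example values from the text (window length 6, stride 1, a 15-frame sequence yielding 10 fragments):

```python
def sliding_window(seq, length=6, step=1):
    """Cut a sequence into overlapping fixed-length fragments."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

fragments = sliding_window(list(range(15)))   # 10 fragments of 6 frames each
```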
Table 3 shows the emotion recognition results for different sliding-window lengths based on the facial image time series, and Table 4 shows the corresponding results based on the body image time series.
Table 3 (face videos)

Window length t   6       7       8       9       10
MAA (%)           58.61   60.45   67.09   58.48   56.13
ACC (%)           59.00   61.25   66.46   59.03   57.21

Table 4 (body actions)

Window length t   6       7       8       9       10
MAA (%)           43.66   55.00   50.20   47.33   45.81
ACC (%)           44.85   55.98   51.83   48.76   46.00
It can be seen from Tables 3 and 4 that, with a suitable sliding-window length, the recognition accuracy is higher than that of the whole-time-series approach of Tables 1 and 2, which uses no sliding-window cutting.
On the basis of the above embodiments, step S2 further comprises: inputting the multiple facial image time-series fragments of the facial image time-subsequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network to obtain the second facial image spatio-temporal feature; inputting the multiple body image time-series fragments of the body image time-subsequence group successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network to obtain the second body image spatio-temporal feature.
Specifically, the multiple facial image time-series fragments of the facial image time-subsequence group and the multiple body image time-series fragments of the body image time-subsequence group are likewise fed into the trained AlexNet-based convolutional neural network and the trained BLSTM-based recurrent neural network, yielding respectively the spatio-temporal features of all time-series fragments in the facial image time-subsequence group and of all time-series fragments in the body-action image time-subsequence group; these are labeled the second facial image spatio-temporal feature and the second body-action image spatio-temporal feature.
In this way, feature extraction over the cut time-series fragments yields new facial image spatio-temporal features and new body-action image spatio-temporal features for the classifiers.
On the basis of the above embodiments, step S2 also includes:
inputting the multiple facial image time-series fragments of the facial image time-subsequence group into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the second initial facial spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second facial image spatial features; inputting the multiple body image time-series fragments of the body image time-subsequence group into the AlexNet-based convolutional neural network, taking the data of the first two of the three fully connected layers as the second initial body spatial features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body image spatial features;
inputting the second facial image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the second initial facial spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second facial image spatio-temporal feature; inputting the second body image spatial features into the BLSTM-based recurrent neural network, taking the data of the first two of the three fully connected layers as the second initial body spatio-temporal features, and applying principal component analysis to achieve space conversion and dimensionality reduction, obtaining the second body image spatio-temporal feature.
Specifically, this follows the same method as the extraction of the first facial and first body spatio-temporal features in the above embodiment: the convolutional neural network performs the feature extraction in image space, and the recurrent neural network further extracts the temporal information of the image sequences. In this embodiment, the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network extract the spatio-temporal features of all the time-series fragments produced by the sliding-window cutting, yielding the second facial spatio-temporal feature and the second body spatio-temporal feature. The feature extraction inside the networks is the same as in the previous embodiments and is not repeated here.
On the basis of the above embodiments, step S3 further comprises: concatenating the second facial image spatio-temporal feature and the second body image spatio-temporal feature and inputting them into the fully connected neural network, inputting the output into the support vector machine, obtaining the probability matrix, over the different emotion types, of the fused second facial and second body image spatio-temporal features, and labeling it the fifth probability matrix; at the same time, inputting the concatenated second facial image spatio-temporal feature and second body image spatio-temporal feature into the support vector machine, obtaining the probability matrix, over the different emotion types, of the concatenated features, and labeling it the sixth probability matrix.
Specifically, the second facial image spatio-temporal feature and the second body image spatio-temporal feature are concatenated and input into the trained fully connected neural network; the data of its penultimate fully connected layer are taken as output, reduced by principal component analysis, and input into the trained support vector machine. From the combination of the two modalities, the probability that the combined second facial and second body image spatio-temporal feature belongs to each emotion category is obtained, and the fifth classification probability matrix is built.
On the other hand, the concatenated second facial image spatio-temporal feature and second body image spatio-temporal feature are input into the trained support vector machine, which yields the probability that the concatenated feature belongs to each emotion category; from these probabilities the sixth classification probability matrix is built.
On the basis of the above embodiment, step S4 further comprises: inputting the second facial image spatio-temporal feature into the support vector machine, obtaining the probability matrix of this feature over the different emotion types, and labeling it the seventh probability matrix; inputting the second body image spatio-temporal feature into the support vector machine, obtaining its probability matrix over the different emotion types, and labeling it the eighth probability matrix; performing decision fusion on the fifth, sixth, seventh and eighth probability matrices to obtain the second fused probability matrix; performing decision fusion on the first fused probability matrix and the second fused probability matrix to obtain the third fused probability matrix; and taking the emotion type with the highest probability in the third fused probability matrix as the emotion recognition result.
Specifically, the second facial image spatio-temporal feature alone is input into the trained support vector machine, yielding the probability matrix of this feature over the emotion categories, which is labeled the seventh probability matrix. On the other hand, the second body image spatio-temporal feature alone is input into the trained support vector machine, yielding its probability matrix over the emotion categories, which is labeled the eighth probability matrix.
With reference to Fig. 5, which compares the recognition rates of emotion recognition based on the fifth, sixth, seventh and eighth probability matrices, decision-fusing these four matrices generates the second fused probability matrix, which reaches the emotion-type recognition rate shown as Multi4-2 in Fig. 5.
Finally, the first fused probability matrix and the second fused probability matrix undergo decision-level fusion according to their probabilities, producing the third fused probability matrix, and the emotion category with the highest probability in this matrix is selected as the final recognition result. With reference to Fig. 6, which shows the recognition rate of the first fused probability matrix, the recognition rate of the second fused probability matrix, and the result of re-fusing the recognition over the whole time series with the recognition over the sliding-window time-series fragment groups, the third fusion achieves an emotion recognition accuracy of more than 99%.
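A sketch of the full three-level decision fusion, again assuming an unweighted mean as the fusion rule and using random probability matrices as stand-ins for the eight matrices described above:

```python
import numpy as np

def fuse(mats):
    """Unweighted decision fusion of probability matrices (assumed rule)."""
    return np.mean(mats, axis=0)

# p[0:4] from the full sequences, p[4:8] from the sliding-window fragments
p = [np.random.dirichlet(np.ones(10), size=5) for _ in range(8)]
fused1 = fuse(p[:4])             # first fused probability matrix (step S4)
fused2 = fuse(p[4:])             # second fused probability matrix
fused3 = fuse([fused1, fused2])  # third fused probability matrix
emotion = fused3.argmax(axis=1)  # final emotion recognition result
```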
In this way, the multi-modal joint emotion recognition method makes full use of the effective information of every modality in the video under test, improves the fusion efficiency and, at the same time, improves the accuracy of emotion recognition.
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

  1. A multi-modal emotion recognition and classification method, characterized by comprising:
    S1: receiving the data to be tested, the data comprising a video containing a face and a temporally corresponding video containing body actions, and pre-processing the two videos to obtain a facial image time series containing the face and a body image time series containing the body actions;
    S2: inputting the facial image time series successively into a convolutional neural network based on AlexNet and a recurrent neural network based on BLSTM and taking the output data as the first facial image spatio-temporal feature; inputting the body image time series successively into the AlexNet-based convolutional neural network and the BLSTM-based recurrent neural network and taking the output data as the first body image spatio-temporal feature;
    S3: concatenating the first facial image spatio-temporal feature and the first body image spatio-temporal feature and inputting them into a fully connected neural network, inputting the output into a support vector machine, obtaining the probability matrix, over the different emotion types, of the fused features, and labeling it the first probability matrix; at the same time, inputting the concatenated first facial image spatio-temporal feature and first body image spatio-temporal feature into a support vector machine, obtaining the probability matrix, over the different emotion types, of the concatenated features, and labeling it the second probability matrix;
    S4: inputting the first facial image spatio-temporal feature into a support vector machine, obtaining the probability matrix of this feature over the different emotion types, and labeling it the third probability matrix; inputting the first body image spatio-temporal feature into a support vector machine, obtaining its probability matrix over the different emotion types, and labeling it the fourth probability matrix; performing decision fusion on the first, second, third and fourth probability matrices to obtain the first fused probability matrix; and taking the emotion type with the highest probability in the first fused probability matrix as the emotion recognition result.
  2. The method according to claim 1, characterized in that, before step S1, the method further comprises: training the AlexNet-based convolutional neural network, the BLSTM-based recurrent neural network, the fully connected neural network and the support vector machines.
  3. The method according to claim 1, characterized in that the pre-processing in step S1 of the video containing the face and the corresponding video containing body actions specifically comprises:
    performing face detection and alignment on each frame of the video containing the face, and arranging the processed frames in temporal order to obtain the facial image time series;
    normalizing each frame of the video containing body actions, and arranging the processed frames in temporal order to obtain the body image time series.
  4. The method according to claim 3, characterized in that step S1 further comprises:
    reading the label of each frame in the video containing the face, extracting the frames labeled onset, apex and offset, and composing the facial image time series;
    reading the label of each frame in the video containing body actions, extracting the frames labeled onset, apex and offset, and composing the body image time series;
    wherein the frame labels comprise neutral, onset, apex and offset.
  5. 5. according to the method for claim 1, it is characterised in that the step S2 is specifically included:
    S21, the facial image time series is input in the convolutional neural networks based on Alexnet, takes out three and connect entirely The data of the full articulamentum of the first two in layer are connect as face space initial characteristicses, by face space initial characteristicses carry out it is main into Analysis, so as to realize space conversion and dimensionality reduction, the first facial image space characteristics are obtained, by the body image time series It is input in the convolutional neural networks based on Alexnet, takes out the data conduct of the full articulamentum of the first two in three full articulamentums Body space initial characteristicses, the body space initial characteristicses are subjected to principal component analysis, so as to realize space conversion and dimensionality reduction, Obtain the first body image space characteristics;
    S22, inputting the first facial image space features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as facial space-time initial features, and applying principal component analysis to the facial space-time initial features to achieve space transformation and dimensionality reduction, thereby obtaining the first facial image space-time feature; inputting the first body image space features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as body space-time initial features, and applying principal component analysis to the body space-time initial features to achieve space transformation and dimensionality reduction, thereby obtaining the first body image space-time feature.
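An illustrative sketch of S21-S22 for one modality, assuming a recent torchvision AlexNet (whose classifier does contain three fully connected layers) and scikit-learn PCA; the PCA dimensionality, the choice of the last-step BLSTM output as the space-time feature, and all sizes are assumptions.

```python
# Sketch of S21 (CNN features + PCA) and S22 (BLSTM over the PCA features).
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.decomposition import PCA

alexnet = models.alexnet(weights=None).eval()   # untrained stand-in network

def fc12_features(frames):
    """Concatenated activations of the first two of AlexNet's three fully
    connected layers, for a batch of frames (N x 3 x 227 x 227)."""
    with torch.no_grad():
        x = torch.flatten(alexnet.avgpool(alexnet.features(frames)), 1)
        fc6 = alexnet.classifier[2](alexnet.classifier[1](x))    # FC layer 1 + ReLU
        fc7 = alexnet.classifier[5](alexnet.classifier[4](fc6))  # FC layer 2 + ReLU
    return torch.cat([fc6, fc7], dim=1).numpy()

frames = torch.randn(16, 3, 227, 227)            # stand-in facial image time series
# S21: space transformation and dimensionality reduction via PCA.
space_features = PCA(n_components=8).fit_transform(fc12_features(frames))

# S22: a bidirectional LSTM models the temporal dimension; here the output
# at the last time step stands in for the facial image space-time feature.
blstm = nn.LSTM(input_size=8, hidden_size=64, bidirectional=True, batch_first=True)
seq = torch.from_numpy(space_features).float().unsqueeze(0)      # 1 x T x 8
with torch.no_grad():
    out, _ = blstm(seq)
space_time_feature = out[:, -1, :].numpy()
```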
  6. The method according to any one of claims 1-5, characterised in that step S1 further comprises:
    slicing the facial image time series and the body image time series by a preset sliding window length, so as to obtain a facial image time subsequence group composed of multiple facial image time series segments and a body image time subsequence group composed of multiple body image time series segments.
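A sketch of the sliding-window slicing; the window length and stride are assumptions (the claim fixes only that the length is preset):

```python
# Cut a time series into fixed-length segments with a sliding window.
def slide(series, window, stride=1):
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, stride)]

face_series = list(range(20))                  # stand-in facial image time series
subsequence_group = slide(face_series, window=8, stride=4)
# -> segments [0..7], [4..11], [8..15], [12..19]
```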
  7. The method according to claim 6, characterised in that step S2 further comprises:
    sequentially inputting the multiple facial image time series segments in the facial image time subsequence group into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second facial image space-time feature;
    sequentially inputting the multiple body image time series segments in the body image time subsequence group into the Alexnet-based convolutional neural network and the BLSTM-based recurrent neural network, and taking the output data as the second body image space-time feature.
  8. The method according to claim 7, characterised in that step S2 further comprises:
    inputting the multiple facial image time series segments in the facial image time subsequence group into the Alexnet-based convolutional neural network, taking the data of the first two of its three fully connected layers as second facial space initial features, and applying principal component analysis to the second facial space initial features to achieve space transformation and dimensionality reduction, thereby obtaining the second facial image space features; inputting the multiple body image time series segments in the body image time subsequence group into the Alexnet-based convolutional neural network, taking the data of the first two of its three fully connected layers as second body space initial features, and applying principal component analysis to the second body space initial features to achieve space transformation and dimensionality reduction, thereby obtaining the second body image space features;
    inputting the second facial image space features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as second facial space-time initial features, and applying principal component analysis to the second facial space-time initial features to achieve space transformation and dimensionality reduction, thereby obtaining the second facial image space-time feature; inputting the second body image space features into the BLSTM-based recurrent neural network, taking the data of the first two of its three fully connected layers as second body space-time initial features, and applying principal component analysis to the second body space-time initial features to achieve space transformation and dimensionality reduction, thereby obtaining the second body image space-time feature.
  9. The method according to claim 8, characterised in that step S3 further comprises:
    concatenating the second facial image space-time feature and the second body image space-time feature and inputting the result into the fully connected neural network, then inputting the output into a support vector machine, so as to obtain the probability matrix, over the different emotion types, of the fused second facial image space-time feature and second body image space-time feature, this probability matrix being labeled the fifth probability matrix; at the same time, inputting the concatenation of the second facial image space-time feature and the second body image space-time feature directly into a support vector machine, so as to obtain the probability matrix, over the different emotion types, of the concatenated features, this probability matrix being labeled the sixth probability matrix.
  10. The method according to claim 9, characterised in that step S4 further comprises:
    inputting the second facial image space-time feature into a support vector machine to obtain the probability matrix of the second facial image space-time feature over the different emotion types, this probability matrix being labeled the seventh probability matrix; inputting the second body image space-time feature into a support vector machine to obtain the probability matrix of the second body image space-time feature over the different emotion types, this probability matrix being labeled the eighth probability matrix; performing decision fusion on the fifth probability matrix, the sixth probability matrix, the seventh probability matrix and the eighth probability matrix to obtain a second fusion probability matrix;
    performing decision fusion on the first fusion probability matrix and the second fusion probability matrix to obtain a third fusion probability matrix, and taking the emotion type with the highest probability in the third fusion probability matrix as the emotion recognition result.
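For illustration, the two-level decision fusion of claims 9-10 composed end to end, assuming equal-weight averaging as the (unspecified) fusion rule and a hypothetical seven-class emotion label set:

```python
# Sketch of the hierarchical decision fusion: matrices 1-4 give the first
# fusion matrix, 5-8 the second, and their fusion yields the final result.
import numpy as np

def decision_fusion(*prob_matrices):
    return np.mean(np.stack(prob_matrices, axis=0), axis=0)

rng = np.random.default_rng(1)
# Stand-ins for the first..eighth probability matrices (5 clips x 7 classes).
P = [rng.dirichlet(np.ones(7), size=5) for _ in range(8)]

first_fusion = decision_fusion(*P[0:4])
second_fusion = decision_fusion(*P[4:8])
third_fusion = decision_fusion(first_fusion, second_fusion)

emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
print([emotions[i] for i in third_fusion.argmax(axis=1)])
```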
CN201711144196.1A 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method Active CN107808146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711144196.1A CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711144196.1A CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Publications (2)

Publication Number Publication Date
CN107808146A true CN107808146A (en) 2018-03-16
CN107808146B CN107808146B (en) 2020-05-05

Family

ID=61589748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711144196.1A Active CN107808146B (en) 2017-11-17 2017-11-17 Multi-mode emotion recognition and classification method

Country Status (1)

Country Link
CN (1) CN107808146B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968643A (en) * 2012-11-16 2013-03-13 华中科技大学 Multi-mode emotion recognition method based on Lie group theory
CN106529504A (en) * 2016-12-02 2017-03-22 合肥工业大学 Dual-mode video emotion recognition method with composite spatial-temporal characteristic
CN107273876A (en) * 2017-07-18 2017-10-20 山东大学 A kind of micro- expression automatic identifying method of ' the grand micro- transformation models of to ' based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE JUN: "Multi View Facial Action Unit Detection Based on CNN and BLSTM-RNN", 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition *
YAN Jingjie (闫静杰) et al.: "Bimodal Emotion Recognition Based on Expression and Posture" (表情和姿态的双模态情感识别), Journal of Image and Graphics (中国图象图形学报) *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491720A (en) * 2018-03-20 2018-09-04 腾讯科技(深圳)有限公司 A kind of application and identification method, system and relevant device
CN108491720B (en) * 2018-03-20 2023-07-14 腾讯科技(深圳)有限公司 Application identification method, system and related equipment
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
CN108596039A (en) * 2018-03-29 2018-09-28 南京邮电大学 A kind of bimodal emotion recognition method and system based on 3D convolutional neural networks
CN108596039B (en) * 2018-03-29 2020-05-05 南京邮电大学 Bimodal emotion recognition method and system based on 3D convolutional neural network
CN109101999A (en) * 2018-07-16 2018-12-28 华东师范大学 The credible decision-making technique of association's neural network based on support vector machines
CN109101999B (en) * 2018-07-16 2021-06-25 华东师范大学 Support vector machine-based cooperative neural network credible decision method
CN110795973A (en) * 2018-08-03 2020-02-14 北京大学 Multi-mode fusion action recognition method and device and computer readable storage medium
CN109190514A (en) * 2018-08-14 2019-01-11 电子科技大学 Face character recognition methods and system based on two-way shot and long term memory network
CN109190514B (en) * 2018-08-14 2021-10-01 电子科技大学 Face attribute recognition method and system based on bidirectional long-short term memory network
CN109325457A (en) * 2018-09-30 2019-02-12 合肥工业大学 Sentiment analysis method and system based on multi-channel data and Recognition with Recurrent Neural Network
CN109359599A (en) * 2018-10-19 2019-02-19 昆山杜克大学 Human facial expression recognition method based on combination learning identity and emotion information
CN109684911A (en) * 2018-10-30 2019-04-26 百度在线网络技术(北京)有限公司 Expression recognition method, device, electronic equipment and storage medium
US11151363B2 (en) 2018-10-30 2021-10-19 Baidu Online Network Technology (Beijing) Co., Ltd. Expression recognition method, apparatus, electronic device, and storage medium
CN109522945A (en) * 2018-10-31 2019-03-26 中国科学院深圳先进技术研究院 One kind of groups emotion identification method, device, smart machine and storage medium
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product
CN109783684A (en) * 2019-01-25 2019-05-21 科大讯飞股份有限公司 A kind of emotion identification method of video, device, equipment and readable storage medium storing program for executing
CN110020596A (en) * 2019-02-21 2019-07-16 北京大学 A kind of video content localization method based on Fusion Features and cascade study
CN110037693A (en) * 2019-04-24 2019-07-23 中央民族大学 A kind of mood classification method based on facial expression and EEG
CN110378335A (en) * 2019-06-17 2019-10-25 杭州电子科技大学 A kind of information analysis method neural network based and model
CN110287912A (en) * 2019-06-28 2019-09-27 广东工业大学 Method, apparatus and medium are determined based on the target object affective state of deep learning
CN110234018A (en) * 2019-07-09 2019-09-13 腾讯科技(深圳)有限公司 Multimedia content description generation method, training method, device, equipment and medium
CN110472506A (en) * 2019-07-11 2019-11-19 广东工业大学 A kind of gesture identification method based on support vector machines and Neural Network Optimization
CN110765839B (en) * 2019-09-02 2022-02-22 合肥工业大学 Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image
CN110598608A (en) * 2019-09-02 2019-12-20 中国航天员科研训练中心 Non-contact and contact cooperative psychological and physiological state intelligent monitoring system
CN110693508A (en) * 2019-09-02 2020-01-17 中国航天员科研训练中心 Multi-channel cooperative psychophysiological active sensing method and service robot
CN110598608B (en) * 2019-09-02 2022-01-14 中国航天员科研训练中心 Non-contact and contact cooperative psychological and physiological state intelligent monitoring system
CN110765839A (en) * 2019-09-02 2020-02-07 合肥工业大学 Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image
CN111242155A (en) * 2019-10-08 2020-06-05 台州学院 Bimodal emotion recognition method based on multimode deep learning
CN111476217A (en) * 2020-05-27 2020-07-31 上海乂学教育科技有限公司 Intelligent learning system and method based on emotion recognition
CN111914742A (en) * 2020-07-31 2020-11-10 辽宁工业大学 Attendance checking method, system, terminal equipment and medium based on multi-mode biological characteristics
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112784730A (en) * 2021-01-20 2021-05-11 东南大学 Multi-modal emotion recognition method based on time domain convolutional network
CN112784730B (en) * 2021-01-20 2022-03-29 东南大学 Multi-modal emotion recognition method based on time domain convolutional network
CN116682168A (en) * 2023-08-04 2023-09-01 阳光学院 Multi-modal expression recognition method, medium and system
CN116682168B (en) * 2023-08-04 2023-10-17 阳光学院 Multi-modal expression recognition method, medium and system
CN117351575A (en) * 2023-12-05 2024-01-05 北京师范大学珠海校区 Nonverbal behavior recognition method and nonverbal behavior recognition device based on text-generated graph data enhancement model
CN117351575B (en) * 2023-12-05 2024-02-27 北京师范大学珠海校区 Nonverbal behavior recognition method and nonverbal behavior recognition device based on text-generated graph data enhancement model

Also Published As

Publication number Publication date
CN107808146B (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN107808146A (en) A kind of multi-modal emotion recognition sorting technique
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
CN106203395B (en) Face attribute recognition method based on multitask deep learning
CN104679863B (en) It is a kind of based on deep learning to scheme to search drawing method and system
US20180150719A1 (en) Automatically computing emotions aroused from images through shape modeling
CN105999670A (en) Shadow-boxing movement judging and guiding system based on kinect and guiding method adopted by same
CN107609572A (en) Multi-modal emotion identification method, system based on neutral net and transfer learning
CN106203356B (en) A kind of face identification method based on convolutional network feature extraction
CN108764065A (en) A kind of method of pedestrian's weight identification feature fusion assisted learning
CN110464366A (en) A kind of Emotion identification method, system and storage medium
CN108491077A (en) A kind of surface electromyogram signal gesture identification method for convolutional neural networks of being divided and ruled based on multithread
CN106845513B (en) Manpower detector and method based on condition random forest
CN112906631B (en) Dangerous driving behavior detection method and detection system based on video
CN107330393A (en) A kind of neonatal pain expression recognition method based on video analysis
CN112036276A (en) Artificial intelligent video question-answering method
CN105956570A (en) Lip characteristic and deep learning based smiling face recognition method
Lyu et al. Spontaneous facial expression database of learners’ academic emotions in online learning with hand occlusion
CN115862120A (en) Separable variation self-encoder decoupled face action unit identification method and equipment
CN113486752A (en) Emotion identification method and system based on electrocardiosignals
CN106529453A (en) Reinforcement patch and multi-tag learning combination-based expression lie test method
CN108511064A (en) The system for automatically analyzing healthy data based on deep learning
KR20110098286A (en) Self health diagnosis system of oriental medicine using fuzzy inference method
CN113506274B (en) Detection system for human cognitive condition based on visual saliency difference map
Alam et al. An Autism Detection Architecture with Fusion of Feature Extraction and Classification
Javed et al. Behavior-based risk detection of autism spectrum disorder through child-robot interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant