CN106503646A - Multi-modal emotion recognition system and method - Google Patents
Multi-modal emotion recognition system and method
- Publication number
- CN106503646A (application CN201610912302.5A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- identification result
- emotion identification
- voice signal
- subsystem
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/161: Human faces, e.g. facial parts, sketches or expressions; detection, localisation, normalisation
- G06F18/24: Pattern recognition; classification techniques
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06V40/168: Human faces; feature extraction, face representation
- G06V40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
- G06V40/172: Human faces; classification, e.g. identification
- G06V40/174: Facial expression recognition
- G06V40/20: Movements or behaviour, e.g. gesture recognition
- G10L15/02: Feature extraction for speech recognition; selection of recognition unit
- G10L15/1807: Speech classification or search using natural language modelling, using prosody or stress
- G10L15/1822: Parsing for meaning understanding
- G10L15/26: Speech to text systems
Abstract
The present invention provides a multi-modal emotion recognition system and method. The system includes a speech receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device. The speech receiver receives a speech signal sent by a target object; the visual image receiver receives visual image data of the target object. The first emotion recognition subsystem obtains a first emotion recognition result from the speech signal; the second emotion recognition subsystem obtains a second emotion recognition result from the speech signal; the third emotion recognition subsystem obtains a third emotion recognition result from the visual image data. The emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results.
Description
Technical field
The present invention relates to computer processing technology, and in particular to a multi-modal emotion recognition system and method.
Background art
At present, an emotion recognition machine generally applies only one of text recognition technology, speech recognition technology, and visual image recognition technology to recognize human emotion. This single-mode approach uses little information for emotion recognition and can hardly recognize human emotion in complex situations.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multi-modal emotion recognition system and method that fuse text recognition technology, speech recognition technology, and visual image recognition technology and perform human emotion recognition over multiple channels simultaneously, so that an emotion recognition machine can accurately recognize the emotion of a target object during human-computer interaction.
To solve the above technical problem, the present invention provides the following technical solution:
In one aspect, the present invention provides a multi-modal emotion recognition system, including: a speech receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device. The speech receiver is configured to receive a speech signal sent by a target object. The visual image receiver is configured to receive visual image data of the target object. The first emotion recognition subsystem is configured to obtain a first emotion recognition result from the speech signal; the second emotion recognition subsystem is configured to obtain a second emotion recognition result from the speech signal; the third emotion recognition subsystem is configured to obtain a third emotion recognition result from the visual image data. The emotion output device is configured to determine the emotional state of the target object from the first, second, and third emotion recognition results.
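As a minimal sketch of this data flow (all names below are illustrative assumptions; the patent specifies components, not code), the two input channels feed three recognizers whose results are fused:

```python
# A minimal sketch of the system interfaces described above.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EmotionResult:
    label: str         # e.g. "happy", "angry"
    confidence: float  # recognition confidence in [0, 1]

def run_pipeline(speech_signal,
                 image_frames,
                 audio_recognizer: Callable,   # first subsystem: acoustic prosody
                 text_recognizer: Callable,    # second subsystem: recognized text
                 visual_recognizer: Callable,  # third subsystem: face + body
                 fuse: Callable) -> str:
    r1 = audio_recognizer(speech_signal)
    r2 = text_recognizer(speech_signal)
    r3 = visual_recognizer(image_frames)
    return fuse(r1, r2, r3)  # emotion output device: fused emotional state
```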
Further, the first emotion recognition subsystem specifically includes an emotion saliency segmenter and a first emotion recognizer. The emotion saliency segmenter is configured to extract acoustic prosodic features from the speech signal of the speech receiver; the first emotion recognizer is configured to obtain the first emotion recognition result of the speech signal from the acoustic prosodic features. The second emotion recognition subsystem specifically includes a speech recognizer, a sentence feature extractor, and a second emotion recognizer. The speech recognizer is configured to convert the speech signal of the speech receiver into a word sequence; the sentence feature extractor is configured to extract sentence feature values from the word sequence; the second emotion recognizer is configured to obtain the second emotion recognition result of the speech signal from the sentence feature values. The third emotion recognition subsystem specifically includes a face recognition tracker, a body recognition tracker, a facial expression feature extractor, a body action feature extractor, and a third emotion recognizer. The face recognition tracker is configured to recognize and track face data in the visual image data; the body recognition tracker is configured to recognize and track whole-body data, including the head, in the visual image data. The facial expression feature extractor is configured to extract facial key points from the face data and obtain facial expression feature values from them; the body action feature extractor is configured to extract body action key points from the body data and obtain body action feature values from them. The third emotion recognizer is configured to obtain the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values. The emotion output device is configured to determine the emotional state of the target object from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
Further, the first emotion recognizer obtains the first emotion recognition result of the speech signal from the acoustic prosodic features as follows: the acoustic prosodic features are substituted into a pre-built brain-like machine learning model to obtain neural-like speech features, and the neural-like speech features are substituted into a prestored emotion model to obtain a first emotion of the speech signal and a first emotion recognition confidence corresponding to the first emotion.
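This two-stage recognizer can be sketched as follows; the patent does not specify either model as code, so both are stood in for by generic objects exposing scikit-learn-style `transform` and `predict_proba` methods:

```python
# Two-stage audio emotion recognition, as a hedged sketch. `encoder` stands in
# for the pre-built brain-like model (prosodic features -> neural-like speech
# features); `emotion_model` stands in for the prestored emotion model.
import numpy as np

def recognize_audio_emotion(prosodic_features: np.ndarray, encoder, emotion_model):
    neural_like = encoder.transform(prosodic_features.reshape(1, -1))
    probs = emotion_model.predict_proba(neural_like)[0]
    best = int(np.argmax(probs))
    # Return the first emotion and its recognition confidence.
    return emotion_model.classes_[best], float(probs[best])
```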
Further, the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
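Several of the listed features can be computed with off-the-shelf audio tooling. The sketch below uses librosa as an assumed stand-in (the patent names no library) and summarizes each frame-level feature by its mean and standard deviation:

```python
import librosa
import numpy as np

def prosodic_features(wav_path: str) -> np.ndarray:
    """Extract a deliberately partial subset of the listed features."""
    y, sr = librosa.load(wav_path, sr=None)
    feats = [
        librosa.yin(y, fmin=50, fmax=500, sr=sr),        # pitch (F0)
        librosa.feature.rms(y=y),                        # RMS intensity
        librosa.feature.zero_crossing_rate(y),           # zero-crossing rate
        librosa.feature.spectral_centroid(y=y, sr=sr),   # spectral centroid
        librosa.feature.spectral_bandwidth(y=y, sr=sr),  # bandwidth
        librosa.feature.spectral_flatness(y=y),          # spectral flatness
        librosa.feature.spectral_rolloff(y=y, sr=sr),    # roll-off point
        librosa.feature.chroma_stft(y=y, sr=sr),         # chroma
    ]
    # Summarize each frame-level feature by its mean and standard deviation.
    return np.concatenate([[f.mean(), f.std()] for f in feats])
```

Features such as voice quality, formants, or PLP coefficients would need additional tooling (e.g. Praat or openSMILE); the selection above is illustrative only.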
Further, the sentence feature extractor extracts the sentence feature values from the word sequence as follows: word segmentation is performed on the word sequence to obtain word-segmentation feature values, word-category analysis is performed on the word sequence to obtain word-category feature values, and sentence-pattern syntactic analysis is performed on the word sequence to obtain sentence-pattern syntactic feature values. The second emotion recognizer obtains the second emotion recognition result of the speech signal from the sentence feature values as follows: the word-segmentation feature values, word-category feature values, and sentence-pattern syntactic feature values among the sentence feature values are input into a pre-built text emotion recognition model to obtain a second emotion of the speech signal and a second emotion recognition confidence corresponding to the second emotion.
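A sketch of this text channel under stated assumptions: jieba (not named in the patent) supplies Chinese word segmentation and word-category (POS) tagging, the sentence-pattern syntactic features are reduced to a crude placeholder, and `text_emotion_model` stands in for the pre-built text emotion recognition model:

```python
import jieba.posseg as pseg

def sentence_features(text: str) -> list:
    pairs = list(pseg.cut(text))        # word segmentation with POS tags
    tokens = [p.word for p in pairs]    # word-segmentation feature values
    pos_tags = [p.flag for p in pairs]  # word-category feature values
    # Placeholder sentence-pattern features (a real system would use a parser).
    syntax = ["len=%d" % len(tokens), "q" if text.endswith(("?", "？")) else "s"]
    return tokens + pos_tags + syntax

def recognize_text_emotion(text: str, text_emotion_model):
    x = [" ".join(sentence_features(text))]         # single document string
    probs = text_emotion_model.predict_proba(x)[0]  # pre-built text model
    best = int(probs.argmax())
    # Return the second emotion and its recognition confidence.
    return text_emotion_model.classes_[best], float(probs[best])
```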
Further, the third emotion recognizer obtains the third emotion recognition result of the visual image data from the facial expression feature values and body action feature values as follows: the facial expression feature values and body action feature values are substituted into a pre-built emotion classifier to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
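A sketch of the visual channel, with key-point extraction abstracted away (any face and body landmark detector could supply the key points) and `visual_emotion_model` standing in for the pre-built emotion classifier:

```python
import numpy as np

def recognize_visual_emotion(face_keypoints: np.ndarray,
                             body_keypoints: np.ndarray,
                             visual_emotion_model):
    # Combine facial-expression and body-action feature values into one
    # vector, since the description classifies both together.
    x = np.concatenate([face_keypoints.ravel(), body_keypoints.ravel()])
    probs = visual_emotion_model.predict_proba(x.reshape(1, -1))[0]
    best = int(np.argmax(probs))
    # Return the third emotion and its recognition confidence.
    return visual_emotion_model.classes_[best], float(probs[best])
```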
Further, the emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results together with the pre-built psychology-behavior mapping graph as follows. When any one of the first emotion recognition confidence of the first result, the second emotion recognition confidence of the second result, and the third emotion recognition confidence of the third result is greater than or equal to a set threshold, the emotion corresponding to that confidence is taken as the emotional state of the target object. When the first, second, and third emotion recognition confidences are all below the set threshold, emotion tags are calculated, according to a preset weighting rule, for the first emotion of the first result, the second emotion of the second result, and the third emotion of the third result respectively, yielding a first emotion tag, a second emotion tag, and a third emotion tag; the emotional state of the target object is then determined from these three emotion tags according to the pre-built psychology-behavior mapping graph.
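This decision rule can be sketched as follows; the threshold value, the channel weights, the tag encoding, and the fallback are all illustrative assumptions, since the patent fixes only the control flow:

```python
def fuse_results(r1, r2, r3, threshold=0.8, weights=(0.3, 0.3, 0.4),
                 mapping_graph=None):
    """r1, r2, r3 are (label, confidence) pairs from the three subsystems."""
    results = [r1, r2, r3]
    # Case 1: any single recognition confidence at or above the threshold
    # decides the emotional state directly.
    for label, conf in results:
        if conf >= threshold:
            return label
    # Case 2: all confidences below the threshold. Weight each channel's
    # result into an emotion tag, then look the label triple up in the
    # psychology-behavior mapping graph.
    tags = [(label, conf * w) for (label, conf), w in zip(results, weights)]
    key = tuple(label for label, _ in tags)
    mapping_graph = mapping_graph or {}
    # Fallback when the graph has no entry: the highest weighted tag wins.
    default = max(tags, key=lambda t: t[1])[0]
    return mapping_graph.get(key, default)
```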
In another aspect, the present invention provides a multi-modal emotion recognition method, including: a speech receiver receives a speech signal sent by a target object; a visual image receiver receives visual image data of the target object; a first emotion recognition subsystem obtains a first emotion recognition result from the speech signal; a second emotion recognition subsystem obtains a second emotion recognition result from the speech signal; a third emotion recognition subsystem obtains a third emotion recognition result from the visual image data; an emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results.
Further, the first emotion recognition subsystem obtains the first emotion recognition result from the speech signal as follows: acoustic prosodic features are extracted from the speech signal of the speech receiver, and the first emotion recognition result of the speech signal is obtained from the acoustic prosodic features. The second emotion recognition subsystem obtains the second emotion recognition result from the speech signal as follows: the speech signal of the speech receiver is converted into a word sequence, sentence feature values are extracted from the word sequence, and the second emotion recognition result of the speech signal is obtained from the sentence feature values. The third emotion recognition subsystem obtains the third emotion recognition result from the visual image data as follows: face data in the visual image data is recognized and tracked; whole-body data, including the head, is recognized and tracked in the visual image data; facial key points are extracted from the face data and facial expression feature values are obtained from them; body action key points are extracted from the body data and body action feature values are obtained from them; the third emotion recognition result of the visual image data is obtained from the facial expression feature values and the body action feature values. The emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results as follows: the emotional state of the target object is determined from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
Further, the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
The multi-modal emotion recognition system and method provided by the present invention fuse text recognition technology, speech recognition technology, and visual image recognition technology and perform human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
Description of the drawings
Fig. 1 is a block diagram of the multi-modal emotion recognition system provided by an embodiment of the present invention;
Fig. 2 is another block diagram of the multi-modal emotion recognition system provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the multi-modal emotion recognition method provided by an embodiment of the present invention;
Fig. 4 is another flowchart of the multi-modal emotion recognition method provided by an embodiment of the present invention.
Specific embodiments
The present invention is further illustrated below through specific embodiments. It should be understood, however, that these embodiments are provided only for more detailed description and are not to be construed as limiting the present invention in any form.
Embodiment one
Referring to Fig. 1, the multi-modal emotion recognition system provided by this embodiment includes: a speech receiver 1, a first emotion recognition subsystem 3, a second emotion recognition subsystem 4, a visual image receiver 2, a third emotion recognition subsystem 5, and an emotion output device 6. The speech receiver 1 receives a speech signal sent by a target object; the visual image receiver 2 receives visual image data of the target object. The first emotion recognition subsystem 3 obtains a first emotion recognition result from the speech signal; the second emotion recognition subsystem 4 obtains a second emotion recognition result from the speech signal; the third emotion recognition subsystem 5 obtains a third emotion recognition result from the visual image data. The emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results.
Preferably, as shown in Fig. 2, the first emotion recognition subsystem 3 specifically includes an emotion saliency segmenter 301 and a first emotion recognizer 302. The emotion saliency segmenter 301 extracts acoustic prosodic features from the speech signal of the speech receiver 1; the first emotion recognizer 302 obtains the first emotion recognition result of the speech signal from the acoustic prosodic features. The second emotion recognition subsystem 4 specifically includes a speech recognizer 401, a sentence feature extractor 402, and a second emotion recognizer 403. The speech recognizer 401 converts the speech signal of the speech receiver 1 into a word sequence; the sentence feature extractor 402 extracts sentence feature values from the word sequence; the second emotion recognizer 403 obtains the second emotion recognition result of the speech signal from the sentence feature values. The third emotion recognition subsystem 5 specifically includes a face recognition tracker 501, a body recognition tracker 503, a facial expression feature extractor 502, a body action feature extractor 504, and a third emotion recognizer 505. The face recognition tracker 501 recognizes and tracks face data in the visual image data; the body recognition tracker 503 recognizes and tracks whole-body data, including the head, in the visual image data. The facial expression feature extractor 502 extracts facial key points from the face data and obtains facial expression feature values from them; the body action feature extractor 504 extracts body action key points from the body data and obtains body action feature values from them. The third emotion recognizer 505 obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values. The emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
The multi-modal emotion recognition system provided by this embodiment of the present invention fuses text recognition technology, speech recognition technology, and visual image recognition technology and performs human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
It should be noted that the psychology-behavior mapping graph referred to in this embodiment is a relation base built in advance from behavioral-psychology relations; in essence, it is a graph of mapping relations from a person's outward behavior to the person's true feelings.
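As an illustration only, such a graph can be realized as a lookup from the triple of per-channel emotion labels to an inferred true emotion; the entries below are invented examples, not from the patent:

```python
# Hypothetical psychology-behavior mapping graph: (audio, text, visual)
# behavior labels -> inferred true emotion.
PSYCHOLOGY_BEHAVIOR_GRAPH = {
    ("calm", "positive", "smile"): "content",
    ("tense", "negative", "smile"): "masked anxiety",  # surface vs. true feeling
    ("excited", "positive", "open posture"): "pride",
}
```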
Further preferably, the first emotion recognizer 302 obtains the first emotion recognition result of the speech signal from the acoustic prosodic features as follows: the acoustic prosodic features are substituted into a pre-built brain-like machine learning model to obtain neural-like speech features, and the neural-like speech features are substituted into a prestored emotion model to obtain a first emotion of the speech signal and a first emotion recognition confidence corresponding to the first emotion.
Specifically, the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
Further preferably, the sentence feature extractor 402 extracts the sentence feature values from the word sequence as follows: word segmentation is performed on the word sequence to obtain word-segmentation feature values, word-category analysis is performed on the word sequence to obtain word-category feature values, and sentence-pattern syntactic analysis is performed on the word sequence to obtain sentence-pattern syntactic feature values. The second emotion recognizer 403 obtains the second emotion recognition result of the speech signal from the sentence feature values as follows: the word-segmentation feature values, word-category feature values, and sentence-pattern syntactic feature values among the sentence feature values are input into a pre-built text emotion recognition model to obtain a second emotion of the speech signal and a second emotion recognition confidence corresponding to the second emotion.
Further preferably, the third emotion recognizer 505 obtains the third emotion recognition result of the visual image data from the facial expression feature values and body action feature values as follows: the facial expression feature values and body action feature values are substituted into a pre-built emotion classifier to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
In this embodiment, facial expression is combined with body action when performing emotion recognition on the visual image data, which improves discrimination power. For example, when a person squares the shoulders and lifts the chin while smiling, the corresponding emotion is pride; from only the squared shoulders and lifted chin, or from only the smile, pride cannot be inferred. In addition, this embodiment draws on the psychological research findings of Paul Ekman and uses a deep learning model over facial expression and body action to discriminate a person's emotion.
Further preferably, the emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results together with the pre-built psychology-behavior mapping graph as follows. When any one of the first emotion recognition confidence of the first result, the second emotion recognition confidence of the second result, and the third emotion recognition confidence of the third result is greater than or equal to a set threshold, the emotion corresponding to that confidence is taken as the emotional state of the target object. When all three confidences are below the set threshold, emotion tags are calculated, according to a preset weighting rule, for the first emotion of the first result, the second emotion of the second result, and the third emotion of the third result respectively, yielding a first emotion tag, a second emotion tag, and a third emotion tag; the emotional state of the target object is then determined from these three emotion tags according to the pre-built psychology-behavior mapping graph.
Embodiment two
Referring to Fig. 3, an embodiment of the present invention provides a multi-modal emotion recognition method, including:
Step S1: the speech receiver 1 receives a speech signal sent by a target object;
Step S2: the visual image receiver 2 receives visual image data of the target object;
Step S3: the first emotion recognition subsystem 3 obtains a first emotion recognition result from the speech signal;
Step S4: the second emotion recognition subsystem 4 obtains a second emotion recognition result from the speech signal;
Step S5: the third emotion recognition subsystem 5 obtains a third emotion recognition result from the visual image data;
Step S6: the emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results.
Preferably, as shown in Fig. 4, the first emotion recognition subsystem 3 obtains the first emotion recognition result from the speech signal as follows:
Step S3.1: acoustic prosodic features are extracted from the speech signal of the speech receiver 1;
Step S3.2: the first emotion recognition result of the speech signal is obtained from the acoustic prosodic features.
The second emotion recognition subsystem 4 obtains the second emotion recognition result from the speech signal as follows:
Step S4.1: the speech signal of the speech receiver 1 is converted into a word sequence;
Step S4.2: sentence feature values are extracted from the word sequence;
Step S4.3: the second emotion recognition result of the speech signal is obtained from the sentence feature values.
The third emotion recognition subsystem 5 obtains the third emotion recognition result from the visual image data as follows:
Step S5.1: face data in the visual image data is recognized and tracked;
Step S5.2: whole-body data, including the head, is recognized and tracked in the visual image data;
Step S5.3: facial key points are extracted from the face data, and facial expression feature values are obtained from them;
Step S5.4: body action key points are extracted from the body data, and body action feature values are obtained from them;
Step S5.5: the third emotion recognition result of the visual image data is obtained from the facial expression feature values and the body action feature values.
The emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results as follows:
Step S6.1: the emotional state of the target object is determined from the first, second, and third emotion recognition results together with the pre-built psychology-behavior mapping graph.
The multi-modal emotion recognition method provided by this embodiment of the present invention fuses text recognition technology, speech recognition technology, and visual image recognition technology and performs human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
It should be noted that the psychology-behavior mapping graph referred to in this embodiment is a relation base built in advance from behavioral-psychology relations; in essence, it is a graph of mapping relations from a person's outward behavior to the person's true feelings.
Specifically, the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
Although the present invention has been described to a certain degree, it is apparent that appropriate changes in individual conditions may be made without departing from the spirit and scope of the present invention. It is to be understood that the invention is not limited to the described embodiments, but is defined by the scope of the claims, including equivalents of each element.
Claims (10)
1. A multi-modal emotion recognition system, characterized in that it includes: a speech receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device;
the speech receiver is configured to receive a speech signal sent by a target object;
the visual image receiver is configured to receive visual image data of the target object;
the first emotion recognition subsystem is configured to obtain a first emotion recognition result from the speech signal;
the second emotion recognition subsystem is configured to obtain a second emotion recognition result from the speech signal;
the third emotion recognition subsystem is configured to obtain a third emotion recognition result from the visual image data;
the emotion output device is configured to determine the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result.
2. The multi-modal emotion recognition system according to claim 1, characterized in that
the first emotion recognition subsystem specifically includes an emotion saliency segmenter and a first emotion recognizer;
the emotion saliency segmenter is configured to extract acoustic prosodic features from the speech signal of the speech receiver;
the first emotion recognizer is configured to obtain the first emotion recognition result of the speech signal from the acoustic prosodic features;
the second emotion recognition subsystem specifically includes a speech recognizer, a sentence feature extractor, and a second emotion recognizer;
the speech recognizer is configured to convert the speech signal of the speech receiver into a word sequence;
the sentence feature extractor is configured to extract sentence feature values from the word sequence;
the second emotion recognizer is configured to obtain the second emotion recognition result of the speech signal from the sentence feature values;
the third emotion recognition subsystem specifically includes a face recognition tracker, a body recognition tracker, a facial expression feature extractor, a body action feature extractor, and a third emotion recognizer;
the face recognition tracker is configured to recognize and track face data in the visual image data;
the body recognition tracker is configured to recognize and track whole-body data, including the head, in the visual image data;
the facial expression feature extractor is configured to extract facial key points from the face data and obtain facial expression feature values from the facial key points;
the body action feature extractor is configured to extract body action key points from the body data and obtain body action feature values from the body action key points;
the third emotion recognizer is configured to obtain the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values;
the emotion output device is configured to determine the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with a pre-built psychology-behavior mapping graph.
3. The multi-modal emotion recognition system according to claim 2, characterized in that the first emotion recognizer obtains the first emotion recognition result of the speech signal from the acoustic prosodic features by specifically:
substituting the acoustic prosodic features into a pre-built brain-like machine learning model to obtain neural-like speech features, and substituting the neural-like speech features into a prestored emotion model to obtain a first emotion of the speech signal and a first emotion recognition confidence corresponding to the first emotion.
4. The multi-modal emotion recognition system according to claim 3, characterized in that the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
5. The multi-modal emotion recognition system according to claim 2, characterized in that
the sentence feature extractor extracts the sentence feature values from the word sequence by specifically:
performing word segmentation on the word sequence to obtain word-segmentation feature values, performing word-category analysis on the word sequence to obtain word-category feature values, and performing sentence-pattern syntactic analysis on the word sequence to obtain sentence-pattern syntactic feature values;
the second emotion recognizer obtains the second emotion recognition result of the speech signal from the sentence feature values by specifically:
inputting the word-segmentation feature values, the word-category feature values, and the sentence-pattern syntactic feature values among the sentence feature values into a pre-built text emotion recognition model, to obtain a second emotion of the speech signal and a second emotion recognition confidence corresponding to the second emotion.
6. The multi-modal emotion recognition system according to claim 2, characterized in that the third emotion recognizer obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values by specifically:
substituting the facial expression feature values and the body action feature values into a pre-built emotion classifier, to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
7. The multi-modal emotion recognition system according to any one of claims 1 to 6, characterized in that the emotion output device determines the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with the pre-built psychology-behavior mapping graph, by specifically:
when any one of the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result is greater than or equal to a set threshold, taking the emotion corresponding to that emotion recognition confidence as the emotional state of the target object;
when the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result are all below the set threshold, calculating emotion tags, according to a preset weighting rule, for the first emotion of the first emotion recognition result, the second emotion of the second emotion recognition result, and the third emotion of the third emotion recognition result respectively, to obtain a first emotion tag, a second emotion tag, and a third emotion tag;
determining the emotional state of the target object from the first emotion tag, the second emotion tag, and the third emotion tag, according to the pre-built psychology-behavior mapping graph.
8. A multi-modal emotion recognition method, characterized in that it includes:
a speech receiver receives a speech signal sent by a target object;
a visual image receiver receives visual image data of the target object;
a first emotion recognition subsystem obtains a first emotion recognition result from the speech signal;
a second emotion recognition subsystem obtains a second emotion recognition result from the speech signal;
a third emotion recognition subsystem obtains a third emotion recognition result from the visual image data;
an emotion output device determines the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result.
9. The multi-modal emotion recognition method according to claim 8, characterized in that
the first emotion recognition subsystem obtains the first emotion recognition result from the speech signal by specifically:
extracting acoustic prosodic features from the speech signal of the speech receiver;
obtaining the first emotion recognition result of the speech signal from the acoustic prosodic features;
the second emotion recognition subsystem obtains the second emotion recognition result from the speech signal by specifically:
converting the speech signal of the speech receiver into a word sequence;
extracting sentence feature values from the word sequence;
obtaining the second emotion recognition result of the speech signal from the sentence feature values;
the third emotion recognition subsystem obtains the third emotion recognition result from the visual image data by specifically:
recognizing and tracking face data in the visual image data;
recognizing and tracking whole-body data, including the head, in the visual image data;
extracting facial key points from the face data, and obtaining facial expression feature values from the facial key points;
extracting body action key points from the body data, and obtaining body action feature values from the body action key points;
obtaining the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values;
the emotion output device determines the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result by specifically:
determining the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with a pre-built psychology-behavior mapping graph.
10. The multi-modal emotion recognition method according to claim 8 or 9, characterized in that the acoustic prosodic features include: pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral entropy, spectral flatness, spectral slope, spectral sharpness, chroma, spectral roll-off point, spectral tilt, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610912302.5A CN106503646B (en) | 2016-10-19 | 2016-10-19 | Multi-mode emotion recognition system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106503646A (en) | 2017-03-15 |
CN106503646B CN106503646B (en) | 2020-07-10 |
Family
ID=58294258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610912302.5A | Multi-mode emotion recognition system and method (granted as CN106503646B, active) | 2016-10-19 | 2016-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503646B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107092895A (en) * | 2017-05-09 | 2017-08-25 | 重庆邮电大学 | A kind of multi-modal emotion identification method based on depth belief network |
CN107180236A (en) * | 2017-06-02 | 2017-09-19 | 北京工业大学 | A kind of multi-modal emotion identification method based on class brain model |
CN107194151A (en) * | 2017-04-20 | 2017-09-22 | 华为技术有限公司 | Determine the method and artificial intelligence equipment of emotion threshold value |
CN107943299A (en) * | 2017-12-07 | 2018-04-20 | 上海智臻智能网络科技股份有限公司 | Emotion rendering method and device, computer equipment and computer-readable recording medium |
CN108091323A (en) * | 2017-12-19 | 2018-05-29 | 想象科技(北京)有限公司 | For identifying the method and apparatus of emotion from voice |
CN108108849A (en) * | 2017-12-31 | 2018-06-01 | 厦门大学 | A kind of microblog emotional Forecasting Methodology based on Weakly supervised multi-modal deep learning |
CN108664932A (en) * | 2017-05-12 | 2018-10-16 | 华中师范大学 | A kind of Latent abilities state identification method based on Multi-source Information Fusion |
CN108899050A (en) * | 2018-06-14 | 2018-11-27 | 南京云思创智信息科技有限公司 | Speech signal analysis subsystem based on multi-modal Emotion identification system |
CN109241912A (en) * | 2018-09-08 | 2019-01-18 | 河南大学 | The target identification method based on class brain across media intelligent towards unmanned autonomous system |
CN109254669A (en) * | 2017-07-12 | 2019-01-22 | 腾讯科技(深圳)有限公司 | A kind of expression picture input method, device, electronic equipment and system |
CN109829363A (en) * | 2018-12-18 | 2019-05-31 | 深圳壹账通智能科技有限公司 | Expression recognition method, device, computer equipment and storage medium |
CN109903392A (en) * | 2017-12-11 | 2019-06-18 | 北京京东尚科信息技术有限公司 | Augmented reality method and apparatus |
CN110033029A (en) * | 2019-03-22 | 2019-07-19 | 五邑大学 | A kind of emotion identification method and device based on multi-modal emotion model |
CN110287912A (en) * | 2019-06-28 | 2019-09-27 | 广东工业大学 | Method, apparatus and medium are determined based on the target object affective state of deep learning |
CN110688499A (en) * | 2019-08-13 | 2020-01-14 | 深圳壹账通智能科技有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110910903A (en) * | 2019-12-04 | 2020-03-24 | 深圳前海微众银行股份有限公司 | Speech emotion recognition method, device, equipment and computer readable storage medium |
EP3627304A1 (en) * | 2018-09-20 | 2020-03-25 | XRSpace CO., LTD. | Interactive responding method and computer system using the same |
US20210192332A1 (en) * | 2019-12-19 | 2021-06-24 | Sling Media Pvt Ltd | Method and system for analyzing customer calls by implementing a machine learning model to identify emotions |
CN110085211B (en) * | 2018-01-26 | 2021-06-29 | 上海智臻智能网络科技股份有限公司 | Voice recognition interaction method and device, computer equipment and storage medium |
CN113128284A (en) * | 2019-12-31 | 2021-07-16 | 上海汽车集团股份有限公司 | Multi-mode emotion recognition method and device |
US11455472B2 (en) | 2017-12-07 | 2022-09-27 | Shanghai Xiaoi Robot Technology Co., Ltd. | Method, device and computer readable storage medium for presenting emotion |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007092795A2 (en) * | 2006-02-02 | 2007-08-16 | Neuric Technologies, Llc | Method for movie animation |
US20120308971A1 (en) * | 2011-05-31 | 2012-12-06 | Hyun Soon Shin | Emotion recognition-based bodyguard system, emotion recognition device, image and sensor control apparatus, personal protection management apparatus, and control methods thereof |
CN102298694A (en) * | 2011-06-21 | 2011-12-28 | 广东爱科数字科技有限公司 | Man-machine interaction identification system applied to remote information service |
CN102881284A (en) * | 2012-09-03 | 2013-01-16 | 江苏大学 | Unspecific human voice and emotion recognition method and system |
US9031293B2 (en) * | 2012-10-19 | 2015-05-12 | Sony Computer Entertainment Inc. | Multi-modal sensor based emotion recognition and emotional interface |
CN103123619A (en) * | 2012-12-04 | 2013-05-29 | 江苏大学 | Visual speech multi-mode collaborative analysis method based on emotion context and system |
CN103456314A (en) * | 2013-09-03 | 2013-12-18 | 广州创维平面显示科技有限公司 | Emotion recognition method and device |
CN105334743A (en) * | 2015-11-18 | 2016-02-17 | 深圳创维-Rgb电子有限公司 | Intelligent home control method and system based on emotion recognition |
CN105739688A (en) * | 2016-01-21 | 2016-07-06 | 北京光年无限科技有限公司 | Man-machine interaction method and device based on emotion system, and man-machine interaction system |
CN105975594A (en) * | 2016-05-09 | 2016-09-28 | 清华大学 | Sentiment classification method and device based on combined feature vector and SVM[perf] (Support Vector Machine) |
CN105976809A (en) * | 2016-05-25 | 2016-09-28 | 中国地质大学(武汉) | Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion |
CN105869657A (en) * | 2016-06-03 | 2016-08-17 | 竹间智能科技(上海)有限公司 | System and method for identifying voice emotion |
Non-Patent Citations (3)
Title |
---|
TANJA BÄNZIGER et al.: "Emotion Recognition From Expressions in Face, Voice, and Body: The Multimodal Emotion Recognition Test (MERT)", EMOTION *
WU Qidi: "Introduction to Natural Computation", Shanghai Scientific and Technical Publishers, 31 January 2011 *
WANG Bei et al.: "Research on multimodal emotion recognition based on facial expression and speech", Informatization Research *
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194151A (en) * | 2017-04-20 | 2017-09-22 | Huawei Technologies Co., Ltd. | Method for determining emotion threshold and artificial intelligence device |
WO2018192567A1 (en) * | 2017-04-20 | 2018-10-25 | Huawei Technologies Co., Ltd. | Method for determining emotional threshold and artificial intelligence device |
CN107092895A (en) * | 2017-05-09 | 2017-08-25 | Chongqing University of Posts and Telecommunications | Multi-modal emotion recognition method based on deep belief network |
CN108664932B (en) * | 2017-05-12 | 2021-07-09 | Central China Normal University | Learning emotional state identification method based on multi-source information fusion |
CN108664932A (en) * | 2017-05-12 | 2018-10-16 | Central China Normal University | Learning emotional state identification method based on multi-source information fusion |
CN107180236A (en) * | 2017-06-02 | 2017-09-19 | Beijing University of Technology | Multi-modal emotion recognition method based on brain-like model |
CN107180236B (en) * | 2017-06-02 | 2020-02-11 | Beijing University of Technology | Multi-modal emotion recognition method based on brain-like model |
CN109254669A (en) * | 2017-07-12 | 2019-01-22 | Tencent Technology (Shenzhen) Co., Ltd. | Expression picture input method, device, electronic equipment and system |
CN109254669B (en) * | 2017-07-12 | 2022-05-10 | Tencent Technology (Shenzhen) Co., Ltd. | Expression picture input method and device, electronic equipment and system |
CN107943299B (en) * | 2017-12-07 | 2022-05-06 | Shanghai Xiaoi Robot Technology Co., Ltd. | Emotion presenting method and device, computer equipment and computer-readable storage medium |
CN107943299A (en) * | 2017-12-07 | 2018-04-20 | Shanghai Xiaoi Robot Technology Co., Ltd. | Emotion presenting method and device, computer equipment and computer-readable storage medium |
US11455472B2 (en) | 2017-12-07 | 2022-09-27 | Shanghai Xiaoi Robot Technology Co., Ltd. | Method, device and computer readable storage medium for presenting emotion |
CN109903392A (en) * | 2017-12-11 | 2019-06-18 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Augmented reality method and apparatus |
US11257293B2 (en) | 2017-12-11 | 2022-02-22 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Augmented reality method and device fusing image-based target state data and sound-based target state data |
CN109903392B (en) * | 2017-12-11 | 2021-12-31 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Augmented reality method and apparatus |
CN108091323B (en) * | 2017-12-19 | 2020-10-13 | Imagination Technology (Beijing) Co., Ltd. | Method and apparatus for emotion recognition from speech |
CN108091323A (en) * | 2017-12-19 | 2018-05-29 | Imagination Technology (Beijing) Co., Ltd. | Method and apparatus for recognizing emotion from speech |
CN108108849A (en) * | 2017-12-31 | 2018-06-01 | Xiamen University | Microblog emotion prediction method based on weakly supervised multi-modal deep learning |
CN110085211B (en) * | 2018-01-26 | 2021-06-29 | Shanghai Xiaoi Robot Technology Co., Ltd. | Voice recognition interaction method and device, computer equipment and storage medium |
CN108899050A (en) * | 2018-06-14 | 2018-11-27 | Nanjing Yunsi Chuangzhi Information Technology Co., Ltd. | Speech signal analysis subsystem based on a multi-modal emotion recognition system |
CN109241912B (en) * | 2018-09-08 | 2020-08-07 | Henan University | Target identification method based on brain-like cross-media intelligence for unmanned autonomous systems |
CN109241912A (en) * | 2018-09-08 | 2019-01-18 | Henan University | Target identification method based on brain-like cross-media intelligence for unmanned autonomous systems |
EP3627304A1 (en) * | 2018-09-20 | 2020-03-25 | XRSpace CO., LTD. | Interactive responding method and computer system using the same |
CN109829363A (en) * | 2018-12-18 | 2019-05-31 | Shenzhen OneConnect Smart Technology Co., Ltd. | Expression recognition method, device, computer equipment and storage medium |
CN110033029A (en) * | 2019-03-22 | 2019-07-19 | Wuyi University | Emotion recognition method and device based on multi-modal emotion model |
CN110287912A (en) * | 2019-06-28 | 2019-09-27 | Guangdong University of Technology | Method, apparatus and medium for determining target object emotional state based on deep learning |
CN110688499A (en) * | 2019-08-13 | 2020-01-14 | Shenzhen OneConnect Smart Technology Co., Ltd. | Data processing method, data processing device, computer equipment and storage medium |
CN110910903A (en) * | 2019-12-04 | 2020-03-24 | Shenzhen Qianhai WeBank Co., Ltd. | Speech emotion recognition method, device, equipment and computer-readable storage medium |
CN110910903B (en) * | 2019-12-04 | 2023-03-21 | Shenzhen Qianhai WeBank Co., Ltd. | Speech emotion recognition method, device, equipment and computer-readable storage medium |
US20210192332A1 (en) * | 2019-12-19 | 2021-06-24 | Sling Media Pvt Ltd | Method and system for analyzing customer calls by implementing a machine learning model to identify emotions |
US11630999B2 (en) * | 2019-12-19 | 2023-04-18 | Dish Network Technologies India Private Limited | Method and system for analyzing customer calls by implementing a machine learning model to identify emotions |
CN113128284A (en) * | 2019-12-31 | 2021-07-16 | SAIC Motor Corporation Limited | Multi-modal emotion recognition method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106503646B (en) | 2020-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106503646A (en) | Multi-modal emotion identification system and method | |
Harwath et al. | Learning word-like units from joint audio-visual analysis | |
US20180018974A1 (en) | System and method for detecting tantrums | |
CN105632501B (en) | Automatic accent classification method and device based on deep learning technology |
EP3198589B1 (en) | Method and apparatus to synthesize voice based on facial structures | |
CN106205633B (en) | Imitation and performance practice scoring system |
CN106297826A (en) | Speech emotion recognition system and method |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
Fulmare et al. | Understanding and estimation of emotional expression using acoustic analysis of natural speech | |
CN108446278B (en) | Semantic understanding system and method based on natural language |
KR102607373B1 (en) | Apparatus and method for recognizing emotion in speech | |
CN106960181A (en) | Pedestrian attribute recognition method based on RGBD data |
KR20200105446A (en) | Apparatus and method for recognizing emotions | |
CN106489148A (en) | Intention scene recognition method and system based on user profiling |
CN107886968A (en) | Speech evaluation method and system |
CN112418172A (en) | Multimodal information fusion emotion analysis method based on a multimodal information intelligent processing unit |
Jazouli et al. | Automatic detection of stereotyped movements in autistic children using the Kinect sensor | |
CN118098587A (en) | AI suicide risk analysis method and system based on digital doctor | |
JP5180116B2 (en) | Nationality determination device, method and program | |
CN107452370A (en) | Application method of a judgment device for patients with dysphonia of Chinese nasal finals (vowels followed by a nasal consonant) |
CN108305629B (en) | Scene learning content acquisition method and device, learning equipment and storage medium | |
Amin et al. | HMM based automatic Arabic sign language translator using Kinect | |
CN112489787A (en) | Method for detecting human health status based on micro-expressions |
US10971148B2 (en) | Information providing device, information providing method, and recording medium for presenting words extracted from different word groups | |
Li et al. | Interpreting sign components from accelerometer and sEMG data for automatic sign language recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||