CN106503646A - Multi-modal emotion identification system and method - Google Patents

Multi-modal emotion identification system and method Download PDF

Info

Publication number
CN106503646A
CN106503646A
Authority
CN
China
Prior art keywords
emotion
identification result
emotion identification
voice signal
subsystem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610912302.5A
Other languages
Chinese (zh)
Other versions
CN106503646B (en)
Inventor
简仁贤
杨闵淳
林志豪
孙廷伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Technology (Shanghai) Co Ltd
Original Assignee
Intelligent Technology (Shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Technology (Shanghai) Co Ltd
Priority to CN201610912302.5A priority Critical patent/CN106503646B/en
Publication of CN106503646A publication Critical patent/CN106503646A/en
Application granted granted Critical
Publication of CN106503646B publication Critical patent/CN106503646B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1807Speech classification or search using natural language modelling using prosody or stress
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides a multi-modal emotion recognition system and method. The system includes a voice receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device. The voice receiver receives the voice signal produced by a target object; the visual image receiver receives visual image data of the target object; the first emotion recognition subsystem obtains a first emotion recognition result from the voice signal; the second emotion recognition subsystem obtains a second emotion recognition result from the voice signal; the third emotion recognition subsystem obtains a third emotion recognition result from the visual image data; and the emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results.

Description

Multi-modal emotion identification system and method
Technical field
The present invention relates to computer processing technology, and in particular to a multi-modal emotion recognition system and method.
Background technology
At present, emotion recognition machines generally recognize human emotion using only one of text recognition, speech recognition, or visual image recognition. This single-modality approach draws on little information, making it difficult to recognize human emotion in complex situations.
Content of the invention
The technical problem to be solved by the present invention is to provide a multi-modal emotion recognition system and method that fuse text recognition, speech recognition, and visual image recognition, performing human emotion recognition over multiple channels simultaneously, so that an emotion recognition machine can accurately recognize the emotion of a target object during human-computer interaction.
To solve the above technical problem, the present invention provides the following technical solutions:
In one aspect, the present invention provides a multi-modal emotion recognition system, including: a voice receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device. The voice receiver receives the voice signal produced by a target object. The visual image receiver receives visual image data of the target object. The first emotion recognition subsystem obtains a first emotion recognition result from the voice signal. The second emotion recognition subsystem obtains a second emotion recognition result from the voice signal. The third emotion recognition subsystem obtains a third emotion recognition result from the visual image data. The emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results.
Further, the first emotion recognition subsystem specifically includes an emotion saliency segmenter and a first emotion recognizer. The emotion saliency segmenter extracts acoustic prosodic features from the voice signal of the voice receiver; the first emotion recognizer obtains the first emotion recognition result of the voice signal from the acoustic prosodic features. The second emotion recognition subsystem specifically includes a speech recognizer, a sentence feature extractor, and a second emotion recognizer. The speech recognizer converts the voice signal of the voice receiver into a word sequence; the sentence feature extractor extracts sentence feature values from the word sequence; the second emotion recognizer obtains the second emotion recognition result of the voice signal from the sentence feature values. The third emotion recognition subsystem specifically includes a face recognition tracker, a body recognition tracker, a facial expression feature extractor, a body action feature extractor, and a third emotion recognizer. The face recognition tracker recognizes and tracks face data in the visual image data; the body recognition tracker recognizes and tracks whole-body data, including the head, in the visual image data; the facial expression feature extractor extracts facial key points from the face data and obtains facial expression feature values from them; the body action feature extractor extracts body action key points from the body data and obtains body action feature values from them; the third emotion recognizer obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values. The emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
Further, the first emotion recognizer obtains the first emotion recognition result of the voice signal from the acoustic prosodic features as follows: the first emotion recognizer substitutes the acoustic prosodic features into a pre-built brain-inspired machine learning model to obtain neural-like speech features, and substitutes the neural-like speech features into a pre-stored emotion model to obtain a first emotion of the voice signal and a first emotion recognition confidence corresponding to the first emotion.
Further, the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
Further, the sentence feature extractor extracts the sentence feature values from the word sequence as follows: it performs word segmentation on the word sequence to obtain word segmentation feature values, performs part-of-speech analysis on the word sequence to obtain part-of-speech feature values, and performs sentence-pattern syntactic analysis on the word sequence to obtain sentence-pattern syntactic feature values. The second emotion recognizer obtains the second emotion recognition result of the voice signal from the sentence feature values as follows: it inputs the word segmentation feature values, part-of-speech feature values, and sentence-pattern syntactic feature values into a pre-built text emotion recognition model to obtain a second emotion of the voice signal and a second emotion recognition confidence corresponding to the second emotion.
Further, the third emotion recognizer obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values as follows: it substitutes the facial expression feature values and body action feature values into a pre-built emotion classifier to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
Further, the emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results and the pre-built psychology-behavior mapping graph as follows: when any one of the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result is greater than or equal to a set threshold, the emotion corresponding to that confidence is judged to be the emotional state of the target object; when the first, second, and third emotion recognition confidences are all below the set threshold, emotion labels are computed, according to a preset weighting rule, for the first emotion of the first emotion recognition result, the second emotion of the second emotion recognition result, and the third emotion of the third emotion recognition result respectively, yielding a first emotion label, a second emotion label, and a third emotion label; the emotional state of the target object is then determined from the first, second, and third emotion labels according to the pre-built psychology-behavior mapping graph.
In another aspect, the present invention provides a multi-modal emotion recognition method, including: a voice receiver receives the voice signal produced by a target object; a visual image receiver receives visual image data of the target object; a first emotion recognition subsystem obtains a first emotion recognition result from the voice signal; a second emotion recognition subsystem obtains a second emotion recognition result from the voice signal; a third emotion recognition subsystem obtains a third emotion recognition result from the visual image data; and an emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results.
Further, the first emotion recognition subsystem obtains the first emotion recognition result from the voice signal by extracting acoustic prosodic features from the voice signal of the voice receiver and obtaining the first emotion recognition result of the voice signal from the acoustic prosodic features. The second emotion recognition subsystem obtains the second emotion recognition result from the voice signal by converting the voice signal of the voice receiver into a word sequence, extracting sentence feature values from the word sequence, and obtaining the second emotion recognition result of the voice signal from the sentence feature values. The third emotion recognition subsystem obtains the third emotion recognition result from the visual image data by recognizing and tracking face data in the visual image data; recognizing and tracking whole-body data, including the head, in the visual image data; extracting facial key points from the face data and obtaining facial expression feature values from them; extracting body action key points from the body data and obtaining body action feature values from them; and obtaining the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values. The emotion output device determines the emotional state of the target object from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
Further, the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
The multi-modal emotion recognition system and method provided by the present invention fuse text recognition, speech recognition, and visual image recognition, performing human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
Description of the drawings
Fig. 1 is a block diagram of the multi-modal emotion recognition system provided by an embodiment of the present invention;
Fig. 2 is another block diagram of the multi-modal emotion recognition system provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the multi-modal emotion recognition method provided by an embodiment of the present invention;
Fig. 4 is another flowchart of the multi-modal emotion recognition method provided by an embodiment of the present invention.
Specific embodiment
The present invention is further illustrated below through specific embodiments; it should be understood, however, that these embodiments serve only to describe the invention in more detail and are not to be construed as limiting the present invention in any form.
Embodiment one
With reference to Fig. 1, the multi-modal emotion recognition system provided by this embodiment includes: a voice receiver 1, a first emotion recognition subsystem 3, a second emotion recognition subsystem 4, a visual image receiver 2, a third emotion recognition subsystem 5, and an emotion output device 6. The voice receiver 1 receives the voice signal produced by a target object; the visual image receiver 2 receives visual image data of the target object; the first emotion recognition subsystem 3 obtains a first emotion recognition result from the voice signal; the second emotion recognition subsystem 4 obtains a second emotion recognition result from the voice signal; the third emotion recognition subsystem 5 obtains a third emotion recognition result from the visual image data; the emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results.
Preferably, as shown in Fig. 2, the first emotion recognition subsystem 3 specifically includes an emotion saliency segmenter 301 and a first emotion recognizer 302. The emotion saliency segmenter 301 extracts acoustic prosodic features from the voice signal of the voice receiver 1; the first emotion recognizer 302 obtains the first emotion recognition result of the voice signal from the acoustic prosodic features. The second emotion recognition subsystem 4 specifically includes a speech recognizer 401, a sentence feature extractor 402, and a second emotion recognizer 403. The speech recognizer 401 converts the voice signal of the voice receiver 1 into a word sequence; the sentence feature extractor 402 extracts sentence feature values from the word sequence; the second emotion recognizer 403 obtains the second emotion recognition result of the voice signal from the sentence feature values. The third emotion recognition subsystem 5 specifically includes a face recognition tracker 501, a body recognition tracker 503, a facial expression feature extractor 502, a body action feature extractor 504, and a third emotion recognizer 505. The face recognition tracker 501 recognizes and tracks face data in the visual image data; the body recognition tracker 503 recognizes and tracks whole-body data, including the head, in the visual image data; the facial expression feature extractor 502 extracts facial key points from the face data and obtains facial expression feature values from them; the body action feature extractor 504 extracts body action key points from the body data and obtains body action feature values from them; the third emotion recognizer 505 obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values. The emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results together with a pre-built psychology-behavior mapping graph.
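To make the data flow concrete, the following minimal Python sketch wires the six components together. The class names, method names, and recognizer stubs are illustrative assumptions; the patent does not prescribe a particular implementation.

```python
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    emotion: str       # e.g. "happy", "angry"
    confidence: float  # emotion recognition confidence in [0, 1]

class MultiModalEmotionSystem:
    """Hypothetical wiring of the components shown in Fig. 1 and Fig. 2."""

    def __init__(self, prosody_subsystem, text_subsystem, visual_subsystem, output_device):
        self.prosody_subsystem = prosody_subsystem  # first emotion recognition subsystem
        self.text_subsystem = text_subsystem        # second emotion recognition subsystem
        self.visual_subsystem = visual_subsystem    # third emotion recognition subsystem
        self.output_device = output_device          # emotion output device

    def recognize(self, voice_signal, visual_images) -> str:
        # Two channels analyze the same voice signal; the third analyzes vision.
        r1: RecognitionResult = self.prosody_subsystem.recognize(voice_signal)
        r2: RecognitionResult = self.text_subsystem.recognize(voice_signal)
        r3: RecognitionResult = self.visual_subsystem.recognize(visual_images)
        # The emotion output device fuses the three results into one state.
        return self.output_device.decide(r1, r2, r3)
```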
The multi-modal emotion recognition system provided by this embodiment fuses text recognition, speech recognition, and visual image recognition, performing human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
It should be noted that the psychology-behavior mapping graph referred to in this embodiment is a relation base built in advance from behavioral-psychology relationships; in essence, it is a mapping graph from a person's outward behavior to the person's true feelings.
Further preferably, the first emotion recognizer 302 obtains the first emotion recognition result of the voice signal from the acoustic prosodic features as follows: it substitutes the acoustic prosodic features into a pre-built brain-inspired machine learning model to obtain neural-like speech features, and substitutes the neural-like speech features into a pre-stored emotion model to obtain a first emotion of the voice signal and a first emotion recognition confidence corresponding to the first emotion.
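The two-stage structure of this recognizer (acoustic prosodic features, then neural-like speech features, then an emotion model) can be sketched as below. The small feed-forward encoder standing in for the brain-inspired machine learning model and the linear soft-max head standing in for the pre-stored emotion model are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BrainInspiredEncoder(nn.Module):
    """Stand-in for the pre-built brain-inspired model: maps acoustic
    prosodic features to 'neural-like' speech features (assumption)."""
    def __init__(self, n_in: int, n_hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Tanh())

    def forward(self, x):
        return self.net(x)

class EmotionModel(nn.Module):
    """Stand-in for the pre-stored emotion model: maps neural-like
    speech features to a distribution over emotions (assumption)."""
    def __init__(self, n_hidden: int, n_emotions: int):
        super().__init__()
        self.head = nn.Linear(n_hidden, n_emotions)

    def forward(self, h):
        return torch.softmax(self.head(h), dim=-1)

def first_emotion(prosodic_feats, encoder, emotion_model, emotion_names):
    # prosodic_feats: 1-D torch tensor of acoustic prosodic features.
    h = encoder(prosodic_feats)      # neural-like speech features
    probs = emotion_model(h)         # emotion probability distribution
    conf, idx = probs.max(dim=-1)    # first emotion + its confidence
    return emotion_names[int(idx)], float(conf)
```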
Specifically, the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
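For illustration, a subset of these acoustic prosodic features (pitch, RMS intensity, zero-crossing rate, spectral centroid, bandwidth, flatness, roll-off, chroma, and cepstral coefficients) can be computed with the open-source librosa library; this is one plausible extractor under that assumption, not the patent's own.

```python
import numpy as np
import librosa

def acoustic_prosodic_features(path: str, sr: int = 16000) -> np.ndarray:
    """Extract a subset of the listed prosodic features from an audio file."""
    y, sr = librosa.load(path, sr=sr)

    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)              # pitch track
    rms = librosa.feature.rms(y=y)                             # RMS intensity
    zcr = librosa.feature.zero_crossing_rate(y)                # zero-crossing rate
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # spectral centroid
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr) # bandwidth
    flatness = librosa.feature.spectral_flatness(y=y)          # spectral flatness
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)     # roll-off point
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)           # chroma
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # cepstral coeffs

    # Summarize each frame-level track by its mean and standard deviation.
    tracks = [f0[None, :], rms, zcr, centroid, bandwidth,
              flatness, rolloff, chroma, mfcc]
    stats = [np.hstack([t.mean(axis=1), t.std(axis=1)]) for t in tracks]
    return np.hstack(stats)
```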
Further preferably, the sentence feature extractor 402 extracts the sentence feature values from the word sequence as follows: it performs word segmentation on the word sequence to obtain word segmentation feature values, performs part-of-speech analysis to obtain part-of-speech feature values, and performs sentence-pattern syntactic analysis to obtain sentence-pattern syntactic feature values. The second emotion recognizer 403 obtains the second emotion recognition result of the voice signal from the sentence feature values as follows: it inputs the word segmentation feature values, part-of-speech feature values, and sentence-pattern syntactic feature values into a pre-built text emotion recognition model to obtain a second emotion of the voice signal and a second emotion recognition confidence corresponding to the second emotion.
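A rough sketch of this text channel follows, using the open-source jieba segmenter for word segmentation and part-of-speech tagging. The crude sentence-pattern cue, the bag-of-features encoding, and the logistic-regression stand-in for the text emotion recognition model are all assumptions of the sketch.

```python
import jieba
import jieba.posseg as pseg
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def sentence_features(word_sequence: str) -> str:
    """Segmentation + part-of-speech + sentence-pattern cues, as one token string."""
    tokens = jieba.lcut(word_sequence)                     # word segmentation features
    pos_tags = [w.flag for w in pseg.lcut(word_sequence)]  # part-of-speech features
    # Sentence-pattern feature: crude cue from final punctuation (assumption).
    pattern = "Q" if word_sequence.endswith(("?", "？")) else "S"
    return " ".join(tokens + pos_tags + [pattern])

# A hypothetical text emotion recognition model over those features.
text_emotion_model = make_pipeline(
    CountVectorizer(tokenizer=str.split, token_pattern=None),
    LogisticRegression(max_iter=1000),
)
# After fitting on labeled sentences, predict_proba yields the second
# emotion and its recognition confidence:
#   probs = text_emotion_model.predict_proba([sentence_features(text)])[0]
```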
Further preferably, the third emotion recognizer 505 obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values as follows: it substitutes the facial expression feature values and body action feature values into a pre-built emotion classifier to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
In this embodiment, facial expression is combined with body action when performing emotion recognition on the visual image data, which improves emotion discrimination. For example, when a person squares the shoulders, lifts the chin, and smiles, the corresponding emotion is pride; but shoulder-squaring and chin-lifting alone, or a smile alone, is not sufficient to judge the emotion to be pride. In addition, this embodiment draws on the psychological research of Paul Ekman, using a deep learning model over facial expressions and body actions to discriminate a person's emotion.
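A minimal sketch of such a joint classifier is given below: facial expression feature values and body action feature values are concatenated into a single vector and classified together, so neither cue is judged in isolation. The SVM is a stand-in, as the patent does not fix the classifier; the feature vectors are assumed to come from upstream key-point extractors.

```python
import numpy as np
from sklearn.svm import SVC

class ThirdEmotionRecognizer:
    """Joint facial-expression + body-action emotion classifier (sketch)."""

    def __init__(self):
        # probability=True so a recognition confidence can be reported.
        self.classifier = SVC(probability=True)

    def fit(self, face_feats, body_feats, labels):
        X = np.hstack([face_feats, body_feats])  # joint feature vectors
        self.classifier.fit(X, labels)

    def recognize(self, face_feat, body_feat):
        x = np.hstack([face_feat, body_feat]).reshape(1, -1)
        probs = self.classifier.predict_proba(x)[0]
        best = int(np.argmax(probs))
        # Third emotion and its recognition confidence.
        return self.classifier.classes_[best], float(probs[best])
```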
Further preferably, the emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results and the pre-built psychology-behavior mapping graph as follows: when any one of the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result is greater than or equal to a set threshold, the emotion corresponding to that confidence is judged to be the emotional state of the target object; when the first, second, and third emotion recognition confidences are all below the set threshold, emotion labels are computed, according to a preset weighting rule, for the first emotion of the first emotion recognition result, the second emotion of the second emotion recognition result, and the third emotion of the third emotion recognition result respectively, yielding a first emotion label, a second emotion label, and a third emotion label; the emotional state of the target object is then determined from the first, second, and third emotion labels according to the pre-built psychology-behavior mapping graph.
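This decision rule translates almost directly into code. In the sketch below, the preset weighting rule is reduced to a per-channel weight table and the psychology-behavior mapping graph to a plain dictionary; both are simplifying assumptions, as is the example threshold.

```python
def decide_emotional_state(results, threshold=0.8,
                           weights=(0.4, 0.3, 0.3),
                           mapping_graph=None):
    """results: three (emotion, confidence) pairs, one per channel."""
    # Rule 1: any single channel at or above the set threshold wins outright.
    for emotion, confidence in results:
        if confidence >= threshold:
            return emotion

    # Rule 2: otherwise compute weighted emotion labels per channel...
    labels = {}
    for (emotion, confidence), w in zip(results, weights):
        labels[emotion] = labels.get(emotion, 0.0) + w * confidence

    # ...and resolve the strongest label through the psychology-behavior
    # mapping graph (here just a dict from behavior labels to true states).
    top_label = max(labels, key=labels.get)
    if mapping_graph is not None:
        return mapping_graph.get(top_label, top_label)
    return top_label
```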
Embodiment two
With reference to Fig. 3, an embodiment of the present invention provides a multi-modal emotion recognition method, including:
Step S1: the voice receiver 1 receives the voice signal produced by a target object;
Step S2: the visual image receiver 2 receives visual image data of the target object;
Step S3: the first emotion recognition subsystem 3 obtains a first emotion recognition result from the voice signal;
Step S4: the second emotion recognition subsystem 4 obtains a second emotion recognition result from the voice signal;
Step S5: the third emotion recognition subsystem 5 obtains a third emotion recognition result from the visual image data;
Step S6: the emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results.
Preferably, as shown in Fig. 4, the first emotion recognition subsystem 3 obtains the first emotion recognition result from the voice signal as follows:
Step S3.1: extract acoustic prosodic features from the voice signal of the voice receiver 1;
Step S3.2: obtain the first emotion recognition result of the voice signal from the acoustic prosodic features.
The second emotion recognition subsystem 4 obtains the second emotion recognition result from the voice signal as follows:
Step S4.1: convert the voice signal of the voice receiver 1 into a word sequence;
Step S4.2: extract the sentence feature values from the word sequence;
Step S4.3: obtain the second emotion recognition result of the voice signal from the sentence feature values.
The third emotion recognition subsystem 5 obtains the third emotion recognition result from the visual image data as follows:
Step S5.1: recognize and track the face data in the visual image data;
Step S5.2: recognize and track the whole-body data, including the head, in the visual image data;
Step S5.3: extract the facial key points from the face data, and obtain facial expression feature values from the facial key points;
Step S5.4: extract the body action key points from the body data, and obtain body action feature values from the body action key points;
Step S5.5: obtain the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values.
The emotion output device 6 determines the emotional state of the target object from the first, second, and third emotion recognition results as follows:
Step S6.1: determine the emotional state of the target object from the first, second, and third emotion recognition results together with the pre-built psychology-behavior mapping graph.
The multi-modal emotion recognition method provided by this embodiment fuses text recognition, speech recognition, and visual image recognition, performing human emotion recognition over multiple channels simultaneously, enabling an emotion recognition machine to accurately recognize the emotion of a target object during human-computer interaction.
It should be noted that the psychology-behavior mapping graph referred to in this embodiment is a relation base built in advance from behavioral-psychology relationships; in essence, it is a mapping graph from a person's outward behavior to the person's true feelings.
Specifically, the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
Although the present invention has been described to a certain degree, it is apparent that appropriate changes to the respective conditions may be made without departing from the spirit and scope of the present invention. It is to be understood that the invention is not limited to the described embodiments, but is defined by the scope of the claims, including equivalents of each element described therein.

Claims (10)

1. A multi-modal emotion recognition system, characterized by comprising: a voice receiver, a first emotion recognition subsystem, a second emotion recognition subsystem, a visual image receiver, a third emotion recognition subsystem, and an emotion output device;
the voice receiver being configured to receive a voice signal produced by a target object;
the visual image receiver being configured to receive visual image data of the target object;
the first emotion recognition subsystem being configured to obtain a first emotion recognition result from the voice signal;
the second emotion recognition subsystem being configured to obtain a second emotion recognition result from the voice signal;
the third emotion recognition subsystem being configured to obtain a third emotion recognition result from the visual image data;
the emotion output device being configured to determine an emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result.
2. The multi-modal emotion recognition system according to claim 1, characterized in that:
the first emotion recognition subsystem specifically includes an emotion saliency segmenter and a first emotion recognizer;
the emotion saliency segmenter is configured to extract acoustic prosodic features from the voice signal of the voice receiver;
the first emotion recognizer is configured to obtain the first emotion recognition result of the voice signal from the acoustic prosodic features;
the second emotion recognition subsystem specifically includes a speech recognizer, a sentence feature extractor, and a second emotion recognizer;
the speech recognizer is configured to convert the voice signal of the voice receiver into a word sequence;
the sentence feature extractor is configured to extract sentence feature values from the word sequence;
the second emotion recognizer is configured to obtain the second emotion recognition result of the voice signal from the sentence feature values;
the third emotion recognition subsystem specifically includes a face recognition tracker, a body recognition tracker, a facial expression feature extractor, a body action feature extractor, and a third emotion recognizer;
the face recognition tracker is configured to recognize and track face data in the visual image data;
the body recognition tracker is configured to recognize and track whole-body data, including the head, in the visual image data;
the facial expression feature extractor is configured to extract facial key points from the face data and obtain facial expression feature values from the facial key points;
the body action feature extractor is configured to extract body action key points from the body data and obtain body action feature values from the body action key points;
the third emotion recognizer is configured to obtain the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values;
the emotion output device is configured to determine the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with a pre-built psychology-behavior mapping graph.
3. The multi-modal emotion recognition system according to claim 2, characterized in that the first emotion recognizer obtains the first emotion recognition result of the voice signal from the acoustic prosodic features by:
substituting the acoustic prosodic features into a pre-built brain-inspired machine learning model to obtain neural-like speech features, and substituting the neural-like speech features into a pre-stored emotion model to obtain a first emotion of the voice signal and a first emotion recognition confidence corresponding to the first emotion.
4. The multi-modal emotion recognition system according to claim 3, characterized in that the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
5. The multi-modal emotion recognition system according to claim 2, characterized in that:
the sentence feature extractor extracts the sentence feature values from the word sequence by:
performing word segmentation on the word sequence to obtain word segmentation feature values, performing part-of-speech analysis on the word sequence to obtain part-of-speech feature values, and performing sentence-pattern syntactic analysis on the word sequence to obtain sentence-pattern syntactic feature values;
the second emotion recognizer obtains the second emotion recognition result of the voice signal from the sentence feature values by:
inputting the word segmentation feature values, the part-of-speech feature values, and the sentence-pattern syntactic feature values into a pre-built text emotion recognition model, to obtain a second emotion of the voice signal and a second emotion recognition confidence corresponding to the second emotion.
6. The multi-modal emotion recognition system according to claim 2, characterized in that the third emotion recognizer obtains the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values by:
substituting the facial expression feature values and the body action feature values into a pre-built emotion classifier, to obtain a third emotion of the visual image data and a third emotion recognition confidence corresponding to the third emotion.
7. The multi-modal emotion recognition system according to any one of claims 1 to 6, characterized in that the emotion output device determines the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with the pre-built psychology-behavior mapping graph, by:
when any one of the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result is greater than or equal to a set threshold, judging the emotion corresponding to that emotion recognition confidence to be the emotional state of the target object;
when the first emotion recognition confidence of the first emotion recognition result, the second emotion recognition confidence of the second emotion recognition result, and the third emotion recognition confidence of the third emotion recognition result are all below the set threshold, computing emotion labels, according to a preset weighting rule, for the first emotion of the first emotion recognition result, the second emotion of the second emotion recognition result, and the third emotion of the third emotion recognition result respectively, to obtain a first emotion label, a second emotion label, and a third emotion label;
determining the emotional state of the target object from the first emotion label, the second emotion label, and the third emotion label, according to the pre-built psychology-behavior mapping graph.
8. A multi-modal emotion recognition method, characterized by comprising:
receiving, by a voice receiver, a voice signal produced by a target object;
receiving, by a visual image receiver, visual image data of the target object;
obtaining, by a first emotion recognition subsystem, a first emotion recognition result from the voice signal;
obtaining, by a second emotion recognition subsystem, a second emotion recognition result from the voice signal;
obtaining, by a third emotion recognition subsystem, a third emotion recognition result from the visual image data;
determining, by an emotion output device, an emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result.
9. The multi-modal emotion recognition method according to claim 8, characterized in that:
the first emotion recognition subsystem obtains the first emotion recognition result from the voice signal by:
extracting acoustic prosodic features from the voice signal of the voice receiver;
obtaining the first emotion recognition result of the voice signal from the acoustic prosodic features;
the second emotion recognition subsystem obtains the second emotion recognition result from the voice signal by:
converting the voice signal of the voice receiver into a word sequence;
extracting sentence feature values from the word sequence;
obtaining the second emotion recognition result of the voice signal from the sentence feature values;
the third emotion recognition subsystem obtains the third emotion recognition result from the visual image data by:
recognizing and tracking face data in the visual image data;
recognizing and tracking whole-body data, including the head, in the visual image data;
extracting facial key points from the face data, and obtaining facial expression feature values from the facial key points;
extracting body action key points from the body data, and obtaining body action feature values from the body action key points;
obtaining the third emotion recognition result of the visual image data from the facial expression feature values and the body action feature values;
the emotion output device determines the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result by:
determining the emotional state of the target object from the first emotion recognition result, the second emotion recognition result, and the third emotion recognition result, together with a pre-built psychology-behavior mapping graph.
10. The multi-modal emotion recognition method according to claim 8 or 9, characterized in that the acoustic prosodic features include pitch, intensity, voice quality, spectrum, cepstrum, perceptual linear prediction (PLP) cepstral coefficients, root-mean-square intensity, zero-crossing rate, spectral flux, spectral centroid, bandwidth, spectral quotient, spectral flatness, spectral tilt, spectral sharpness, chroma, spectral roll-off point, spectral slope, harmonics, voicing probability, formants, voice onset points, and spectral envelope.
CN201610912302.5A 2016-10-19 2016-10-19 Multi-mode emotion recognition system and method Active CN106503646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610912302.5A CN106503646B (en) 2016-10-19 2016-10-19 Multi-mode emotion recognition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610912302.5A CN106503646B (en) 2016-10-19 2016-10-19 Multi-mode emotion recognition system and method

Publications (2)

Publication Number Publication Date
CN106503646A 2017-03-15
CN106503646B CN106503646B (en) 2020-07-10

Family

ID=58294258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610912302.5A Active CN106503646B (en) 2016-10-19 2016-10-19 Multi-mode emotion recognition system and method

Country Status (1)

Country Link
CN (1) CN106503646B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007092795A2 (en) * 2006-02-02 2007-08-16 Neuric Technologies, Llc Method for movie animation
US20120308971A1 (en) * 2011-05-31 2012-12-06 Hyun Soon Shin Emotion recognition-based bodyguard system, emotion recognition device, image and sensor control apparatus, personal protection management apparatus, and control methods thereof
CN102298694A (en) * 2011-06-21 2011-12-28 广东爱科数字科技有限公司 Man-machine interaction identification system applied to remote information service
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system
US9031293B2 (en) * 2012-10-19 2015-05-12 Sony Computer Entertainment Inc. Multi-modal sensor based emotion recognition and emotional interface
CN103123619A (en) * 2012-12-04 2013-05-29 江苏大学 Visual speech multi-mode collaborative analysis method based on emotion context and system
CN103456314A (en) * 2013-09-03 2013-12-18 广州创维平面显示科技有限公司 Emotion recognition method and device
CN105334743A (en) * 2015-11-18 2016-02-17 深圳创维-Rgb电子有限公司 Intelligent home control method and system based on emotion recognition
CN105739688A (en) * 2016-01-21 2016-07-06 北京光年无限科技有限公司 Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN105975594A (en) * 2016-05-09 2016-09-28 清华大学 Sentiment classification method and device based on combined feature vector and SVM[perf] (Support Vector Machine)
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN105869657A (en) * 2016-06-03 2016-08-17 竹间智能科技(上海)有限公司 System and method for identifying voice emotion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TANJA BANZIGER et al.: "Emotion Recognition From Expressions in Face, Voice, and Body: The Multimodal Emotion Recognition Test (MERT)", Emotion *
WU QIDI: "Introduction to Natural Computation", Shanghai Scientific & Technical Publishers, 31 January 2011 *
WANG BEI et al.: "Research on Multimodal Emotion Recognition Based on Facial Expression and Speech", Informatization Research *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194151A (en) * 2017-04-20 2017-09-22 华为技术有限公司 Determine the method and artificial intelligence equipment of emotion threshold value
WO2018192567A1 (en) * 2017-04-20 2018-10-25 华为技术有限公司 Method for determining emotional threshold and artificial intelligence device
CN107092895A (en) * 2017-05-09 2017-08-25 重庆邮电大学 A kind of multi-modal emotion identification method based on depth belief network
CN108664932B (en) * 2017-05-12 2021-07-09 华中师范大学 Learning emotional state identification method based on multi-source information fusion
CN108664932A (en) * 2017-05-12 2018-10-16 华中师范大学 A kind of Latent abilities state identification method based on Multi-source Information Fusion
CN107180236A (en) * 2017-06-02 2017-09-19 北京工业大学 A kind of multi-modal emotion identification method based on class brain model
CN107180236B (en) * 2017-06-02 2020-02-11 北京工业大学 Multi-modal emotion recognition method based on brain-like model
CN109254669A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 A kind of expression picture input method, device, electronic equipment and system
CN109254669B (en) * 2017-07-12 2022-05-10 腾讯科技(深圳)有限公司 Expression picture input method and device, electronic equipment and system
CN107943299B (en) * 2017-12-07 2022-05-06 上海智臻智能网络科技股份有限公司 Emotion presenting method and device, computer equipment and computer readable storage medium
CN107943299A (en) * 2017-12-07 2018-04-20 上海智臻智能网络科技股份有限公司 Emotion rendering method and device, computer equipment and computer-readable recording medium
US11455472B2 (en) 2017-12-07 2022-09-27 Shanghai Xiaoi Robot Technology Co., Ltd. Method, device and computer readable storage medium for presenting emotion
CN109903392A (en) * 2017-12-11 2019-06-18 北京京东尚科信息技术有限公司 Augmented reality method and apparatus
US11257293B2 (en) 2017-12-11 2022-02-22 Beijing Jingdong Shangke Information Technology Co., Ltd. Augmented reality method and device fusing image-based target state data and sound-based target state data
CN109903392B (en) * 2017-12-11 2021-12-31 北京京东尚科信息技术有限公司 Augmented reality method and apparatus
CN108091323B (en) * 2017-12-19 2020-10-13 想象科技(北京)有限公司 Method and apparatus for emotion recognition from speech
CN108091323A (en) * 2017-12-19 2018-05-29 想象科技(北京)有限公司 For identifying the method and apparatus of emotion from voice
CN108108849A (en) * 2017-12-31 2018-06-01 厦门大学 A kind of microblog emotional Forecasting Methodology based on Weakly supervised multi-modal deep learning
CN110085211B (en) * 2018-01-26 2021-06-29 上海智臻智能网络科技股份有限公司 Voice recognition interaction method and device, computer equipment and storage medium
CN108899050A (en) * 2018-06-14 2018-11-27 南京云思创智信息科技有限公司 Speech signal analysis subsystem based on multi-modal Emotion identification system
CN109241912B (en) * 2018-09-08 2020-08-07 河南大学 Target identification method based on brain-like cross-media intelligence and oriented to unmanned autonomous system
CN109241912A (en) * 2018-09-08 2019-01-18 河南大学 The target identification method based on class brain across media intelligent towards unmanned autonomous system
EP3627304A1 (en) * 2018-09-20 2020-03-25 XRSpace CO., LTD. Interactive responding method and computer system using the same
CN109829363A (en) * 2018-12-18 2019-05-31 深圳壹账通智能科技有限公司 Expression recognition method, device, computer equipment and storage medium
CN110033029A (en) * 2019-03-22 2019-07-19 五邑大学 A kind of emotion identification method and device based on multi-modal emotion model
CN110287912A (en) * 2019-06-28 2019-09-27 广东工业大学 Method, apparatus and medium are determined based on the target object affective state of deep learning
CN110688499A (en) * 2019-08-13 2020-01-14 深圳壹账通智能科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110910903A (en) * 2019-12-04 2020-03-24 深圳前海微众银行股份有限公司 Speech emotion recognition method, device, equipment and computer readable storage medium
CN110910903B (en) * 2019-12-04 2023-03-21 深圳前海微众银行股份有限公司 Speech emotion recognition method, device, equipment and computer readable storage medium
US20210192332A1 (en) * 2019-12-19 2021-06-24 Sling Media Pvt Ltd Method and system for analyzing customer calls by implementing a machine learning model to identify emotions
US11630999B2 (en) * 2019-12-19 2023-04-18 Dish Network Technologies India Private Limited Method and system for analyzing customer calls by implementing a machine learning model to identify emotions
CN113128284A (en) * 2019-12-31 2021-07-16 上海汽车集团股份有限公司 Multi-mode emotion recognition method and device

Also Published As

Publication number Publication date
CN106503646B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN106503646A (en) Multi-modal emotion identification system and method
Harwath et al. Learning word-like units from joint audio-visual analysis
US20180018974A1 (en) System and method for detecting tantrums
CN105632501B (en) A kind of automatic accent classification method and device based on depth learning technology
EP3198589B1 (en) Method and apparatus to synthesize voice based on facial structures
CN106205633B (en) A kind of imitation and performance practice scoring system
CN106297826A (en) Speech emotional identification system and method
Fulmare et al. Understanding and estimation of emotional expression using acoustic analysis of natural speech
CN108446278B (en) A kind of semantic understanding system and method based on natural language
CN106960181A (en) A kind of pedestrian's attribute recognition approach based on RGBD data
KR20200105446A (en) Apparatus and method for recognizing emotions
CN107886968A (en) Speech evaluating method and system
CN110490428A (en) Air traffic control work quality evaluation method and related apparatus
KR102607373B1 (en) Apparatus and method for recognizing emotion in speech
CN106489148A (en) A kind of intention scene recognition method and system based on user portraits
CN112418172A (en) Multimode information fusion emotion analysis method based on multimode information intelligent processing unit
Jazouli et al. Automatic detection of stereotyped movements in autistic children using the Kinect sensor
US20190103110A1 (en) Information processing device, information processing method, and program
CN107452370A (en) A kind of application method of a judgment device for patients with Chinese nasal-final dysphonia
Samara et al. Sensing affective states using facial expression analysis
CN108305629B (en) Scene learning content acquisition method and device, learning equipment and storage medium
JP2010191530A (en) Nationality decision device and method, and program
Amin et al. HMM based automatic Arabic sign language translator using Kinect
CN112489787A (en) Method for detecting human health based on micro-expression
Kaur et al. Extraction of heart rate parameters using speech analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant