CN110516696A - Adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression - Google Patents
Adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression
- Publication number: CN110516696A
- Application number: CN201910632006.3A
- Authority
- CN
- China
- Prior art keywords
- expression
- data
- voice
- feature
- human face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention relates to an adaptive-weight bimodal emotion recognition method that fuses speech and facial expression, comprising the following steps: obtain emotional speech and facial-expression data, pair the affective data with emotion categories, and select a training sample set and a test sample set; extract speech emotion features from the speech data and dynamic expression features from the expression data; based on the speech emotion features and the expression features separately, learn with a deep-learning method based on a semi-supervised autoencoder, obtaining classification results and per-class output probabilities from a softmax classifier; finally, fuse the two single-modality emotion recognition results at the decision level using an adaptive weighting method to obtain the final emotion recognition result. Because the invention accounts for individual differences in how well each modality's affective features characterize emotion, the adaptive weight fusion method achieves higher accuracy and objectivity.
Description
Technical field
The present invention relates to the field of emotion recognition within affective computing, and in particular to an adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression.
Background art
In recent years, with the development of artificial intelligence and robotics, traditional modes of interaction no longer satisfy users' needs; new forms of human-computer interaction require an exchange of emotion. Emotion recognition has therefore become key to the development of human-computer interaction technology and a hot research topic. Emotion recognition is a multidisciplinary subject: by making computers understand and recognize human emotion, it becomes possible to predict and understand human behavioral tendencies and psychological states, and thereby realize efficient and harmonious affective human-computer interaction.
Human emotion is expressed in many ways, such as speech, facial expression, posture, and text, from which effective information can be extracted to analyze mood correctly. Facial expression and speech are the most salient and the most easily analyzed of these cues, and they have been widely studied and applied. The psychologist Mehrabian proposed the formula: emotional expression = 7% spoken words + 38% vocal tone + 55% facial expression. A person's voice and facial expression thus convey 93% of the emotional information and form the core of human communication. During emotional expression, facial deformation conveys inner emotion effectively and intuitively, making it one of the most important characteristic signals for emotion recognition, and speech features likewise convey rich emotion.
With the development of the internet and the continual emergence of social media in recent years, the ways people communicate have been greatly enriched (video, audio, and so on), making multimodal emotion recognition possible. Traditional single-modality recognition suffers from the problem that a single affective feature may not characterize the emotional state well. For example, when a person expresses sadness, the facial expression may change little, yet the sadness can still be heard in a dull, low, slow voice. Multimodal recognition lets the information of different modalities complement one another, supplies more emotional information, and improves recognition accuracy. At present, however, single-modality emotion recognition research is relatively mature while multimodal methods remain to be developed and refined, so multimodal emotion recognition has important practical significance. Since expression and speech are the most dominant features, bimodal emotion recognition based on the two has important research significance and practical value. Traditional weighting methods ignore individual differences; a method of adaptive weighting is therefore needed to assign the weights.
Summary of the invention
The object of the present invention is to provide an adaptive-weight bimodal emotion recognition method based on speech and facial expression, so as to realize the complementarity of the two modalities and an adaptive weight assignment that accounts for individual differences.
To this end, the invention adopts the following technical scheme:
A recognition method based on adaptive-weight bimodal fusion of speech and facial expression, characterized in that the method comprises the following steps:
S1. Obtain emotional speech and facial-expression data, pair the affective data with emotion categories, and select a training sample set and a test sample set.
S2. Extract speech emotion features from the speech data and dynamic expression features from the expression data: first automatically extract the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence, which serves as the dynamic expression feature.
S3. Based on the speech emotion features and the expression features separately, learn with a deep-learning method based on a semi-supervised autoencoder, and obtain classification results and per-class output probabilities from a softmax classifier.
S4. Fuse the two single-modality emotion recognition results at the decision level using an adaptive weight assignment method to obtain the final emotion recognition result.
Further, the specific steps of step S2 described above are as follows:
S2A.1: For the speech emotion data, divide each acquired speech sample into frames, producing multiple speech segments, and apply a window function to the framed segments to obtain the speech emotion signal.
S2A.2: From the speech emotion signal obtained in S2A.1, extract low-level descriptors at the frame level: fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, Mel-frequency cepstral coefficients, and so on.
S2A.3: Over the low-level descriptors obtained at the frame level in S2A.2, compute statistics at the level of the multi-frame speech sample by applying multiple statistical functionals (maximum, minimum, mean, standard deviation, etc.) to obtain the speech emotion features.
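The frame-then-functional pipeline of steps S2A.1 to S2A.3 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the frame length, hop size, and the single short-time-energy descriptor are placeholder choices, and a real system would add dedicated estimators for F0, shimmer, HNR, and MFCCs.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Step S2A.1: split the signal into overlapping frames and window them."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)

def functionals(lld):
    """Step S2A.3: statistical functionals over the frame-level descriptors."""
    return np.array([lld.max(), lld.min(), lld.mean(), lld.std()])

# Short-time energy as one example low-level descriptor (step S2A.2).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)      # 1 s of audio at 16 kHz
frames = frame_signal(signal)            # 25 ms frames with a 10 ms hop
energy = (frames ** 2).sum(axis=1)       # frame-level short-time energy
feature_vector = functionals(energy)     # utterance-level speech feature
```

Each additional descriptor contributes its own block of functionals; concatenating them yields the utterance-level speech emotion feature vector.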
S2B.1: For the facial-expression data, first transform the acquired three-dimensional coordinates of the facial feature points: take the nose tip as the centre point, obtain a rotation matrix via SVD, and multiply by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation.
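The SVD-based rotation normalization of step S2B.1 can be sketched with the Kabsch algorithm, which computes the optimal rigid rotation between two point sets exactly via SVD. This is a sketch under two assumptions not stated in the patent: that the nose-tip landmark is stored at row 0 of both arrays, and that a neutral reference shape is available to align against.

```python
import numpy as np

def remove_head_pose(points, reference):
    """Rotate nose-centred 3-D landmarks onto a reference shape using the
    SVD-based Kabsch algorithm (a sketch of step S2B.1)."""
    P = points - points[0]          # centre on the nose tip (row 0, assumed)
    Q = reference - reference[0]
    U, _, Vt = np.linalg.svd(P.T @ Q)
    R = Vt.T @ U.T                  # optimal rotation mapping P onto Q
    if np.linalg.det(R) < 0:        # guard against an improper reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return P @ R.T                  # pose-normalised landmarks
```

Because both point sets are centred before the SVD, the same step also removes head translation.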
S2B.2: Extract the peak expression frame using slow feature analysis (SFA); the specific steps are as follows:
1) Treat each dynamic image-sequence sample as a time-varying input signal x(t) = [x1(t), x2(t), …, xI(t)]^T;
2) Normalize x(t) so that each component has zero mean and unit variance;
3) Apply a nonlinear expansion to the input signal, converting the problem into a linear SFA problem;
4) Whiten the data;
5) Solve with the linear SFA method.
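Steps 2) to 5) can be sketched in a few lines of NumPy: after normalization and whitening, the slowest features are the directions along which the temporal derivative has the smallest variance. The demo signal (a slow sine mixed with a fast one) is an illustrative assumption, standing in for the image sequences; the nonlinear expansion of step 3) is omitted, so this is the purely linear case.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Minimal linear SFA: normalize, whiten, then keep the directions
    whose temporal derivative has the smallest variance (the slowest)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # step 2): zero mean, unit variance
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (eigvec / np.sqrt(eigval))              # step 4): whitening
    dZ = np.diff(Z, axis=0)                         # temporal derivative
    _, dvec = np.linalg.eigh(np.cov(dZ, rowvar=False))  # ascending eigenvalues
    return Z @ dvec[:, :n_components]               # step 5): slowest signals

# Demo: recover a slow sine that is linearly mixed with a fast one.
t = np.linspace(0, 2 * np.pi, 500)
slow, fast = np.sin(t), np.sin(20 * t)
X = np.column_stack([slow + 0.5 * fast, 0.5 * slow - fast])
slowest = linear_sfa(X)[:, 0]
```

For the expression sequences, the frame at which the slowest output reaches its extremum would presumably mark the expression apex; the patent does not spell out this final selection rule.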
S2B.3: After obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalize the variable-length dynamic features using linear interpolation.
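The length normalization of step S2B.3 can be sketched as per-dimension linear interpolation onto a common time axis. The target length of 16 frames is an assumption for illustration; the patent does not state the fixed length it uses.

```python
import numpy as np

def normalize_length(seq, target_len=16):
    """Resample a variable-length (frames x features) sequence to a fixed
    number of frames by linear interpolation, one dimension at a time."""
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, len(seq))       # original time axis
    dst = np.linspace(0.0, 1.0, target_len)     # fixed-length time axis
    return np.column_stack([np.interp(dst, src, seq[:, d])
                            for d in range(seq.shape[1])])

# A 10-frame, 1-dimensional toy sequence resampled to 4 frames.
out = normalize_length(np.arange(10.0).reshape(-1, 1), target_len=4)
```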
Further, the specific steps of step S3 described above are as follows:
S3.1: For data of a given modality, input both unlabeled and labeled training samples; via autoencoder encoding and decoding and the softmax classifier output, generate the reconstructed data and the classification output respectively.
S3.2: Compute the unsupervised reconstruction error and the supervised classification error.
S3.3: Construct the optimization objective, which considers the reconstruction error and the classification error simultaneously:
E(θ) = αEr + (1 − α)Ec
S3.4: Update the parameters by gradient descent until the objective converges.
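One evaluation of the combined objective E(θ) = αEr + (1 − α)Ec can be sketched as below. This is a toy single-layer autoencoder with randomly initialized weights, an assumption for illustration only: the patent does not specify its network architecture, and a real implementation would wrap this loss in the gradient-descent loop of step S3.4.

```python
import numpy as np

def semi_supervised_loss(x, y_onehot, W_enc, W_dec, W_cls, alpha=0.5):
    """Combined objective E = alpha*Er + (1 - alpha)*Ec. Unlabeled batches
    (y_onehot=None) contribute only the reconstruction term Er; labeled
    batches also add the softmax cross-entropy Ec."""
    h = np.tanh(x @ W_enc)                              # encoder
    x_rec = h @ W_dec                                   # decoder
    Er = np.mean((x - x_rec) ** 2)                      # reconstruction error
    if y_onehot is None:
        return alpha * Er
    logits = h @ W_cls                                  # softmax classifier
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    Ec = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
    return alpha * Er + (1 - alpha) * Ec

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 6))                         # 8 samples, 6-dim features
W_enc, W_dec, W_cls = (0.1 * rng.standard_normal(s)
                       for s in [(6, 4), (4, 6), (4, 3)])
y = np.eye(3)[rng.integers(0, 3, 8)]                    # one-hot labels, 3 classes
labeled_loss = semi_supervised_loss(x, y, W_enc, W_dec, W_cls)
unlabeled_loss = semi_supervised_loss(x, None, W_enc, W_dec, W_cls)
```

The weighting α trades off how much the representation is shaped by reconstruction versus classification.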
Further, the specific steps of step S4 described above are as follows:
S4.1: Obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δk, which measures how well modality k characterizes the emotion; the adaptive weight assignment is realized according to the value of δk for each sample. Here J is the number of classes, P = {pj | j = 1, …, J} is the vector of per-class output probabilities of the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2: Map δk into [0, 1] according to the mapping formula and use it as the weight, where a and b are self-selected parameters determined by the specific situation.
S4.3: Obtain the fused output probability vector Pfinal = {pfinal_j | j = 1, …, J}; the class with the maximum probability is the recognized class. Here pj_k is the output probability of class j obtained from single-modality emotion recognition with modality k, and there are K modalities in total.
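Steps S4.1 to S4.3 can be sketched as follows. The extraction does not reproduce the patent's formula for δk, so this sketch assumes δk is the Euclidean distance d between the modality's probability vector and the uniform distribution (a peaked, confident output gives a larger δk and hence a larger weight); the sigmoid with parameters a and b stands in for the unspecified [0, 1] mapping. All of these are illustrative assumptions.

```python
import numpy as np

def adaptive_fusion(prob_vectors, a=10.0, b=0.0):
    """Decision-level fusion of K per-modality softmax outputs for one
    sample, with adaptive weights (a sketch of steps S4.1-S4.3)."""
    P = [np.asarray(p, dtype=float) for p in prob_vectors]
    J = len(P[0])
    uniform = np.full(J, 1.0 / J)
    # S4.1 (assumed form): delta_k = d(P_k, uniform distribution)
    delta = np.array([np.linalg.norm(p - uniform) for p in P])
    # S4.2: map delta_k into (0, 1) with self-selected parameters a, b
    w = 1.0 / (1.0 + np.exp(-(a * delta + b)))
    w /= w.sum()                                  # normalise the K weights
    # S4.3: weighted sum of the per-modality probability vectors
    p_final = sum(wk * pk for wk, pk in zip(w, P))
    return p_final, int(np.argmax(p_final))

speech = [0.70, 0.10, 0.10, 0.10]   # confident modality: larger weight
face = [0.30, 0.30, 0.20, 0.20]     # near-uniform modality: smaller weight
p_final, label = adaptive_fusion([speech, face])
```

With these inputs the speech modality dominates the fused vector, which is the behaviour the adaptive scheme is designed to produce.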
Compared with the prior art, the beneficial effects of the present invention are as follows: the adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression achieves accurate and efficient recognition on a standard database. By accounting for individual differences in how well each modality's affective features characterize emotion, the adaptive weight fusion method attains higher accuracy and objectivity: on the IEMOCAP emotion corpus it achieves a recognition rate of 83%, an improvement of about 3% over traditional fixed-weight assignment.
Brief description of the drawings
Fig. 1 is a schematic overall flow diagram of the recognition method of the present invention.
Fig. 2 is a flow diagram of step S3 of the present invention.
Fig. 3 is a flow diagram of the adaptive weight assignment of the present invention.
Specific embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings; the given examples serve only to explain the present invention and are not intended to limit its scope.
Embodiment 1: Referring to Figs. 1-3, a recognition method based on adaptive-weight bimodal fusion of speech and facial expression comprises the following steps:
S1. Obtain emotional speech and facial-expression data, pair the affective data with emotion categories, and select a training sample set and a test sample set.
S2. Extract speech emotion features from the speech data and dynamic expression features from the expression data: first automatically extract the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence, which serves as the dynamic expression feature.
S3. Based on the speech emotion features and the expression features separately, learn with a deep-learning method based on a semi-supervised autoencoder, and obtain classification results and per-class output probabilities from a softmax classifier.
S4. Fuse the two single-modality emotion recognition results at the decision level using an adaptive weight assignment method to obtain the final emotion recognition result.
Further, the specific steps of step S2 described above are as follows:
S2A.1: For the speech emotion data, divide each acquired speech sample into frames, producing multiple speech segments, and apply a window function to the framed segments to obtain the speech emotion signal.
S2A.2: From the speech emotion signal obtained in S2A.1, extract low-level descriptors at the frame level: fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, Mel-frequency cepstral coefficients, and so on.
S2A.3: Over the low-level descriptors obtained at the frame level in S2A.2, compute statistics at the level of the multi-frame speech sample by applying multiple statistical functionals (maximum, minimum, mean, standard deviation, etc.) to obtain the speech emotion features.
S2B.1: For the facial-expression data, first transform the acquired three-dimensional coordinates of the facial feature points: take the nose tip as the centre point, obtain a rotation matrix via SVD, and multiply by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation.
S2B.2: Extract the peak expression frame using slow feature analysis (SFA); the specific steps are as follows:
1) Treat each dynamic image-sequence sample as a time-varying input signal x(t) = [x1(t), x2(t), …, xI(t)]^T;
2) Normalize x(t) so that each component has zero mean and unit variance;
3) Apply a nonlinear expansion to the input signal, converting the problem into a linear SFA problem;
4) Whiten the data;
5) Solve with the linear SFA method.
S2B.3: After obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalize the variable-length dynamic features using linear interpolation.
Further, the specific steps of step S3 described above are as follows:
S3.1: For data of a given modality, input both unlabeled and labeled training samples; via autoencoder encoding and decoding and the softmax classifier output, generate the reconstructed data and the classification output respectively.
S3.2: Compute the unsupervised reconstruction error and the supervised classification error.
S3.3: Construct the optimization objective, which considers the reconstruction error and the classification error simultaneously:
E(θ) = αEr + (1 − α)Ec
S3.4: Update the parameters by gradient descent until the objective converges.
Further, the specific steps of step S4 described above are as follows:
S4.1: Obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δk, which measures how well modality k characterizes the emotion; the adaptive weight assignment is realized according to the value of δk for each sample. Here J is the number of classes, P = {pj | j = 1, …, J} is the vector of per-class output probabilities of the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2: Map δk into [0, 1] according to the mapping formula and use it as the weight, where a and b are self-selected parameters.
S4.3: Obtain the fused output probability vector Pfinal = {pfinal_j | j = 1, …, J}; the class with the maximum probability is the recognized class. Here pj_k is the output probability of class j obtained from single-modality emotion recognition with modality k, and there are K modalities in total.
Application example: Referring to Figs. 1-3, this example uses the IEMOCAP affective database as material; the simulation platform is MATLAB R2014a.
As shown in Fig. 1, the adaptive-weight bimodal emotion recognition method based on speech and expression of the present invention mainly comprises the following steps:
S1. Obtain emotional speech and facial-expression data, pair the affective data with emotion categories, and select a training sample set and a test sample set. Four emotion categories are chosen: neutral, happy, sad, and angry.
S2. Extract speech emotion features from the speech data and dynamic expression features from the expression data: first automatically extract the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence, which serves as the dynamic expression feature. The speech features are extracted with the open-source feature-extraction toolkit openSMILE, using the INTERSPEECH 2010 Paralinguistic Challenge standard feature set, 1582 features in total. For the dynamic expression features, the peak expression frame is extracted with slow feature analysis, and the expression onset frame is then found by thresholding; after the dynamic expression sequence from onset frame to peak frame is obtained, the variable-length dynamic features are normalized with linear interpolation.
S3. Based on the speech emotion features and the expression features separately, learn with a deep-learning method based on a semi-supervised autoencoder, and obtain classification results and per-class output probabilities from a softmax classifier.
S4. Fuse the two single-modality emotion recognition results at the decision level using an adaptive weight assignment method to obtain the final emotion recognition result.
As shown in Fig. 2, the specific steps of the semi-supervised classification in step S3 are:
S3.1: For data of a given modality, input both unlabeled and labeled training samples; via autoencoder encoding and decoding and the softmax classifier output, generate the reconstructed data and the classification output respectively.
S3.2: Compute the unsupervised reconstruction error and the supervised classification error.
S3.3: Construct the optimization objective, which considers the reconstruction error and the classification error simultaneously:
E(θ) = αEr + (1 − α)Ec
S3.4: Update the parameters by gradient descent until the objective converges.
As shown in Fig. 3, the specific steps of step S4 are:
S4.1: Obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δk, which measures how well modality k characterizes the emotion; the adaptive weight assignment is realized according to the value of δk for each sample. Here J is the number of classes, P = {pj | j = 1, …, J} is the vector of per-class output probabilities of the softmax classifier, and d denotes the Euclidean distance between two vectors.
S4.2: Map δk into [0, 1] according to the mapping formula and use it as the weight, where a and b are self-selected parameters.
S4.3: Obtain the fused output probability vector Pfinal = {pfinal_j | j = 1, …, J}; the class with the maximum probability is the recognized class. Here pj_k is the output probability of class j obtained from single-modality emotion recognition with modality k, and there are K modalities in total.
It should be noted that the above embodiment is only a preferred embodiment of the present invention and is not intended to limit the scope of protection; equivalent replacements or substitutions made on the basis of the above technical scheme all fall within the scope of protection of the present invention.
Claims (5)
1. An adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression, characterized in that the method comprises the following steps:
S1. obtaining emotional speech data and facial-expression data, pairing the affective data with emotion categories, and selecting a training sample set and a test sample set;
S2. extracting speech emotion features from the speech data and dynamic expression features from the expression data: first automatically extracting the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalizing the variable-length image sequence to a fixed-length image sequence as the dynamic expression feature;
S3. based on the speech emotion features and the expression features separately, learning with a deep-learning method based on a semi-supervised autoencoder, and obtaining classification results and per-class output probabilities from a softmax classifier;
S4. fusing the two single-modality emotion recognition results at the decision level using an adaptive weight assignment method to obtain the final emotion recognition result.
2. The bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the specific steps of the affective feature extraction in step S2 are:
S2A.1: for the speech emotion data, dividing each acquired speech sample into frames, producing multiple speech segments, and applying a window function to the framed segments to obtain the speech emotion signal;
S2A.2: from the speech emotion signal obtained in S2A.1, extracting low-level descriptors at the frame level: fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, Mel-frequency cepstral coefficients, and so on;
S2A.3: over the low-level descriptors obtained at the frame level, computing statistics at the level of the multi-frame speech sample by applying multiple statistical functionals (maximum, minimum, mean, standard deviation, etc.) to obtain the speech emotion features;
S2B.1: for the facial-expression data, first transforming the acquired three-dimensional coordinates of the facial feature points: taking the nose tip as the centre point, obtaining a rotation matrix via SVD, and multiplying by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation;
S2B.2: extracting the peak expression frame using slow feature analysis;
S2B.3: after obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalizing the variable-length dynamic features using linear interpolation.
3. The bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the specific steps of the semi-supervised learning in step S3 are:
S3.1: for data of a given modality, inputting both unlabeled and labeled training samples, and generating the reconstructed data and the classification output respectively via autoencoder encoding and decoding and the softmax classifier output;
S3.2: computing the unsupervised reconstruction error and the supervised classification error;
S3.3: constructing the optimization objective, which considers the reconstruction error and the classification error simultaneously:
E(θ) = αEr + (1 − α)Ec;
S3.4: updating the parameters by gradient descent until the objective converges.
4. The adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the decision-level fusion based on adaptive weighting in step S4 comprises:
S4.1: obtaining the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities and computing the variable δk, which measures how well modality k characterizes the emotion, the adaptive weight assignment being realized according to the value of δk for each sample, wherein J is the number of classes, P = {pj | j = 1, …, J} is the vector of per-class output probabilities of the softmax classifier, and d denotes the Euclidean distance between two vectors;
S4.2: mapping δk into [0, 1] according to the mapping formula and using it as the weight, wherein a and b are self-selected parameters;
S4.3: obtaining the fused output probability vector Pfinal = {pfinal_j | j = 1, …, J}, the class with the maximum probability being the recognized class, wherein pj_k is the output probability of class j obtained from single-modality emotion recognition with modality k, and there are K modalities in total.
5. The adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression according to claim 2, characterized in that the specific steps of extracting the peak expression frame with slow feature analysis in S2B.2 are:
1) treating each dynamic image-sequence sample as a time-varying input signal x(t) = [x1(t), x2(t), …, xI(t)]^T;
2) normalizing x(t) so that each component has zero mean and unit variance;
3) applying a nonlinear expansion to the input signal, converting the problem into a linear SFA problem;
4) whitening the data;
5) solving with the linear SFA method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910632006.3A CN110516696B (en) | 2019-07-12 | 2019-07-12 | Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910632006.3A CN110516696B (en) | 2019-07-12 | 2019-07-12 | Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110516696A true CN110516696A (en) | 2019-11-29 |
CN110516696B CN110516696B (en) | 2023-07-25 |
Family
ID=68623425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910632006.3A Active CN110516696B (en) | 2019-07-12 | 2019-07-12 | Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516696B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027215A (en) * | 2019-12-11 | 2020-04-17 | 中国人民解放军陆军工程大学 | Character training system and method for virtual human |
CN111401268A (en) * | 2020-03-19 | 2020-07-10 | 内蒙古工业大学 | Multi-mode emotion recognition method and device for open environment |
CN111460494A (en) * | 2020-03-24 | 2020-07-28 | 广州大学 | Multi-mode deep learning-oriented privacy protection method and system |
CN112006697A (en) * | 2020-06-02 | 2020-12-01 | 东南大学 | Gradient boosting decision tree depression recognition method based on voice signals |
CN112101096A (en) * | 2020-08-02 | 2020-12-18 | 华南理工大学 | Suicide emotion perception method based on multi-mode fusion of voice and micro-expression |
CN112401886A (en) * | 2020-10-22 | 2021-02-26 | 北京大学 | Processing method, device and equipment for emotion recognition and storage medium |
CN112418034A (en) * | 2020-11-12 | 2021-02-26 | 元梦人文智能国际有限公司 | Multi-modal emotion recognition method and device, electronic equipment and storage medium |
CN112528835A (en) * | 2020-12-08 | 2021-03-19 | 北京百度网讯科技有限公司 | Training method, recognition method and device of expression prediction model and electronic equipment |
CN113033450A (en) * | 2021-04-02 | 2021-06-25 | 山东大学 | Multi-mode continuous emotion recognition method, service inference method and system |
CN113076847A (en) * | 2021-03-29 | 2021-07-06 | 济南大学 | Multi-mode emotion recognition method and system |
CN113343860A (en) * | 2021-06-10 | 2021-09-03 | 南京工业大学 | Bimodal fusion emotion recognition method based on video image and voice |
CN113780198A (en) * | 2021-09-15 | 2021-12-10 | 南京邮电大学 | Multi-mode emotion classification method for image generation |
JP2022526148A (en) * | 2019-09-18 | 2022-05-23 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Video generation methods, devices, electronic devices and computer storage media |
CN114626430A (en) * | 2021-12-30 | 2022-06-14 | 华院计算技术(上海)股份有限公司 | Emotion recognition model training method, emotion recognition device and emotion recognition medium |
CN114912502A (en) * | 2021-12-28 | 2022-08-16 | 天翼数字生活科技有限公司 | Bimodal deep semi-supervised emotion classification method based on expressions and voices |
CN115240649A (en) * | 2022-07-19 | 2022-10-25 | 于振华 | Voice recognition method and system based on deep learning |
CN116561533A (en) * | 2023-07-05 | 2023-08-08 | 福建天晴数码有限公司 | Emotion evolution method and terminal for virtual avatar in educational element universe |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105976809A (en) * | 2016-05-25 | 2016-09-28 | 中国地质大学(武汉) | Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion |
2019-07-12: CN application CN201910632006.3A filed; granted as patent CN110516696B (legal status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105976809A (en) * | 2016-05-25 | 2016-09-28 | 中国地质大学(武汉) | Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022526148A (en) * | 2019-09-18 | 2022-05-23 | Beijing SenseTime Technology Development Co., Ltd. | Video generation methods, devices, electronic devices and computer storage media |
CN111027215B (en) * | 2019-12-11 | 2024-02-20 | 中国人民解放军陆军工程大学 | Character training system and method for virtual person |
CN111027215A (en) * | 2019-12-11 | 2020-04-17 | 中国人民解放军陆军工程大学 | Character training system and method for virtual human |
CN111401268A (en) * | 2020-03-19 | 2020-07-10 | 内蒙古工业大学 | Multi-mode emotion recognition method and device for open environment |
CN111460494A (en) * | 2020-03-24 | 2020-07-28 | 广州大学 | Multi-mode deep learning-oriented privacy protection method and system |
CN111460494B (en) * | 2020-03-24 | 2023-04-07 | 广州大学 | Multi-mode deep learning-oriented privacy protection method and system |
CN112006697A (en) * | 2020-06-02 | 2020-12-01 | 东南大学 | Gradient boosting decision tree depression recognition method based on voice signals |
CN112101096A (en) * | 2020-08-02 | 2020-12-18 | 华南理工大学 | Suicide emotion perception method based on multi-mode fusion of voice and micro-expression |
CN112101096B (en) * | 2020-08-02 | 2023-09-22 | 华南理工大学 | Multi-mode fusion suicide emotion perception method based on voice and micro-expression |
CN112401886B (en) * | 2020-10-22 | 2023-01-31 | 北京大学 | Processing method, device and equipment for emotion recognition and storage medium |
CN112401886A (en) * | 2020-10-22 | 2021-02-26 | 北京大学 | Processing method, device and equipment for emotion recognition and storage medium |
CN112418034A (en) * | 2020-11-12 | 2021-02-26 | 元梦人文智能国际有限公司 | Multi-modal emotion recognition method and device, electronic equipment and storage medium |
CN112528835A (en) * | 2020-12-08 | 2021-03-19 | 北京百度网讯科技有限公司 | Training method, recognition method and device of expression prediction model and electronic equipment |
CN112528835B (en) * | 2020-12-08 | 2023-07-04 | 北京百度网讯科技有限公司 | Training method and device of expression prediction model, recognition method and device and electronic equipment |
CN113076847B (en) * | 2021-03-29 | 2022-06-17 | 济南大学 | Multi-mode emotion recognition method and system |
CN113076847A (en) * | 2021-03-29 | 2021-07-06 | 济南大学 | Multi-mode emotion recognition method and system |
CN113033450A (en) * | 2021-04-02 | 2021-06-25 | 山东大学 | Multi-mode continuous emotion recognition method, service inference method and system |
CN113343860A (en) * | 2021-06-10 | 2021-09-03 | 南京工业大学 | Bimodal fusion emotion recognition method based on video image and voice |
CN113780198B (en) * | 2021-09-15 | 2023-11-24 | 南京邮电大学 | Multi-mode emotion classification method for image generation |
CN113780198A (en) * | 2021-09-15 | 2021-12-10 | 南京邮电大学 | Multi-mode emotion classification method for image generation |
CN114912502A (en) * | 2021-12-28 | 2022-08-16 | 天翼数字生活科技有限公司 | Bimodal deep semi-supervised emotion classification method based on expressions and voices |
CN114912502B (en) * | 2021-12-28 | 2024-03-29 | 天翼数字生活科技有限公司 | Double-mode deep semi-supervised emotion classification method based on expressions and voices |
CN114626430B (en) * | 2021-12-30 | 2022-10-18 | 华院计算技术(上海)股份有限公司 | Emotion recognition model training method, emotion recognition device and emotion recognition medium |
CN114626430A (en) * | 2021-12-30 | 2022-06-14 | 华院计算技术(上海)股份有限公司 | Emotion recognition model training method, emotion recognition device and emotion recognition medium |
CN115240649A (en) * | 2022-07-19 | 2022-10-25 | 于振华 | Voice recognition method and system based on deep learning |
CN116561533A (en) * | 2023-07-05 | 2023-08-08 | 福建天晴数码有限公司 | Emotion evolution method and terminal for virtual avatar in educational element universe |
CN116561533B (en) * | 2023-07-05 | 2023-09-29 | 福建天晴数码有限公司 | Emotion evolution method and terminal for virtual avatar in educational element universe |
Also Published As
Publication number | Publication date |
---|---|
CN110516696B (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110516696A (en) | Adaptive weight bimodal fusion emotion recognition method based on voice and expression | |
CN110556129B (en) | Bimodal emotion recognition model training method and bimodal emotion recognition method | |
CN106250855A (en) | Multi-modal emotion recognition method based on multiple kernel learning | |
Busso et al. | Iterative feature normalization scheme for automatic emotion detection from speech | |
Bhat et al. | Automatic assessment of sentence-level dysarthria intelligibility using BLSTM | |
CN103366618B (en) | Scene device for Chinese learning training based on artificial intelligence and virtual reality | |
Chao et al. | Multi task sequence learning for depression scale prediction from video | |
He et al. | Multimodal depression recognition with dynamic visual and audio cues | |
Tian et al. | Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features | |
CN108899049A (en) | Speech emotion recognition method and system based on convolutional neural networks | |
CN110147548A (en) | Emotion recognition method based on bidirectional gated recurrent unit network and novel network initialization | |
Noroozi et al. | Supervised vocal-based emotion recognition using multiclass support vector machine, random forests, and adaboost | |
CN110534133A (en) | Speech emotion recognition system and method | |
Elshaer et al. | Transfer learning from sound representations for anger detection in speech | |
CN110289002A (en) | End-to-end speaker clustering method and system | |
CN109065073A (en) | Speech emotion recognition method based on deep SVM network model | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
Huang et al. | Speech emotion recognition using convolutional neural network with audio word-based embedding | |
CN114898779A (en) | Multi-mode fused speech emotion recognition method and system | |
Jiang et al. | Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit. | |
Rajarajeswari et al. | An executable method for an intelligent speech and call recognition system using a machine learning-based approach | |
CN111090726A (en) | NLP-based text customer service interaction method for the electric power industry | |
Ling | An acoustic model for English speech recognition based on deep learning | |
CN116434786A (en) | Text-semantic-assisted teacher voice emotion recognition method | |
Lan et al. | Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||