CN110516696A - Adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression - Google Patents

Adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression

Info

Publication number
CN110516696A
CN110516696A (application CN201910632006.3A)
Authority
CN
China
Prior art keywords
expression
data
voice
feature
human face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910632006.3A
Other languages
Chinese (zh)
Other versions
CN110516696B
Inventor
肖婧
黄永明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201910632006.3A
Publication of CN110516696A
Application granted
Publication of CN110516696B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to an adaptive-weight bimodal emotion recognition method that fuses speech and facial expression, comprising the following steps: acquire emotional speech and facial expression data, associate the affective data with emotion categories, and select a training sample set and a test sample set; extract speech emotion features from the speech data and dynamic expression features from the expression data; based on the speech emotion features and the expression features respectively, learn with a deep learning method based on a semi-supervised autoencoder, and obtain the classification result and per-class output probabilities from a softmax classifier; finally, fuse the two single-modality emotion recognition results at the decision level with an adaptive weighting method to obtain the final emotion recognition result. Because the method accounts for individual differences in how well each person's modalities characterize emotion, the adaptive weight fusion achieves higher accuracy and objectivity.

Description

Adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression
Technical field
The present invention relates to the field of emotion recognition within affective computing, and in particular to an adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression.
Background art
In recent years, with the development of artificial intelligence and robotics, traditional interaction modes can no longer meet demand; novel human-computer interaction requires the exchange of emotion. Emotion recognition has therefore become key to the development of human-computer interaction technology and a hot research topic. Emotion recognition is a multi-disciplinary subject: by enabling computers to understand and recognize human emotion, and in turn to predict and understand human behavioral tendencies and psychological states, it makes efficient and harmonious affective human-computer interaction possible.
Human emotion has many modes of expression, such as speech, facial expression, posture, and text, from which effective information can be extracted to analyze mood correctly. Facial expression and speech are the most obvious and most easily analyzed cues and have been widely studied and applied. The psychologist Mehrabian gave the formula: emotional expression = 7% verbal content + 38% vocal tone + 55% facial expression, which shows that speech and facial expression together carry 93% of the emotional information and form the core of human communication. During emotion expression, facial deformation can effectively and intuitively convey inner emotion and is one of the most important kinds of characteristic information for emotion recognition; speech features can likewise express rich emotion.
With the development of the Internet and the continual emergence of social media in recent years, the ways people communicate have been greatly enriched, for example with video and audio, which makes multimodal emotion recognition possible. Traditional single-modality recognition suffers from the problem that a single affective feature cannot characterize the affective state well; for example, when expressing sadness a person's facial expression may not change much, yet the sadness can be heard in a dull, low, and slow voice. Multimodal recognition allows the information of different modalities to complement each other, provides more emotional information for emotion recognition, and improves recognition accuracy. At present, however, single-modality emotion recognition research is relatively mature while multimodal emotion recognition methods remain to be developed and perfected, so multimodal emotion recognition has important practical value. As facial expression and speech are the most dominant cues, bimodal emotion recognition based on the two has important research significance and practical value. Traditional weighting methods ignore individual differences, so a method of adaptive weighting is needed for weight allocation.
Summary of the invention
The object of the present invention is to provide an adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression, so as to exploit the complementarity of the two modalities and to allocate the fusion weights adaptively according to individual differences.
For this purpose, the invention adopts the following technical scheme:
An adaptive-weight bimodal fusion recognition method based on speech and facial expression, characterized in that the method comprises the following steps:
S1: acquire emotional speech and facial expression data, associate the affective data with emotion categories, and select a training sample set and a test sample set;
S2: extract speech emotion features from the speech data and dynamic expression features from the expression data; first automatically locate the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence and use it as the dynamic expression feature;
S3: based on the speech emotion features and the expression features respectively, learn with a deep learning method based on a semi-supervised autoencoder, and obtain the classification result and the per-class output probabilities from a softmax classifier;
S4: fuse the two single-modality emotion recognition results at the decision level with an adaptive weight allocation method to obtain the final emotion recognition result.
Further, step S2 comprises the following specific steps:
S2A.1: for the speech emotion data, divide each acquired speech sample into frames, apply a window function to the framed segments, and obtain the speech emotion signal;
S2A.2: from the speech emotion signal obtained in S2A.1, extract frame-level low-level descriptors such as fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: at the level of the multi-frame speech sample, apply several statistical functionals (maximum, minimum, mean, standard deviation, etc.) to the frame-level low-level descriptors to obtain the speech emotion features;
S2B.1: for the facial expression data, first transform the coordinates of the acquired three-dimensional facial feature points: take the nose tip as the center point, obtain a rotation matrix by SVD, and multiply by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation;
S2B.2: extract the peak expression frame using slow feature analysis (SFA); a minimal sketch of this procedure is given after the step list below. The sub-steps are as follows:
1) treat each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) normalize x(t) so that each component has zero mean and unit variance;
3) apply a nonlinear expansion to the input signal, turning the problem into a linear SFA problem;
4) whiten the data;
5) solve the linear SFA problem.
S2B.3: after obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalize the variable-length dynamic features to a fixed length by linear interpolation.
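The patent gives SFA only as the five sub-steps above. The following is a minimal sketch of linear slow feature analysis applied to a landmark sequence, assuming the nonlinear expansion of step 3 is omitted, that the expression peak is taken as the frame where the slowest SFA output departs most from its value at the first (assumed neutral) frame, and that the variable-length onset-to-peak segment is resampled to a fixed length by linear interpolation as in S2B.3. The peak criterion, function names, and target length are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def linear_sfa(x):
    """Minimal linear SFA: x has shape (T, I) (T frames, I features).
    Returns the SFA outputs ordered from slowest to fastest."""
    # step 2: zero mean, unit variance per component
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    # step 4: whitening via eigen-decomposition of the covariance
    d, E = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = d > 1e-8
    W = E[:, keep] / np.sqrt(d[keep])              # whitening matrix
    z = x @ W
    # step 5: slowest directions = smallest eigenvalues of the covariance
    # of the temporal difference signal
    dz = np.diff(z, axis=0)
    dd, U = np.linalg.eigh(np.cov(dz, rowvar=False))   # ascending: slowest first
    return z @ U

def peak_frame(landmarks):
    """Assumed peak criterion: frame whose slowest SFA output departs most
    from the first (neutral) frame."""
    y = linear_sfa(landmarks)[:, 0]
    return int(np.argmax(np.abs(y - y[0])))

def resample_fixed_length(segment, target_len=16):
    """S2B.3: normalize a variable-length onset-to-peak segment to a fixed
    length by per-dimension linear interpolation."""
    T, I = segment.shape
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, target_len)
    return np.stack([np.interp(dst, src, segment[:, i]) for i in range(I)], axis=1)
```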
Further, step S3 comprises the following specific steps:
S3.1: for the data of one modality, feed both unlabeled and labeled training samples as input; the autoencoder's encoding, decoding, and softmax classifier output produce the reconstructed data and the class output respectively;
S3.2: compute the unsupervised reconstruction error and the supervised classification error;
S3.3: construct the optimization objective that considers the reconstruction error and the classification error jointly,
E(θ) = α·E_r + (1 − α)·E_c;
S3.4: update the parameters by gradient descent until the objective converges.
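A minimal sketch of the semi-supervised autoencoder objective E(θ) = α·E_r + (1 − α)·E_c of step S3, written in PyTorch under assumed layer sizes: the reconstruction error E_r is computed on all samples and the cross-entropy classification error E_c only on labeled ones. Layer widths, α, and the optimizer settings are illustrative assumptions rather than the patent's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedAE(nn.Module):
    def __init__(self, in_dim, hid_dim=256, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.decoder = nn.Linear(hid_dim, in_dim)
        self.classifier = nn.Linear(hid_dim, n_classes)   # softmax taken in the loss / at inference

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), self.classifier(h)

def semi_supervised_loss(model, x, y, labeled_mask, alpha=0.5):
    """E(theta) = alpha * E_r + (1 - alpha) * E_c
    x: (N, in_dim) features; y: (N,) class labels, used only where labeled_mask is True."""
    x_rec, logits = model(x)
    e_r = F.mse_loss(x_rec, x)                                       # unsupervised reconstruction error
    if labeled_mask.any():
        e_c = F.cross_entropy(logits[labeled_mask], y[labeled_mask])  # supervised classification error
    else:
        e_c = torch.zeros((), device=x.device)
    return alpha * e_r + (1 - alpha) * e_c

# S3.4: gradient descent on the combined objective (illustrative loop)
# model = SemiSupervisedAE(in_dim=1582)
# opt = torch.optim.SGD(model.parameters(), lr=1e-3)
# for x, y, m in loader:                       # m marks the labeled samples
#     opt.zero_grad()
#     semi_supervised_loss(model, x, y, m, alpha=0.5).backward()
#     opt.step()
# per-class probabilities at test time: F.softmax(model(x_test)[1], dim=1)
```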
Further, step S4 comprises the following specific steps:
S4.1: obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δ_k; δ_k measures how well modality k characterizes the emotion, and the weights are allocated adaptively for each sample according to the magnitude of δ_k. Here J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample's output probabilities, p_j is the softmax classifier's output probability for class j, and d denotes the Euclidean distance between two vectors;
S4.2: map δ_k into [0, 1] and use the mapped value as the weight, where a and b are adaptively chosen parameters determined according to the specific situation;
S4.3: compute the fused output probability vector P_final = {p_final_j | j = 1, …, J}; the class with the largest probability is the recognized class. Here p_{j,k} is the output probability for class j obtained by single-modality emotion recognition with modality k, and there are K modalities in total.
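The exact formulas for δ_k and for its mapping into [0, 1] are not reproduced in this text (they appear as figures in the original disclosure). The sketch below is one plausible reading, assuming δ_k is the Euclidean distance d between the modality's softmax probability vector P and the uniform distribution over the J classes (a confident prediction lies far from uniform), that the mapping to [0, 1] is a sigmoid parameterized by a and b, and that fusion is a normalized weighted sum of the per-modality probability vectors. All of these specific choices are assumptions made for illustration.

```python
import numpy as np

def delta(p):
    """Assumed delta_k: Euclidean distance between the softmax output P (length J)
    and the uniform distribution; larger means the modality is more confident."""
    p = np.asarray(p, dtype=float)
    uniform = np.full_like(p, 1.0 / p.size)
    return np.linalg.norm(p - uniform)

def weight(delta_k, a=10.0, b=0.3):
    """Assumed S4.2 mapping of delta_k into [0, 1]: a sigmoid with slope a and
    midpoint b; a and b would be tuned to the data."""
    return 1.0 / (1.0 + np.exp(-a * (delta_k - b)))

def fuse(prob_per_modality, a=10.0, b=0.3):
    """S4.3: adaptively weighted decision-level fusion over K modalities.
    prob_per_modality: (K, J) array of per-class softmax outputs."""
    P = np.asarray(prob_per_modality, dtype=float)
    w = np.array([weight(delta(p), a, b) for p in P])
    w = w / (w.sum() + 1e-12)                      # normalize the adaptive weights
    p_final = (w[:, None] * P).sum(axis=0)         # fused probability vector P_final
    return p_final, int(np.argmax(p_final))        # recognized class = arg max

# example: a confident speech prediction paired with a near-uniform expression prediction
# p_speech = [0.05, 0.80, 0.10, 0.05]; p_expr = [0.30, 0.28, 0.22, 0.20]
# fuse([p_speech, p_expr])   # fused vector dominated by the speech modality
```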
Compared with the prior art, the beneficial effects of the present invention are as follows: the adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression achieves accurate and efficient recognition on a standard database; by accounting for individual differences in how well each person's modalities characterize emotion, the adaptive weight fusion attains higher accuracy and objectivity. On the IEMOCAP emotion corpus, the method achieves a recognition rate of 83%, an improvement of about 3% over traditional fixed-weight allocation.
Description of the drawings
Fig. 1 is an overall flow diagram of the recognition method of the invention.
Fig. 2 is a flow diagram of step S3 of the invention.
Fig. 3 is a flow diagram of the adaptive weight allocation of the invention.
Specific embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings; the given examples serve only to explain the invention and are not intended to limit its scope.
Embodiment 1: referring to Figures 1-3, an adaptive-weight bimodal fusion recognition method based on speech and facial expression comprises the following steps:
S1: acquire emotional speech and facial expression data, associate the affective data with emotion categories, and select a training sample set and a test sample set;
S2: extract speech emotion features from the speech data and dynamic expression features from the expression data; first automatically locate the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence and use it as the dynamic expression feature;
S3: based on the speech emotion features and the expression features respectively, learn with a deep learning method based on a semi-supervised autoencoder, and obtain the classification result and the per-class output probabilities from a softmax classifier;
S4: fuse the two single-modality emotion recognition results at the decision level with an adaptive weight allocation method to obtain the final emotion recognition result.
Further, step S2 comprises the following specific steps:
S2A.1: for the speech emotion data, divide each acquired speech sample into frames, apply a window function to the framed segments, and obtain the speech emotion signal;
S2A.2: from the speech emotion signal obtained in S2A.1, extract frame-level low-level descriptors such as fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: at the level of the multi-frame speech sample, apply several statistical functionals (maximum, minimum, mean, standard deviation, etc.) to the frame-level low-level descriptors to obtain the speech emotion features;
S2B.1: for the facial expression data, first transform the coordinates of the acquired three-dimensional facial feature points: take the nose tip as the center point, obtain a rotation matrix by SVD, and multiply by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation (a sketch of this SVD alignment is given after the step list below);
S2B.2: extract the peak expression frame using slow feature analysis, with the following sub-steps:
1) treat each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) normalize x(t) so that each component has zero mean and unit variance;
3) apply a nonlinear expansion to the input signal, turning the problem into a linear SFA problem;
4) whiten the data;
5) solve the linear SFA problem.
S2B.3: after obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalize the variable-length dynamic features to a fixed length by linear interpolation.
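S2B.1 only names the SVD-based rotation. Below is a minimal sketch of such an alignment, assuming a Kabsch-style procedure that centers each frame's 3D landmarks on the nose-tip point and rotates them onto a reference (for example, neutral) frame. The choice of reference frame and the landmark indexing are assumptions for illustration.

```python
import numpy as np

def align_to_reference(frame, reference, nose_idx=0):
    """Remove head-pose rotation from one frame of 3D facial landmarks.
    frame, reference: (N, 3) landmark coordinates; nose_idx: assumed index of the nose tip."""
    # center both point sets on the nose-tip landmark
    A = frame - frame[nose_idx]
    B = reference - reference[nose_idx]
    # Kabsch/SVD: rotation R that best maps A onto B in the least-squares sense
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    if np.linalg.det(R) < 0:           # avoid a reflection
        U[:, -1] *= -1
        R = U @ Vt
    return A @ R                        # rotated, pose-normalized landmarks
```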
Further, step S3 comprises the following specific steps:
S3.1: for the data of one modality, feed both unlabeled and labeled training samples as input; the autoencoder's encoding, decoding, and softmax classifier output produce the reconstructed data and the class output respectively;
S3.2: compute the unsupervised reconstruction error and the supervised classification error;
S3.3: construct the optimization objective that considers the reconstruction error and the classification error jointly,
E(θ) = α·E_r + (1 − α)·E_c;
S3.4: update the parameters by gradient descent until the objective converges.
Further, step S4 comprises the following specific steps:
S4.1: obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δ_k; δ_k measures how well modality k characterizes the emotion, and the weights are allocated adaptively for each sample according to the magnitude of δ_k. Here J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample's output probabilities, p_j is the softmax classifier's output probability for class j, and d denotes the Euclidean distance between two vectors;
S4.2: map δ_k into [0, 1] and use the mapped value as the weight, where a and b are adaptively chosen parameters;
S4.3: compute the fused output probability vector P_final = {p_final_j | j = 1, …, J}; the class with the largest probability is the recognized class. Here p_{j,k} is the output probability for class j obtained by single-modality emotion recognition with modality k, and there are K modalities in total.
Application example: referring to Figures 1-3, the IEMOCAP emotion database is used as material, and the simulation platform of this example is MATLAB R2014a.
As shown in Figure 1, the adaptive-weight bimodal fusion emotion recognition method based on speech and expression of the present invention mainly comprises the following steps:
S1: acquire emotional speech and facial expression data, associate the affective data with emotion categories, and select a training sample set and a test sample set. Four emotion categories are chosen: neutral, happy, sad, and angry.
S2: extract speech emotion features from the speech data and dynamic expression features from the expression data; first automatically locate the expression peak frame, obtain the dynamic image sequence from expression onset to expression peak, and normalize the variable-length image sequence to a fixed-length image sequence as the dynamic expression feature. The speech features are extracted with the open-source feature extraction toolkit openSMILE, using the INTERSPEECH 2010 Paralinguistic Challenge standard feature set, 1582 dimensions in total (a hedged extraction sketch is given after this step list). For the dynamic facial expression features, the peak expression frame is extracted by slow feature analysis, the expression onset frame is then found by thresholding, and after obtaining the dynamic expression sequence from the onset frame to the peak frame, the variable-length dynamic features are normalized by linear interpolation.
S3: based on the speech emotion features and the expression features respectively, learn with a deep learning method based on a semi-supervised autoencoder, and obtain the classification result and the per-class output probabilities from a softmax classifier.
S4: fuse the two single-modality emotion recognition results at the decision level with an adaptive weight allocation method to obtain the final emotion recognition result.
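openSMILE is typically run as the SMILExtract command-line tool with a feature-set configuration file. The sketch below wraps that call for the IS10 paralinguistic set named above; the configuration path, output file layout, and column parsing are assumptions tied to the installed openSMILE version, not details given by the patent.

```python
import subprocess

def extract_is10_features(wav_path, out_arff="features.arff",
                          config="config/is09-13/IS10_paraling.conf"):
    """Run openSMILE's SMILExtract on one utterance and return its functional
    feature vector (1582-d for the IS10 paralinguistic set). The config path and
    the ARFF output layout are assumptions that depend on the openSMILE version."""
    subprocess.run(["SMILExtract", "-C", config, "-I", wav_path, "-O", out_arff],
                   check=True)
    with open(out_arff) as f:
        last_row = f.read().strip().splitlines()[-1]   # last @data row of the ARFF file
    fields = last_row.split(",")
    return [float(v) for v in fields[1:-1]]            # drop the name and class columns

# usage (assumed paths):
# feats = extract_is10_features("sample.wav")
# len(feats)   # expected to be 1582 for the IS10 paralinguistic feature set
```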
As shown in Fig. 2, the semi-supervised classification of step S3 proceeds as follows:
S3.1: for the data of one modality, feed both unlabeled and labeled training samples as input; the autoencoder's encoding, decoding, and softmax classifier output produce the reconstructed data and the class output respectively.
S3.2: compute the unsupervised reconstruction error and the supervised classification error.
S3.3: construct the optimization objective that considers the reconstruction error and the classification error jointly,
E(θ) = α·E_r + (1 − α)·E_c.
S3.4: update the parameters by gradient descent until the objective converges.
As shown in Fig. 3, step S4 proceeds as follows:
S4.1: obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δ_k; δ_k measures how well modality k characterizes the emotion, and the weights are allocated adaptively for each sample according to the magnitude of δ_k. Here J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample's output probabilities, p_j is the softmax classifier's output probability for class j, and d denotes the Euclidean distance between two vectors.
S4.2: map δ_k into [0, 1] and use the mapped value as the weight, where a and b are adaptively chosen parameters.
S4.3: compute the fused output probability vector P_final = {p_final_j | j = 1, …, J}; the class with the largest probability is the recognized class, where p_{j,k} is the output probability for class j obtained by single-modality emotion recognition with modality k, and there are K modalities in total.
It should be noted that the above embodiment is only a preferred embodiment of the present invention and is not intended to limit the scope of protection of the invention; equivalent substitutions or replacements made on the basis of the above technical solution all fall within the scope of protection of the present invention.

Claims (5)

1. An adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression, characterized in that the method comprises the following steps:
S1: acquire emotional speech data and facial expression data, associate the affective data with emotion categories, and select a training sample set and a test sample set;
S2: extract speech emotion features from the speech data and dynamic expression features from the expression data; first automatically locate the expression peak frame to obtain the dynamic image sequence from expression onset to expression peak, then normalize the variable-length image sequence to a fixed-length image sequence and use it as the dynamic expression feature;
S3: based on the speech emotion features and the expression features respectively, learn with a deep learning method based on a semi-supervised autoencoder, and obtain the classification result and the per-class output probabilities from a softmax classifier;
S4: fuse the two single-modality emotion recognition results at the decision level with an adaptive weight allocation method to obtain the final emotion recognition result.
2. The bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the affective feature extraction of step S2 comprises the following specific steps:
S2A.1: for the speech emotion data, divide each acquired speech sample into frames, apply a window function to the framed segments, and obtain the speech emotion signal;
S2A.2: from the speech emotion signal obtained in S2A.1, extract frame-level low-level descriptors such as fundamental frequency F0, short-time energy, shimmer, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients;
S2A.3: at the level of the multi-frame speech sample, apply several statistical functionals (maximum, minimum, mean, standard deviation, etc.) to the frame-level low-level descriptors to obtain the speech emotion features;
S2B.1: for the facial expression data, first transform the coordinates of the acquired three-dimensional facial feature points: take the nose tip as the center point, obtain a rotation matrix by SVD, and multiply by the rotation matrix to rotate the points, thereby eliminating the influence of head-pose variation;
S2B.2: extract the peak expression frame using slow feature analysis;
S2B.3: after obtaining the dynamic expression sequence from the expression onset frame to the expression peak frame, normalize the variable-length dynamic features to a fixed length by linear interpolation.
3. The bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the semi-supervised learning of step S3 comprises the following specific steps:
S3.1: for the data of one modality, feed both unlabeled and labeled training samples as input; the autoencoder's encoding, decoding, and softmax classifier output produce the reconstructed data and the class output respectively;
S3.2: compute the unsupervised reconstruction error and the supervised classification error;
S3.3: construct the optimization objective that considers the reconstruction error and the classification error jointly,
E(θ) = α·E_r + (1 − α)·E_c;
S3.4: update the parameters by gradient descent until the objective converges.
4. The adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression according to claim 1, characterized in that the adaptive-weight decision-level fusion of step S4 comprises the following steps:
S4.1: obtain the per-class output probabilities of the softmax classifier for the test sample under each of the two modalities, and compute the variable δ_k; δ_k measures how well modality k characterizes the emotion, and the weights are allocated adaptively for each sample according to the magnitude of δ_k, where J is the number of classes in the system, P = {p_j | j = 1, …, J} is the vector formed by the sample's output probabilities, p_j is the softmax classifier's output probability for class j, and d denotes the Euclidean distance between two vectors;
S4.2: map δ_k into [0, 1] and use the mapped value as the weight, where a and b are adaptively chosen parameters;
S4.3: compute the fused output probability vector P_final = {p_final_j | j = 1, …, J}; the class with the largest probability is the recognized class, where p_{j,k} is the output probability for class j obtained by single-modality emotion recognition with modality k, and there are K modalities in total.
5. The adaptive-weight bimodal fusion emotion recognition method based on speech and facial expression according to claim 2, characterized in that in S2B.2 the peak expression frame is extracted by slow feature analysis with the following specific sub-steps:
1) treat each dynamic image sequence sample as a time-varying input signal x(t) = [x_1(t), x_2(t), …, x_I(t)]^T;
2) normalize x(t) so that each component has zero mean and unit variance;
3) apply a nonlinear expansion to the input signal, turning the problem into a linear SFA problem;
4) whiten the data;
5) solve the linear SFA problem.
CN201910632006.3A 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression Active CN110516696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910632006.3A CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910632006.3A CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Publications (2)

Publication Number Publication Date
CN110516696A 2019-11-29
CN110516696B 2023-07-25

Family

ID=68623425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910632006.3A Active CN110516696B (en) 2019-07-12 2019-07-12 Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression

Country Status (1)

Country Link
CN (1) CN110516696B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027215A (en) * 2019-12-11 2020-04-17 中国人民解放军陆军工程大学 Character training system and method for virtual human
CN111401268A (en) * 2020-03-19 2020-07-10 内蒙古工业大学 Multi-mode emotion recognition method and device for open environment
CN111460494A (en) * 2020-03-24 2020-07-28 广州大学 Multi-mode deep learning-oriented privacy protection method and system
CN112006697A (en) * 2020-06-02 2020-12-01 东南大学 Gradient boosting decision tree depression recognition method based on voice signals
CN112101096A (en) * 2020-08-02 2020-12-18 华南理工大学 Suicide emotion perception method based on multi-mode fusion of voice and micro-expression
CN112401886A (en) * 2020-10-22 2021-02-26 北京大学 Processing method, device and equipment for emotion recognition and storage medium
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112528835A (en) * 2020-12-08 2021-03-19 北京百度网讯科技有限公司 Training method, recognition method and device of expression prediction model and electronic equipment
CN113033450A (en) * 2021-04-02 2021-06-25 山东大学 Multi-mode continuous emotion recognition method, service inference method and system
CN113076847A (en) * 2021-03-29 2021-07-06 济南大学 Multi-mode emotion recognition method and system
CN113343860A (en) * 2021-06-10 2021-09-03 南京工业大学 Bimodal fusion emotion recognition method based on video image and voice
CN113780198A (en) * 2021-09-15 2021-12-10 南京邮电大学 Multi-mode emotion classification method for image generation
JP2022526148A (en) * 2019-09-18 2022-05-23 ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド Video generation methods, devices, electronic devices and computer storage media
CN114626430A (en) * 2021-12-30 2022-06-14 华院计算技术(上海)股份有限公司 Emotion recognition model training method, emotion recognition device and emotion recognition medium
CN114912502A (en) * 2021-12-28 2022-08-16 天翼数字生活科技有限公司 Bimodal deep semi-supervised emotion classification method based on expressions and voices
CN115240649A (en) * 2022-07-19 2022-10-25 于振华 Voice recognition method and system based on deep learning
CN116561533A (en) * 2023-07-05 2023-08-08 福建天晴数码有限公司 Emotion evolution method and terminal for virtual avatar in educational element universe

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022526148A (en) * 2019-09-18 2022-05-23 ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド Video generation methods, devices, electronic devices and computer storage media
CN111027215B (en) * 2019-12-11 2024-02-20 中国人民解放军陆军工程大学 Character training system and method for virtual person
CN111027215A (en) * 2019-12-11 2020-04-17 中国人民解放军陆军工程大学 Character training system and method for virtual human
CN111401268A (en) * 2020-03-19 2020-07-10 内蒙古工业大学 Multi-mode emotion recognition method and device for open environment
CN111460494A (en) * 2020-03-24 2020-07-28 广州大学 Multi-mode deep learning-oriented privacy protection method and system
CN111460494B (en) * 2020-03-24 2023-04-07 广州大学 Multi-mode deep learning-oriented privacy protection method and system
CN112006697A (en) * 2020-06-02 2020-12-01 东南大学 Gradient boosting decision tree depression recognition method based on voice signals
CN112101096A (en) * 2020-08-02 2020-12-18 华南理工大学 Suicide emotion perception method based on multi-mode fusion of voice and micro-expression
CN112101096B (en) * 2020-08-02 2023-09-22 华南理工大学 Multi-mode fusion suicide emotion perception method based on voice and micro-expression
CN112401886B (en) * 2020-10-22 2023-01-31 北京大学 Processing method, device and equipment for emotion recognition and storage medium
CN112401886A (en) * 2020-10-22 2021-02-26 北京大学 Processing method, device and equipment for emotion recognition and storage medium
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112528835A (en) * 2020-12-08 2021-03-19 北京百度网讯科技有限公司 Training method, recognition method and device of expression prediction model and electronic equipment
CN112528835B (en) * 2020-12-08 2023-07-04 北京百度网讯科技有限公司 Training method and device of expression prediction model, recognition method and device and electronic equipment
CN113076847B (en) * 2021-03-29 2022-06-17 济南大学 Multi-mode emotion recognition method and system
CN113076847A (en) * 2021-03-29 2021-07-06 济南大学 Multi-mode emotion recognition method and system
CN113033450A (en) * 2021-04-02 2021-06-25 山东大学 Multi-mode continuous emotion recognition method, service inference method and system
CN113343860A (en) * 2021-06-10 2021-09-03 南京工业大学 Bimodal fusion emotion recognition method based on video image and voice
CN113780198B (en) * 2021-09-15 2023-11-24 南京邮电大学 Multi-mode emotion classification method for image generation
CN113780198A (en) * 2021-09-15 2021-12-10 南京邮电大学 Multi-mode emotion classification method for image generation
CN114912502A (en) * 2021-12-28 2022-08-16 天翼数字生活科技有限公司 Bimodal deep semi-supervised emotion classification method based on expressions and voices
CN114912502B (en) * 2021-12-28 2024-03-29 天翼数字生活科技有限公司 Double-mode deep semi-supervised emotion classification method based on expressions and voices
CN114626430B (en) * 2021-12-30 2022-10-18 华院计算技术(上海)股份有限公司 Emotion recognition model training method, emotion recognition device and emotion recognition medium
CN114626430A (en) * 2021-12-30 2022-06-14 华院计算技术(上海)股份有限公司 Emotion recognition model training method, emotion recognition device and emotion recognition medium
CN115240649A (en) * 2022-07-19 2022-10-25 于振华 Voice recognition method and system based on deep learning
CN116561533A (en) * 2023-07-05 2023-08-08 福建天晴数码有限公司 Emotion evolution method and terminal for virtual avatar in educational element universe
CN116561533B (en) * 2023-07-05 2023-09-29 福建天晴数码有限公司 Emotion evolution method and terminal for virtual avatar in educational element universe

Also Published As

Publication number Publication date
CN110516696B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN110516696A (en) It is a kind of that emotion identification method is merged based on the adaptive weighting bimodal of voice and expression
CN110556129B (en) Bimodal emotion recognition model training method and bimodal emotion recognition method
CN106250855A (en) A kind of multi-modal emotion identification method based on Multiple Kernel Learning
Busso et al. Iterative feature normalization scheme for automatic emotion detection from speech
Bhat et al. Automatic assessment of sentence-level dysarthria intelligibility using BLSTM
CN103366618B (en) Scene device for Chinese learning training based on artificial intelligence and virtual reality
Chao et al. Multi task sequence learning for depression scale prediction from video
He et al. Multimodal depression recognition with dynamic visual and audio cues
Tian et al. Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features
CN108899049A (en) A kind of speech-emotion recognition method and system based on convolutional neural networks
CN110147548A (en) The emotion identification method initialized based on bidirectional valve controlled cycling element network and new network
Noroozi et al. Supervised vocal-based emotion recognition using multiclass support vector machine, random forests, and adaboost
CN110534133A (en) A kind of speech emotion recognition system and speech-emotion recognition method
Elshaer et al. Transfer learning from sound representations for anger detection in speech
CN110289002A (en) A kind of speaker clustering method and system end to end
CN109065073A (en) Speech-emotion recognition method based on depth S VM network model
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
Huang et al. Speech emotion recognition using convolutional neural network with audio word-based embedding
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
Jiang et al. Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit.
Rajarajeswari et al. An executable method for an intelligent speech and call recognition system using a machine learning-based approach
CN111090726A (en) NLP-based electric power industry character customer service interaction method
Ling An acoustic model for English speech recognition based on deep learning
CN116434786A (en) Text-semantic-assisted teacher voice emotion recognition method
Lan et al. Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant