CN116524960A - Speech emotion recognition system based on mixed entropy downsampling and integrated classifier - Google Patents

Speech emotion recognition system based on mixed entropy downsampling and integrated classifier

Info

Publication number
CN116524960A
CN116524960A (application number CN202310509029.1A)
Authority
CN
China
Prior art keywords
voice
entropy
emotion
speech
classifier
Prior art date
Legal status
Pending
Application number
CN202310509029.1A
Other languages
Chinese (zh)
Inventor
李冬冬
王喆
宣正吉
王建伟
Current Assignee
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN202310509029.1A
Publication of CN116524960A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training

Abstract

The invention discloses a speech emotion recognition system based on mixed-entropy downsampling and an integrated classifier, which comprises the following steps. In the preprocessing stage, the speech signals of the training data are divided into segments, spectrograms are extracted from the segments, a base classifier is trained on these spectrograms, and the depth feature and confidence of each speech segment are obtained. In the training stage, the mixed entropy of every speech segment is calculated and the weighted sum of the mixed entropy and the confidence is taken as its ranking value; the spectrograms of the segments whose ranking values exceed a set threshold are then used to retrain a base classifier, the ranking values of all segments are recomputed and the base classifier is trained again, this operation is repeated for a given number of rounds, and the base classifiers trained in the rounds form an integrated classifier. Finally, the test speech is divided into segments, spectrograms are extracted and input into the integrated classifier, and the emotion recognition result of the whole speech is calculated. The invention markedly reduces the influence of speech segments with ambiguous emotion and unstable distribution structure, and effectively improves the accuracy of speech emotion recognition.

Description

Speech emotion recognition system based on mixed entropy downsampling and integrated classifier
Technical Field
The invention relates to the technical field of voice emotion recognition, in particular to a voice emotion recognition system based on mixed entropy downsampling and an integrated classifier.
Background
Speech is the most direct and natural mode of communication between people and a main form of human-machine interaction. However, speech emotion in real life is often complex, subtle and constantly changing, so detecting and recognizing emotion in speech is a challenging task. In recent years speech emotion recognition has been studied extensively and applied in many fields, such as virtual customer service, intelligent assistants and medical auxiliary diagnosis. A speech emotion recognition system generally comprises two parts: feature extraction and classifier training. The traditional approach is to segment the original speech waveform and then extract hand-crafted features; classifiers commonly used in speech emotion recognition include Gaussian mixture models, support vector machines and the like. With the development of deep learning, many methods based on deep learning classifiers have emerged, such as recurrent neural network classifiers and convolutional neural network classifiers.
Previous studies have found that the confidence of each emotion varies with the position of a segment within the utterance. For example, the true emotion label of an utterance may be happy, yet a trained classifier shows the highest confidence for neutral in the first half of the utterance and for happy only in the second half. Clearly, the emotional intensity of happiness is weaker in the first half, which is detrimental to classifier training. Speech segments with ambiguous emotion introduce noise into the training process and degrade the performance of the speech emotion recognition system, so speech emotion recognition at the segment level remains challenging. Some approaches address this problem, such as attention mechanisms and multi-instance learning, but they let a deep learning classifier learn autonomously how to weight different parts of the speech, which is difficult to analyse and interpret theoretically.
Disclosure of Invention
The invention provides a speech emotion recognition system based on mixed-entropy downsampling and an integrated classifier. In each training round, speech segments with clearly expressed emotion are selected from the segments of all training data for the next round of training, i.e. the speech segments of the training data are downsampled; each round produces a base classifier, and these base classifiers form the integrated classifier. During each round, the mixed entropy and the confidence of every speech segment are calculated, and a ranking value is computed from them so that samples with clear emotion categories are selected. The integrated classifier uses the base classifiers trained over multiple iterations to predict the emotion of the whole utterance, which effectively improves the accuracy of speech emotion recognition.
The speech emotion recognition system based on mixed entropy downsampling and an integrated classifier of the invention comprises the following steps:
1) Dividing a data set into training data and test data, dividing the speech signals of the training data into segments, extracting a spectrogram from each segment, training a base classifier on these spectrograms and obtaining the depth feature and confidence of each speech segment;
2) Calculating the mixed entropy of all speech segments and taking the weighted sum of the mixed entropy and the confidence as a ranking value;
3) Using the spectrograms of the speech segments whose ranking values are larger than a set threshold to retrain a base classifier, recomputing the ranking values of all speech segments and training the base classifier again, repeating this operation for a given number of rounds, the base classifiers trained in the rounds forming an integrated classifier;
4) Dividing the test speech into segments, extracting their spectrograms, inputting them into the integrated classifier and calculating the emotion recognition result of the speech.
The technical scheme adopted by the invention can be further refined. The label of each speech segment is the true label, in the data set, of the whole utterance to which the segment belongs. The mixed entropy of a speech segment in step 2) consists of an emotion certainty entropy and a structure distribution entropy. The emotion certainty entropy measures how clearly the emotion is expressed by the i-th speech segment and is computed from the emotion labels of its k nearest neighbours in the depth feature space, where i is the index of the segment in the training data, C is the number of emotion categories in the data set and k is the set number of neighbours.
The structure distribution entropy measures the stability of the distribution structure of the i-th speech segment in the depth feature space and is computed from the Euclidean distances d_i,q between the depth features of the i-th and q-th speech segments in the training data, where k is the set number of neighbours and ln denotes the natural logarithm (base e).
During base-classifier training, a ranking value computed as the weighted sum of the mixed entropy and the confidence is used as the basis for downsampling the speech segments in each round. The mixed entropy MIE_i of the i-th speech segment is obtained by combining its emotion certainty entropy and its structure distribution entropy after Min-Max normalization, where nor denotes the Min-Max normalization function.
The ranking value of each speech segment is defined as the weighted sum of the mixed entropy and the confidence obtained in step 1); the formula of the ranking value is:
Rank_i = (1 - λ)·nor(conf_i) + λ·nor(-MIE_i),    (4)
wherein i is the index of the speech segment in the training data, conf_i is the confidence of the i-th speech segment, MIE_i is its mixed entropy, λ is the weight coefficient, nor is the Min-Max normalization function and Rank_i is the ranking value of the i-th speech segment.
In each round, the base classifier updates its parameters by gradient descent, minimizing the cross-entropy loss between the speech-segment labels and the segment-level emotion classification results; finally, the integrated classifier composed of the base classifiers generated in all rounds computes the emotion category predicted by the system from the outputs of every speech segment of the whole test utterance.
The beneficial effects of the invention are as follows. The invention provides a speech emotion recognition system based on mixed-entropy downsampling and an integrated classifier: by selecting the speech segments that participate in training over multiple rounds of base-classifier training and combining the classifiers of the individual rounds into an integrated classifier, it effectively improves the accuracy of speech emotion recognition and, compared with existing classifiers and the base classifier alone, markedly reduces the influence of speech segments with ambiguous emotion categories. The invention introduces the concept of mixed entropy, composed of an emotion certainty entropy and a structure distribution entropy; using the ranking value computed from the mixed entropy and the confidence as the selection criterion, samples with clear emotion categories and a stable distribution structure can be selected effectively for training the integrated classifier.
Drawings
FIG. 1 is a block diagram of a speech emotion recognition system based on mixed entropy downsampling and an integrated classifier in accordance with the present invention.
Detailed Description
The following detailed description of specific embodiments of the invention refers to the accompanying drawings, in which:
step 1: pretreatment stepThe segment divides all emotion voice original voice signals in the training data into voice segments with the duration of 2s one by one, N voice segments are divided in total, no overlapping part exists between the two voice segments, the voice segments with the duration of less than 2s are subjected to zero padding processing on the read signal value, and then frames and window division operation are carried out on the signal value of each voice segment as required to extract a spectrogram as new training data Wherein f is the number of sub-frames, w is the characteristic length of the frame voice, and the corresponding training label of each voice segment is +.> The true emotion label of the whole voice in which the true emotion label is positioned in the training data;
step 2: in each iteration round i, a new base classifier m is trained l Training data composed of a spectrogram of a voice fragment in the first round is inputIts corresponding tag in Y is +.>Wherein n is the number of speech segments in each round that participate in the training of the base classifier; when l=1, X 1 The number n=n of the voice fragments, namely the spectrograms of all the voice fragments participate in the training of the base classifier; each speech segment is in the base classifier m l The final output isWherein C is the number of emotion categories in the dataset, which represents the probability that the speech segment is predicted to be each emotion category on the base classifierBasis classifier m l Predictive tag of->The loss function on the ith voice segment in the training process is a true emotion label y i Cross entropy loss-y 'of sum base classifier output' i ·log(y i )-(1-y′ i )·log(1-y i ) The loss is minimized and the parameters of the base classifier are updated by a gradient descent method, and after the given times of gradient descent iteration, the trained base classifier m of the round can be obtained l
Step 3: inputting all training data, namely the spectrogram X of all voice fragments, into a trained classifier m l Of which each speech segment has a spectrogram x i Depth features of size z can be obtained in the penultimate fully connected layer of the classifierWhere i is the number of the speech segment, the depth feature corresponding to the spectrogram X of all the speech segments can be denoted as f= { F i I=1, 2, …, N }; the speech segment is in the base classifier m l Confidence in the way can be determined by->yy=y i Calculating;
step 4: calculating a k-nearest neighbor Euclidean distance matrix between depth features F of voice segment spectrograms in all training data And a k nearest neighbor speech fragment numbering matrix +.>The method is used for calculating emotion certainty entropy and structure distribution entropy in the mixed entropy;
step 5: the mixed entropy is calculated on the depth features F of the speech segment spectrograms in all training data:
step 5.1: calculating emotion certainty entropy:
The emotion certainty entropy of the i-th speech segment is computed from the emotion labels of its k nearest neighbours in the depth feature space, wherein i is the index of the speech segment in the training data, C is the number of emotion categories in the data set, k is the set number of neighbours and ln denotes the natural logarithm (base e).
Specifically, the formula of the emotion certainty entropy uses, for the i-th speech segment, the count of the most frequent emotion label among the k speech segments whose depth features are closest in Euclidean distance to its own,
wherein the count for an emotion label j is the number of segments carrying label j among those k nearest neighbours; it is obtained from the i-th row of the index matrix M_ind by counting how many of the listed neighbour segments have true label equal to j.
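The closed-form expression of the emotion certainty entropy is not reproduced in the text above, so the sketch below uses the Shannon entropy (natural logarithm) of the neighbour-label distribution as a stand-in; the patent's exact formula, which also involves the count of the most frequent label, may differ.

```python
import numpy as np

def emotion_certainty_entropy(M_ind, labels, C, k):
    """Stand-in for the emotion certainty entropy of every segment.

    ASSUMPTION: implemented as the Shannon entropy of the label distribution among
    the k nearest neighbours; a low value means the neighbourhood agrees on one
    emotion, i.e. the emotion is clearly expressed.
    labels: integer array (N,) with values in 0..C-1.
    """
    N = M_ind.shape[0]
    e_cer = np.zeros(N)
    for i in range(N):
        neighbour_labels = labels[M_ind[i]]                  # labels of the k neighbours
        counts = np.bincount(neighbour_labels, minlength=C)  # per-class neighbour counts
        p = counts / k
        nz = p > 0
        e_cer[i] = -np.sum(p[nz] * np.log(p[nz]))
    return e_cer
```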
Step 5.2: calculating structure distribution entropy:
The structure distribution entropy of the i-th speech segment is computed from the distances to its k nearest neighbours, wherein d_i,q ∈ M_dis denotes the Euclidean distance between the depth feature f_i of the i-th speech segment and the depth feature f_q of the q-th speech segment in the training data.
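The expression of the structure distribution entropy is likewise not reproduced above; the sketch assumes it is the Shannon entropy of the k neighbour distances d_i,q normalized into a probability distribution, and should be treated as a stand-in rather than the patented formula.

```python
import numpy as np

def structure_distribution_entropy(M_dis):
    """Stand-in for the structure distribution entropy of every segment.

    ASSUMPTION: the k neighbour distances are normalized to sum to one and their
    Shannon entropy (natural logarithm) is taken; the patent's exact expression
    is not recoverable from the text and may differ.
    """
    eps = 1e-12
    p = M_dis / (M_dis.sum(axis=1, keepdims=True) + eps)   # normalize the k distances
    return -np.sum(p * np.log(p + eps), axis=1)
```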
Step 5.3: the mixed entropy of each speech segment is calculated from the emotion certainty entropy in step 5.1 and the structure distribution entropy in step 5.2:
The two components are combined after Min-Max normalization to give the mixed entropy MIE_i of the i-th segment, where nor is the Min-Max normalization function.
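A sketch of step 5.3; the equal-weight sum of the two Min-Max-normalized components is an assumption, since the combining expression itself is not reproduced above.

```python
import numpy as np

def min_max_nor(v):
    """Min-Max normalization, used as nor(.) in the ranking formula."""
    v = np.asarray(v, dtype=float)
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

def mixed_entropy(e_cer, e_str):
    """MIE_i: ASSUMED equal-weight sum of the two normalized entropy components."""
    return min_max_nor(e_cer) + min_max_nor(e_str)
```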
Step 6: calculating ranking values on depth features F of speech segment spectrograms in all training data, wherein the ranking values are a weighted sum of the mixed entropy calculated in the step 5 and the confidence obtained in the step 3:
Rank i =(1-λ)nor(conf i )+λnor(-MIE i ) (10)
step 7: downsampling the speech fragments participating in training, and connecting Rank i A spectrogram of n voice fragments larger than a specified threshold is used as new training data X l+1 Namely, selecting a spectrogram of a speech fragment with clear emotion and strong distributed structural stability in a depth feature space as new training data X l+1
Step 8: repeating the steps 2 to 7 for L rounds, wherein the base classifier m is obtained in each round l Adding the integrated classifier into a set M to serve as an integrated classifier;
step 9: during the test, a complete voice is divided into E voice fragments, and the output of each voice fragment on the basic classifier obtained by the first round of training is thatWhere e is the segment number and C is the emotion category number in the data set, then the output of the complete speech on the integrated classifier M can be defined as:
wherein e is the segment number, and the output of each voice segment on the base classifier obtained by the first round of training isE is the number of voice fragments divided by the complete voice, L is the set total training round, and the complete voice is subjected to final recognition emotion subscript R of a voice emotion recognition system based on mixed entropy downsampling and integrated classifier ind The calculation formula of (2) is as follows:
wherein C is the emotion category number in the data set, R ind The corresponding emotion type is the final recognition result of the system.
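A sketch of the integrated-classifier inference in step 9. Averaging the per-segment softmax outputs over the E segments and the L base classifiers is an assumption; the text above only states that the outputs are aggregated before taking the emotion index with the largest score.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_utterance(models, segments, device="cpu"):
    """Integrated-classifier inference for one complete utterance.

    models:   list of the L base classifiers m_1 .. m_L
    segments: float tensor (E, 1, f, w) of the utterance's segment spectrograms
    ASSUMPTION: per-segment softmax outputs are averaged over segments and rounds;
    the returned index R_ind is the emotion with the largest aggregated score.
    """
    outputs = []
    for model in models:
        model.eval()
        logits = model(segments.to(device))              # (E, C) outputs of round l
        outputs.append(F.softmax(logits, dim=1))
    aggregated = torch.stack(outputs).mean(dim=(0, 1))   # average over L rounds and E segments
    R_ind = int(torch.argmax(aggregated))
    return R_ind, aggregated.cpu()
```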
Design of experiment
Selection of the experimental data set: the invention uses the speech data set IEMOCAP. It contains 12 hours of audio, recorded as dialogues performed by 10 actors and organized into five sessions of two actors each. In the experiments of the invention only four common emotions are considered: anger, happiness, neutrality and sadness; utterances labelled as excited in the data set are also treated as happy. The data contain 5,531 utterances in total: 1,103 angry, 1,636 happy, 1,708 neutral and 1,084 sad.
We use two metrics, Weighted Accuracy (WA) and Unweighted Accuracy (UA), to measure the accuracy of the classifier on the test data. With N_c denoting the number of samples of emotion class c and r_c the number of class-c samples that are classified correctly, they are defined as:
WA = (r_1 + r_2 + … + r_C) / (N_1 + N_2 + … + N_C)
UA = (1/C) · (r_1/N_1 + r_2/N_2 + … + r_C/N_C)
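A small sketch of the two metrics as conventionally defined, with N_c the number of class-c samples and r_c the number of correctly classified class-c samples; the function name is illustrative.

```python
import numpy as np

def wa_ua(y_true, y_pred, C):
    """Weighted Accuracy (WA) and Unweighted Accuracy (UA).

    WA = sum_c r_c / sum_c N_c,   UA = (1/C) * sum_c r_c / N_c.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    N = np.array([(y_true == c).sum() for c in range(C)])               # N_c
    r = np.array([((y_true == c) & (y_pred == c)).sum() for c in range(C)])  # r_c
    wa = r.sum() / N.sum()
    ua = np.mean(r / np.maximum(N, 1))
    return wa, ua
```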
the base classifier in the experiment adopts a ResNet18 convolutional neural network, an ablation experiment and a comparison experiment are respectively carried out on the basis of the classifier, the voice corresponding to each person in the data set is adopted as test data in the experiment in turn, and the average value is obtained on the result. The ablation experiment compares an original base classifier, the integrated learning of the original base classifier for downsampling by using the confidence level, the integrated learning classifier of the original base classifier for downsampling by using the mixed entropy, and the method provided by the invention to reveal the utility of each right in the method; in the comparison test, compared with a voice emotion recognition method which is popular in recent years, the method has the advantages that the number of adjacent times k=5, the total iteration times L=5, the weight coefficient lambda=0.6, the downsampling threshold t=0.6, 8 gradient descent iterations are carried out during training of the base classifier, the voice framing duration is 16ms, and the overlapping part duration is 8ms.
Ablation experimental results:
table 1 speech emotion recognition accuracy for ablation experiments on IEMOCAP datasets
Classifier name WA(%) UA(%)
Base classifier 54.95 56.42
Base classifier+confidence 57.26 58.49
Base classifier+mixed entropy 57.76 58.32
The invention is that 58.72 58.79
Each row in the table is a set of ablation experiments, and each column is the WA and UA of the current ablation experiment, respectively, as a percentage.
The ablation results show that when only the confidence is used for downsampling in ensemble learning, the WA and UA of the classifier improve by 2.31% and 2.07% respectively, indicating that the confidence measures the emotional intensity of each segment well. When only the mixed entropy is used as the downsampling criterion, the results improve over the base classifier by 2.81% on WA and 1.90% on UA. The largest improvement is obtained when the confidence and the mixed entropy both contribute to downsampling: by adding the mixed entropy, the computed ranking value also takes the behaviour of the depth features in the sample space into account, which further improves the accuracy.
Comparing the experimental results:
Table 2. Speech emotion recognition accuracy of the comparison experiments on the IEMOCAP data set
Each row of the table is one compared classifier; the columns give the WA and UA of that classifier, in percent.
The comparison experiment shows that our classifier achieves higher accuracy on both WA and UA. The methods of classifier 2 and classifier 4 are deep learning classifiers, and the comparison indicates that the proposed ensemble learning method effectively improves the accuracy of speech emotion recognition.

Claims (6)

1. A speech emotion recognition system based on mixed entropy downsampling and an integrated classifier, comprising the steps of:
1) Dividing a data set into training data and test data, dividing the speech signals of the training data into segments, extracting a spectrogram from each segment, training a base classifier on these spectrograms and obtaining the depth feature and confidence of each speech segment;
2) Calculating the mixed entropy of all speech segments and taking the weighted sum of the mixed entropy and the confidence as a ranking value;
3) Using the spectrograms of the speech segments whose ranking values are larger than a set threshold to retrain a base classifier, recomputing the ranking values of all speech segments and training the base classifier again, repeating this operation for a given number of rounds, the base classifiers trained in the rounds forming an integrated classifier;
4) Dividing the test speech into segments, extracting their spectrograms, inputting them into the integrated classifier and calculating the emotion recognition result of the speech.
2. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the mixed entropy in step 2) consists of an emotion certainty entropy and a structure distribution entropy, the emotion certainty entropy being used to measure how clearly the emotion is expressed by the speech segment;
the emotion certainty entropy of the i-th speech segment is computed from the emotion labels of its k nearest neighbours in the depth feature space, wherein i is the index of the speech segment in the training data, C is the number of emotion categories in the data set, k is the set number of neighbours and ln denotes the natural logarithm (base e);
specifically, the formula of the emotion certainty entropy uses, for the i-th speech segment, the count of the most frequent emotion label among the k speech segments whose depth features are closest in Euclidean distance to its own,
wherein the count for an emotion label j is the number of segments carrying label j among the k nearest neighbours, computed on the depth features of the training data.
3. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 2, wherein the structure distribution entropy in the mixed entropy is used to measure the stability of the distribution structure of the speech segment in the depth feature space;
the structure distribution entropy of the i-th speech segment is computed from the Euclidean distances d_i,q between the depth features of the i-th speech segment and those of its k nearest neighbours q in the training data, wherein i is the index of the speech segment in the training data, k is the set number of neighbours and ln denotes the natural logarithm (base e).
4. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the mixed entropy of each speech segment in step 2) is calculated from the emotion certainty entropy of claim 2 and the structure distribution entropy of claim 3;
the mixed entropy MIE_i of the i-th speech segment is obtained by combining the emotion certainty entropy and the structure distribution entropy after Min-Max normalization, wherein i is the index of the speech segment in the training data and nor is the Min-Max normalization function.
5. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the ranking value of each speech segment in step 2) is defined as a weighted sum of the mixed entropy as claimed in claim 4 and the confidence obtained in step 1) of claim 1;
the formula of the ranking value is:
Rank_i = (1 - λ)·nor(conf_i) + λ·nor(-MIE_i)
wherein i is the index of the speech segment in the training data, conf_i is the confidence of the i-th speech segment, MIE_i is the mixed entropy of the i-th speech segment, λ is the weight coefficient, nor is the Min-Max normalization function and Rank_i is the ranking value of the i-th speech segment; the ranking value is used as the basis for downsampling the speech segments.
6. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the integrated classifier obtained in step 3) is M = {m_l | l = 1, 2, 3, …, L}, where m_l is the base classifier trained in the l-th round and L is the set total number of training rounds; during testing, a complete utterance is divided into E speech segments, and the output of the e-th segment on the base classifier obtained in the l-th training round is a C-dimensional probability vector, where e is the segment index and C is the number of emotion categories in the data set; the output of the complete utterance on the integrated classifier M is obtained by aggregating these outputs over all E segments and all L base classifiers, where E is the number of segments into which the complete utterance is divided and L is the set total number of training rounds;
the final recognized emotion index R_ind of the complete utterance in the speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1 is the index, among the C emotion categories in the data set, of the largest entry of this aggregated output, and the emotion category corresponding to R_ind is the final recognition result of the system.
CN202310509029.1A 2023-05-08 2023-05-08 Speech emotion recognition system based on mixed entropy downsampling and integrated classifier Pending CN116524960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310509029.1A CN116524960A (en) 2023-05-08 2023-05-08 Speech emotion recognition system based on mixed entropy downsampling and integrated classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310509029.1A CN116524960A (en) 2023-05-08 2023-05-08 Speech emotion recognition system based on mixed entropy downsampling and integrated classifier

Publications (1)

Publication Number Publication Date
CN116524960A true CN116524960A (en) 2023-08-01

Family

ID=87390004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310509029.1A Pending CN116524960A (en) 2023-05-08 2023-05-08 Speech emotion recognition system based on mixed entropy downsampling and integrated classifier

Country Status (1)

Country Link
CN (1) CN116524960A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117149944A (en) * 2023-08-07 2023-12-01 北京理工大学珠海学院 Multi-mode situation emotion recognition method and system based on wide time range
CN117149944B (en) * 2023-08-07 2024-04-23 北京理工大学珠海学院 Multi-mode situation emotion recognition method and system based on wide time range
CN117496309A (en) * 2024-01-03 2024-02-02 华中科技大学 Building scene point cloud segmentation uncertainty evaluation method and system and electronic equipment
CN117496309B (en) * 2024-01-03 2024-03-26 华中科技大学 Building scene point cloud segmentation uncertainty evaluation method and system and electronic equipment


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination