CN116524960A - Speech emotion recognition system based on mixed entropy downsampling and integrated classifier - Google Patents
Speech emotion recognition system based on mixed entropy downsampling and integrated classifier Download PDFInfo
- Publication number
- CN116524960A CN116524960A CN202310509029.1A CN202310509029A CN116524960A CN 116524960 A CN116524960 A CN 116524960A CN 202310509029 A CN202310509029 A CN 202310509029A CN 116524960 A CN116524960 A CN 116524960A
- Authority
- CN
- China
- Prior art keywords
- voice
- entropy
- emotion
- speech
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000008909 emotion recognition Effects 0.000 title claims abstract description 31
- 239000012634 fragment Substances 0.000 claims abstract description 65
- 230000008451 emotion Effects 0.000 claims abstract description 61
- 238000012549 training Methods 0.000 claims abstract description 61
- 238000012360 testing method Methods 0.000 claims abstract description 11
- 238000000034 method Methods 0.000 claims description 16
- 238000010606 normalization Methods 0.000 claims description 5
- 230000008569 process Effects 0.000 claims description 5
- 239000004973 liquid crystal related substance Substances 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 2
- 239000000284 extract Substances 0.000 abstract description 2
- 238000007781 pre-processing Methods 0.000 abstract 1
- 238000002474 experimental method Methods 0.000 description 14
- 238000002679 ablation Methods 0.000 description 7
- 238000013135 deep learning Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 239000011159 matrix material Substances 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000011478 gradient descent method Methods 0.000 description 2
- 206010002953 Aphonia Diseases 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000013400 design of experiment Methods 0.000 description 1
- 230000001627 detrimental effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000002996 emotional effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000009432 framing Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
The invention discloses a voice emotion recognition system based on mixed entropy downsampling and an integrated classifier, which comprises the following steps: the preprocessing stage divides the voice signal of the training data into fragments, then extracts a spectrogram, trains a base classifier by using the fragment, and obtains the depth characteristic and the confidence coefficient of each voice fragment; the training stage, calculating the mixed entropy of all the voice fragments and taking the weighted sum of the mixed entropy and the confidence coefficient as a ranking value; then, the spectrograms of the voice fragments with the ranking values larger than the set threshold value are used for retraining a base classifier, ranking values of all the voice fragments and training the base classifier are calculated again, the operation is cycled for given rounds, and the base classifier trained in each round forms an integrated classifier; finally, testing the speech segments, extracting the spectrogram, inputting the speech segments into an integrated classifier, and calculating the emotion recognition result of the speech. The invention obviously reduces the influence of voice fragments with undefined emotion and unstable distribution structure, and effectively improves the accuracy of voice emotion recognition.
Description
Technical Field
The invention relates to the technical field of voice emotion recognition, in particular to a voice emotion recognition system based on mixed entropy downsampling and an integrated classifier.
Background
Speech is the most direct and natural communication mode between people, and is the main form of man-machine interaction. However, speech emotion in real life is often complex, subtle, in a constantly changing state. Thus, detecting and recognizing emotion in speech has become a challenging task. In recent years, speech emotion recognition has been studied and developed, and has been widely used in various fields such as virtual customer service, intelligent assistants, and medical auxiliary diagnosis. The speech emotion recognition system generally comprises two parts of feature extraction and training of a classifier, wherein the traditional method is to segment an original speech waveform and then extract artificial features, and the classifier commonly used in speech emotion recognition comprises a Gaussian mixture classifier, a support vector machine and the like. In recent years, with the development of deep learning, many methods based on deep learning classifiers, such as recurrent neural network classifiers and convolutional neural network classifiers, have emerged.
Previous studies have found that the confidence level of each emotion varies with the location of each segment of speech in the speech. For example, a true emotion tag for a piece of speech is happy, but the trained classifier results show that the confidence of neutral emotion is highest in the first half of speech and the confidence of happy is highest in the second half. Clearly, the first half has weaker emotional intensity of happiness, which is detrimental to classifier training. Speech segments with ambiguous emotion introduce noise into the classifier training process and degrade the performance of the speech emotion recognition system. Thus, challenges remain in speech emotion recognition at the speech segment level. While there are some approaches to this problem, such as attention mechanisms and multi-instance learning, these approaches autonomously learn how to weight different parts of speech through deep learning classifiers, which are difficult to analyze and interpret theoretically.
Disclosure of Invention
The invention provides a voice emotion recognition system based on mixed entropy downsampling and an integrated classifier, which selects voice fragments with clear emotion from voice fragments of all training data for next round of training in each round of training, namely downsampling the voice fragments of all training data, each round can generate a base classifier, and the base classifiers form the integrated classifier. In the training process of each round, the mixed entropy and the confidence of the voice fragments are calculated, and the ranking value is calculated according to the mixed entropy and the confidence, so that a sample with clear emotion type is selected. The integrated classifier utilizes the base classifier trained by multiple iterations to predict emotion of the whole voice, and accuracy of voice emotion recognition is effectively improved.
The voice emotion recognition system based on the mixed entropy downsampling and the integrated classifier in the project comprises the following steps:
1) Dividing a data set into two parts of training data and test data, dividing a voice signal of the training data into fragments, extracting a spectrogram, and training a base classifier by using the voice signal and obtaining depth characteristics and confidence of each voice fragment;
2) Calculating the mixed entropy of all the voice fragments and taking the weighted sum of the mixed entropy and the confidence coefficient as a ranking value;
3) The spectrogram of the voice fragments with the ranking values larger than the set threshold value is used for retraining a base classifier, ranking values of all the voice fragments and training the base classifier are calculated again, the operation is cycled for given rounds, and the base classifier trained in each round forms an integrated classifier;
4) Testing the speech segments, extracting the spectrogram, inputting the speech segments into an integrated classifier, and calculating the emotion recognition result of the speech.
The technical scheme adopted by the invention can be further refined, the label of each voice segment is the real label of the whole voice in which the label is positioned in the data set, and the mixed entropy of the voice segment in the step 2) consists of emotion certainty entropy and structure distribution entropy, wherein the emotion certainty entropy is used for measuring the significance degree of emotion expressed by the voice segment, and the emotion certainty entropy is used for measuring the significance degree of emotion expressed by the voice segmentThe formula of (2) is:
wherein i is the number of the voice fragment on the training data, C is the emotion type number in the data set, k is the set neighbor number,determining degree entropy for the basis of the ith voice segment;
the structure distribution entropy is used for measuring the distribution structure stability of the voice fragments in the depth feature spaceThe formula of (2) is:
wherein i is the number of the voice fragment on the training data, k is the set neighbor number, d i,q Representing the Euclidean distance between depth features of the ith speech segment and the qth speech segment in the training data, ln represents the logarithm of the calculated base e.
In the training process of the base model, a ranking value calculated by the weighted sum of the mixed entropy and the confidence coefficient is used as a basis for downsampling the voice fragments in each turn, the mixed entropy of each voice fragment is calculated by the emotion certainty entropy and the structure distribution entropy, and the formula of the mixed entropy is as follows:
wherein i is the number of the voice segment on the training data,representing emotion certainty entropy->Representing the structure distribution entropy, nor is a Min-Max normalization function, MIE i Representing the mixed entropy of the ith speech segment;
the ranking value of each speech segment is defined as the weighted sum of the mixed entropy and the confidence obtained in step 1), and the formula of the ranking value is:
Rnak i =(1-λ)nor(conf i )+λnor(-MIE i ), (4)
wherein i is the number of the voice fragment on the training data, conf i Representing confidence level of ith speech segment MiE i Represents the mixed entropy of the ith speech segment, lambda is the weight coefficient, nor is the Min-Max normalization function, rank i Representing the ranking value of the ith speech segment.
The basic model in each turn updates parameters by minimizing cross entropy loss of voice segment labels and voice segment emotion classification results through a gradient descent method, and finally an integrated classifier composed of basic classifiers generated by each turn calculates emotion types predicted by a system according to the output of each voice segment in the whole test voice.
The beneficial effects of the invention are as follows: the invention provides a voice emotion recognition system based on mixed entropy downsampling and an integrated classifier, which combines the iterative classifiers of each round into an integrated classifier by selecting voice fragments participating in training in the basic classifier training of a plurality of rounds, thereby effectively improving the accuracy of voice emotion recognition and obviously reducing the influence of voice fragments with undefined emotion types compared with the existing classifier and basic classifier. The invention provides a concept of mixed entropy, wherein the mixed entropy of a voice segment comprises emotion certainty entropy and structure distribution entropy, and a ranking value calculated by the mixed entropy and the confidence is used as a standard, so that samples with definite emotion types and stable distribution structure can be effectively selected for training an integrated classifier.
Drawings
FIG. 1 is a block diagram of a speech emotion recognition system based on mixed entropy downsampling and an integrated classifier in accordance with the present invention.
Detailed Description
The following detailed description of specific embodiments of the invention refers to the accompanying drawings, in which:
step 1: pretreatment stepThe segment divides all emotion voice original voice signals in the training data into voice segments with the duration of 2s one by one, N voice segments are divided in total, no overlapping part exists between the two voice segments, the voice segments with the duration of less than 2s are subjected to zero padding processing on the read signal value, and then frames and window division operation are carried out on the signal value of each voice segment as required to extract a spectrogram as new training data Wherein f is the number of sub-frames, w is the characteristic length of the frame voice, and the corresponding training label of each voice segment is +.> The true emotion label of the whole voice in which the true emotion label is positioned in the training data;
step 2: in each iteration round i, a new base classifier m is trained l Training data composed of a spectrogram of a voice fragment in the first round is inputIts corresponding tag in Y is +.>Wherein n is the number of speech segments in each round that participate in the training of the base classifier; when l=1, X 1 The number n=n of the voice fragments, namely the spectrograms of all the voice fragments participate in the training of the base classifier; each speech segment is in the base classifier m l The final output isWherein C is the number of emotion categories in the dataset, which represents the probability that the speech segment is predicted to be each emotion category on the base classifierBasis classifier m l Predictive tag of->The loss function on the ith voice segment in the training process is a true emotion label y i Cross entropy loss-y 'of sum base classifier output' i ·log(y i )-(1-y′ i )·log(1-y i ) The loss is minimized and the parameters of the base classifier are updated by a gradient descent method, and after the given times of gradient descent iteration, the trained base classifier m of the round can be obtained l ;
Step 3: inputting all training data, namely the spectrogram X of all voice fragments, into a trained classifier m l Of which each speech segment has a spectrogram x i Depth features of size z can be obtained in the penultimate fully connected layer of the classifierWhere i is the number of the speech segment, the depth feature corresponding to the spectrogram X of all the speech segments can be denoted as f= { F i I=1, 2, …, N }; the speech segment is in the base classifier m l Confidence in the way can be determined by->yy=y i Calculating;
step 4: calculating a k-nearest neighbor Euclidean distance matrix between depth features F of voice segment spectrograms in all training data And a k nearest neighbor speech fragment numbering matrix +.>The method is used for calculating emotion certainty entropy and structure distribution entropy in the mixed entropy;
step 5: the mixed entropy is calculated on the depth features F of the speech segment spectrograms in all training data:
step 5.1: calculating emotion certainty entropy:
emotion certainty entropyThe formula of (2) is:
wherein i is the number of the voice fragment on the training data, C is the emotion type number in the data set, k is the set neighbor number,entropy of the basic certainty of the ith speech segment, ln represents the logarithm based on e calculated as follows:
specifically, emotion certainty entropyIn the formula of->The number of the fragments corresponding to the emotion type label with the largest number of fragments among the k voice fragments with the nearest Euclidean distance calculated with the ith voice fragment on the depth characteristic of the training data is represented as follows:
wherein, the liquid crystal display device comprises a liquid crystal display device,represented in training dataCalculating the number of fragments with emotion type label j in k voice fragments with nearest Euclidean distance between depth features from ith voice fragment, which is formed by matrix M ind Calculating, < +.for the ith speech segment>Is M ind Voice fragment real tag y corresponding to ith row k neighbor voice fragment number i The number of fragments j.
Step 5.2: calculating structure distribution entropy:
wherein d is i,q ∈M dis Depth feature F representing the ith speech segment on depth feature F i Depth feature f with the q-th speech segment q Is a euclidean distance of (c).
Step 5.3: the mixed entropy of each speech segment is calculated from the emotion certainty entropy in step 5.1 and the structure distribution entropy in step 5.2:
where nor is the Min-Max normalization function.
Step 6: calculating ranking values on depth features F of speech segment spectrograms in all training data, wherein the ranking values are a weighted sum of the mixed entropy calculated in the step 5 and the confidence obtained in the step 3:
Rank i =(1-λ)nor(conf i )+λnor(-MIE i ) (10)
step 7: downsampling the speech fragments participating in training, and connecting Rank i A spectrogram of n voice fragments larger than a specified threshold is used as new training data X l+1 Namely, selecting a spectrogram of a speech fragment with clear emotion and strong distributed structural stability in a depth feature space as new training data X l+1 ;
Step 8: repeating the steps 2 to 7 for L rounds, wherein the base classifier m is obtained in each round l Adding the integrated classifier into a set M to serve as an integrated classifier;
step 9: during the test, a complete voice is divided into E voice fragments, and the output of each voice fragment on the basic classifier obtained by the first round of training is thatWhere e is the segment number and C is the emotion category number in the data set, then the output of the complete speech on the integrated classifier M can be defined as:
wherein e is the segment number, and the output of each voice segment on the base classifier obtained by the first round of training isE is the number of voice fragments divided by the complete voice, L is the set total training round, and the complete voice is subjected to final recognition emotion subscript R of a voice emotion recognition system based on mixed entropy downsampling and integrated classifier ind The calculation formula of (2) is as follows:
wherein C is the emotion category number in the data set, R ind The corresponding emotion type is the final recognition result of the system.
Design of experiment
And (3) selecting an experimental data set: the invention uses a voice data set: IEMOCAP. It contains 12 hours of voice audio, played in conversational form by 10 actors. Five sections are divided into two, 10. In the experiment of the invention, only four common emotions of anger, happiness, neutrality and sadness are considered, and the emotion type real tag of the excited voice audio in the data set is also regarded as happiness. The data contains 5531 speech in total, including 1,103 categories of happiness, 1,636 categories of happiness, 1,708 categories of anger, and 1,084 categories of sadness.
We use two indices Weighted Accuracy (WA) and Unweighted Accuracy (UA) to measure the accuracy of the classifier on the test data, defined as the following two formulas, where N c Representing the number of class c emotion samples, r c Representing the number of emotionally correctly classified samples of class c:
the base classifier in the experiment adopts a ResNet18 convolutional neural network, an ablation experiment and a comparison experiment are respectively carried out on the basis of the classifier, the voice corresponding to each person in the data set is adopted as test data in the experiment in turn, and the average value is obtained on the result. The ablation experiment compares an original base classifier, the integrated learning of the original base classifier for downsampling by using the confidence level, the integrated learning classifier of the original base classifier for downsampling by using the mixed entropy, and the method provided by the invention to reveal the utility of each right in the method; in the comparison test, compared with a voice emotion recognition method which is popular in recent years, the method has the advantages that the number of adjacent times k=5, the total iteration times L=5, the weight coefficient lambda=0.6, the downsampling threshold t=0.6, 8 gradient descent iterations are carried out during training of the base classifier, the voice framing duration is 16ms, and the overlapping part duration is 8ms.
Ablation experimental results:
table 1 speech emotion recognition accuracy for ablation experiments on IEMOCAP datasets
Classifier name | WA(%) | UA(%) |
Base classifier | 54.95 | 56.42 |
Base classifier+confidence | 57.26 | 58.49 |
Base classifier+mixed entropy | 57.76 | 58.32 |
The invention is that | 58.72 | 58.79 |
Each row in the table is a set of ablation experiments, and each column is the WA and UA of the current ablation experiment, respectively, as a percentage.
It can be seen that when only the confidence is used for downsampling ensemble learning, WA and UA of the classifier are improved by 2.31% and 2.07% respectively, which indicates that the confidence can well measure the emotion intensity on each segment. When downsampling is performed using only the mixed entropy as a basis, the result is more improved relative to the base classifier by 2.81% and 1.90% over WA and UA, respectively. The result is improved most when the confidence and the mixed entropy participate in downsampling at the same time, and the calculated ranking value considers the characteristics of the depth features on the sample space by adding the mixed entropy, so that the result accuracy is further improved.
The ablation experiment result shows that when the downsampling ensemble learning is carried out by using the confidence coefficient only, WA and UA of the classifier are respectively improved by 2.31% and 2.07%, and the confidence coefficient is proved to be capable of measuring the emotion intensity on each segment well. When downsampling is performed using only the mixed entropy as a basis, the result is more improved relative to the base classifier by 2.81% and 1.90% over WA and UA, respectively. The result is improved most when the confidence and the mixed entropy participate in downsampling at the same time, and the calculated ranking value considers the characteristics of the depth features on the sample space by adding the mixed entropy, so that the result accuracy is further improved.
Comparing the experimental results:
TABLE 2 accuracy of speech emotion recognition for comparative experiments on IEMOCAP datasets
Each row in the table is an experiment of a classifier, each column is WA and UA of the current experiment, respectively, listed in percent.
The comparison experiment shows that the accuracy of our classifier on WA and UA is high. The methods in the classifier 2 and the classifier 4 are deep learning classifiers, and the comparison result shows that the integrated learning method provided by the user effectively improves the accuracy of voice emotion recognition.
Claims (6)
1. A speech emotion recognition system based on mixed entropy downsampling and an integrated classifier, comprising the steps of:
1) Dividing a data set into two parts of training data and test data, dividing a voice signal of the training data into fragments, extracting a spectrogram, and training a base classifier by using the voice signal and obtaining depth characteristics and confidence of each voice fragment;
2) Calculating the mixed entropy of all the voice fragments and taking the weighted sum of the mixed entropy and the confidence coefficient as a ranking value;
3) The spectrogram of the voice fragments with the ranking values larger than the set threshold value is used for retraining a base classifier, ranking values of all the voice fragments and training the base classifier are calculated again, the operation is cycled for given rounds, and the base classifier trained in each round forms an integrated classifier;
4) Testing the speech segments, extracting the spectrogram, inputting the speech segments into an integrated classifier, and calculating the emotion recognition result of the speech.
2. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the mixed entropy in step 2) is composed of emotion certainty entropy and structure distribution entropy, wherein the emotion certainty entropy is used for measuring the significance of emotion expressed by the speech fragment;
emotion certainty entropyThe formula of (2) is:
wherein i is the number of the voice fragment on the training data, C is the emotion type number in the data set, k is the set neighbor number,entropy of the basic certainty of the ith speech segment, ln represents the logarithm based on e calculated as follows:
specifically, emotion certainty entropyIn the formula of->The method is characterized in that the method comprises the steps of representing the number of fragments corresponding to emotion type labels with the largest number of fragments in k voice fragments with the nearest Euclidean distance calculated with the ith voice fragment on the depth characteristic of training data, wherein the formula is as follows:
wherein, the liquid crystal display device comprises a liquid crystal display device,representing the number of fragments with emotion type label j in k voice fragments with nearest Euclidean distance between depth features calculated from the ith voice fragment in the training data.
3. The hybrid entropy according to claim 2, wherein the structure distribution entropy in the hybrid entropy is used to measure the distribution structure stability of the speech segment in the depth feature space;
structure distribution entropyThe formula of (2) is:
wherein i is the number of the voice fragment on the training data, k is the set neighbor number, d i,q Representing the Euclidean distance between depth features of the ith speech segment and the qth speech segment in the training data, ln represents the logarithm of the calculated base e.
4. A speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the mixed entropy of each speech segment in step 2) is calculated from the emotion certainty entropy in claim 2 and the structure distribution entropy in claim 3;
the formula of the mixed entropy is:
wherein i is the number of the voice segment on the training data,representing emotion certainty entropy->Representing the structure distribution entropy, nor is a Min-Max normalization function, MIE i Representing the mixed entropy of the ith speech segment.
5. A speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, characterized in that the ranking value of each speech segment in step 2) is defined as a weighted sum of the mixed entropy as claimed in claim 4 and the confidence obtained in step 1) in claim 1;
the formula of the ranking value is:
Rank i =(1-λ)nor(conf i )+λnor(-MTE i )
wherein i is the number of the voice fragment on the training data, conf i Representing confidence of ith speech segment, MIE i Represents the mixed entropy of the ith speech segment, lambda is the weight coefficient, nor is the Min-Max normalization function, rank i And representing the ranking value of the ith voice segment, wherein the ranking value is used as a basis for downsampling the voice segment.
6. The speech emotion recognition system based on mixed entropy downsampling and integrated classifier as claimed in claim 1, wherein the integrated classifier obtained in step 3) is m= { M l L=1, 2,3, …, L }, where m l Representing the first roundThe trained basic classifier, L is the set total training round, a complete voice is divided into E voice fragments in the test process, and the output of each voice fragment on the basic classifier obtained by the first round training isWhere e is the segment number and C is the emotion category number in the data set, then the output of the complete speech on the integrated classifier M can be defined as:
wherein e is the segment number, and the output of each voice segment on the base classifier obtained by the first round of training isE is the number of voice fragments divided by the complete voice, L is the set total training round, and the final recognition emotion index R of the complete voice in the voice emotion recognition system based on mixed entropy downsampling and integrated classifier as set forth in claim 1 ind The calculation formula of (2) is as follows:
wherein C is the emotion category number in the data set, R ind The corresponding emotion type is the final recognition result of the system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310509029.1A CN116524960A (en) | 2023-05-08 | 2023-05-08 | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310509029.1A CN116524960A (en) | 2023-05-08 | 2023-05-08 | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524960A true CN116524960A (en) | 2023-08-01 |
Family
ID=87390004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310509029.1A Pending CN116524960A (en) | 2023-05-08 | 2023-05-08 | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524960A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117149944A (en) * | 2023-08-07 | 2023-12-01 | 北京理工大学珠海学院 | Multi-mode situation emotion recognition method and system based on wide time range |
CN117496309A (en) * | 2024-01-03 | 2024-02-02 | 华中科技大学 | Building scene point cloud segmentation uncertainty evaluation method and system and electronic equipment |
-
2023
- 2023-05-08 CN CN202310509029.1A patent/CN116524960A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117149944A (en) * | 2023-08-07 | 2023-12-01 | 北京理工大学珠海学院 | Multi-mode situation emotion recognition method and system based on wide time range |
CN117149944B (en) * | 2023-08-07 | 2024-04-23 | 北京理工大学珠海学院 | Multi-mode situation emotion recognition method and system based on wide time range |
CN117496309A (en) * | 2024-01-03 | 2024-02-02 | 华中科技大学 | Building scene point cloud segmentation uncertainty evaluation method and system and electronic equipment |
CN117496309B (en) * | 2024-01-03 | 2024-03-26 | 华中科技大学 | Building scene point cloud segmentation uncertainty evaluation method and system and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108597541B (en) | Speech emotion recognition method and system for enhancing anger and happiness recognition | |
CN110400579B (en) | Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network | |
Bhatti et al. | A neural network approach for human emotion recognition in speech | |
CN109241255A (en) | A kind of intension recognizing method based on deep learning | |
CN116524960A (en) | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier | |
CN112232087B (en) | Specific aspect emotion analysis method of multi-granularity attention model based on Transformer | |
CN110349597A (en) | A kind of speech detection method and device | |
Huang et al. | Large-scale weakly-supervised content embeddings for music recommendation and tagging | |
CN111899766B (en) | Speech emotion recognition method based on optimization fusion of depth features and acoustic features | |
Bluche et al. | Predicting detection filters for small footprint open-vocabulary keyword spotting | |
CN110992988B (en) | Speech emotion recognition method and device based on domain confrontation | |
CN113505225A (en) | Small sample medical relation classification method based on multilayer attention mechanism | |
Fornaciari et al. | BERTective: Language models and contextual information for deception detection | |
Fan et al. | Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition. | |
CN112711944B (en) | Word segmentation method and system, and word segmentation device generation method and system | |
CN105006231A (en) | Distributed large population speaker recognition method based on fuzzy clustering decision tree | |
CN116645980A (en) | Full life cycle voice emotion recognition method for focusing sample feature spacing | |
Liu et al. | Hierarchical component-attention based speaker turn embedding for emotion recognition | |
CN115512721A (en) | PDAN-based cross-database speech emotion recognition method and device | |
CN112465054B (en) | FCN-based multivariate time series data classification method | |
CN112699831B (en) | Video hotspot segment detection method and device based on barrage emotion and storage medium | |
CN114927144A (en) | Voice emotion recognition method based on attention mechanism and multi-task learning | |
CN114898776A (en) | Voice emotion recognition method of multi-scale feature combined multi-task CNN decision tree | |
Reshma et al. | A survey on speech emotion recognition | |
CN114742073A (en) | Conversation emotion automatic identification method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |