CN113903362A - Speech emotion recognition method based on neural network - Google Patents

Speech emotion recognition method based on neural network

Info

Publication number
CN113903362A
Authority
CN
China
Prior art keywords
emotion
neural network
text
speech
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110990439.3A
Other languages
Chinese (zh)
Other versions
CN113903362B (en)
Inventor
张悦
黄逸轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110990439.3A priority Critical patent/CN113903362B/en
Publication of CN113903362A publication Critical patent/CN113903362A/en
Application granted granted Critical
Publication of CN113903362B publication Critical patent/CN113903362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a speech emotion recognition method based on a neural network. The target speech signal is classified into four emotions: happy, sad, neutral and angry. Filter-bank-based features are extracted from the speech signal and fed separately into a convolutional neural network and a time delay neural network, each of which automatically extracts emotion features; a normalized exponential function (softmax) classifier then gives the probability of each emotion class, and the emotion with the maximum probability is selected as the speech emotion category. The target speech signal is also recognized as text, which is fed into a pre-training model of a bidirectional encoder to obtain a text emotion category. The final emotion category is obtained by fusing the three models, which solves the problems in the prior art that model fusion and multi-modal emotion recognition are difficult to train and bring little improvement in accuracy.

Description

Speech emotion recognition method based on neural network
Technical Field
The invention relates to the technical field of speech emotion recognition, in particular to a speech emotion recognition method based on a neural network.
Background
Many speech emotion recognition methods fuse different speech emotion classification models. However, because every model uses the same speech information, the models are highly correlated and fusion brings little improvement. Another approach extracts features with different models and then fuses the models with equal weights, which likewise yields little improvement.
Multi-modal methods that combine text emotion recognition with speech emotion recognition also exist, but they rely on feature fusion; because different models learn at different speeds, feature fusion cannot fully exploit the complementary information of the different modalities.
Disclosure of Invention
The invention aims to provide a speech emotion recognition method based on a neural network, in order to solve the problems in the prior art that model fusion and multi-modal emotion recognition are difficult to train and bring little improvement in accuracy.
In order to achieve the above object, the present invention adopts a speech emotion recognition method based on a neural network, comprising the following steps:
extracting speech features and feeding them into a convolutional neural network to obtain a convolutional emotion category;
feeding the speech features into a time delay neural network to obtain a time delay emotion category;
recognizing the speech as text and feeding the text into a pre-training model of a bidirectional encoder to obtain a text emotion category;
and fusing the models to obtain the final emotion category.
The speech features are filter-bank-based features of the target speech signal.
The emotion of the target speech signal is classified into four categories: happy, sad, neutral and angry; the convolutional emotion category, the time delay emotion category, the text emotion category and the final emotion category are each any one of the four categories.
In the process of extracting the speech features and feeding them into the convolutional neural network to obtain the convolutional emotion category, the convolutional neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier then gives the probability of each emotion class, and the class with the maximum probability is selected as the convolutional emotion category.
In the process of feeding the speech features into the time delay neural network to obtain the time delay emotion category, the time delay neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier then gives the probability of each emotion class, and the class with the maximum probability is selected as the time delay emotion category.
Recognizing the speech as text and feeding the text into the pre-training model of the bidirectional encoder to obtain the text emotion category comprises the following steps:
recognizing the text corresponding to the target speech signal with a speech recognition technique to obtain the speech text;
mapping the characters in the speech text to corresponding labels to form a label sequence;
feeding the label sequence into the pre-training model of the bidirectional encoder to extract the emotion features contained in the text;
and obtaining the probability of each emotion class with a normalized exponential function classifier and selecting the class with the maximum probability as the text emotion category.
In the process of obtaining the final emotion category through model fusion, the normalized exponential function (softmax) probabilities produced by the convolutional, time delay and text models are linearly combined, and the emotion corresponding to the maximum combined value is selected as the final emotion category.
In the linear combination, the weights of the different models are set to the same or different values.
According to the speech emotion recognition method based on a neural network of the invention, the target speech signal is first classified into four emotions: happy, sad, neutral and angry. Filter-bank-based features are extracted from the speech signal and fed separately into a convolutional neural network and a time delay neural network, each of which automatically extracts emotion features; a normalized exponential function (softmax) classifier then gives the probability of each emotion class, and the emotion with the maximum probability is selected as the speech emotion category. The target speech signal is also recognized as text, which is fed into a pre-training model of a bidirectional encoder to obtain a text emotion category. The final emotion category is obtained by fusing the three models, which solves the problems in the prior art that model fusion and multi-modal emotion recognition are difficult to train and bring little improvement in accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a speech emotion recognition method based on a neural network according to the present invention.
FIG. 2 is a model architecture diagram of the convolutional neural network of the present invention.
FIG. 3 is a model architecture diagram of the time delay neural network of the present invention.
Fig. 4 is a block diagram of a single layer bi-directional encoder of the present invention.
FIG. 5 is a schematic diagram of the model fusion weighting procedure of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In this application, the corresponding terms may also be referred to by other names: the filter-bank-based features are the FBank features, the convolutional neural network is the CNN, the time delay neural network is ECAPA-TDNN, the pre-training model of the bidirectional encoder is Bert, and the normalized exponential function is softmax.
Referring to fig. 1, the present invention provides a speech emotion recognition method based on a neural network, including the following steps:
S1: extracting speech features and feeding them into a convolutional neural network to obtain a convolutional emotion category;
S2: feeding the speech features into a time delay neural network to obtain a time delay emotion category;
S3: recognizing the speech as text and feeding the text into a pre-training model of a bidirectional encoder to obtain a text emotion category;
S4: fusing the models to obtain the final emotion category.
The speech features are the filter-bank-based (FBank) features of the target speech signal.
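As an illustration, FBank features can be computed frame by frame from the raw waveform with a standard speech toolkit. The following is a minimal sketch using torchaudio; the 16 kHz recording, 80 mel bins and 25 ms / 10 ms framing are assumptions for illustration, since the patent fixes neither these parameters nor a toolkit.

```python
# Minimal sketch of filter-bank (FBank) feature extraction with torchaudio.
import torchaudio

waveform, sample_rate = torchaudio.load("utterance.wav")  # hypothetical input file
fbank = torchaudio.compliance.kaldi.fbank(
    waveform,
    num_mel_bins=80,             # assumed number of mel filters
    frame_length=25.0,           # window length in milliseconds
    frame_shift=10.0,            # hop length in milliseconds
    sample_frequency=sample_rate,
)
print(fbank.shape)  # (num_frames, 80): one FBank vector per frame
```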
The emotion of the target speech signal is classified into four categories: happy, sad, neutral and angry; the convolutional emotion category, the time delay emotion category, the text emotion category and the final emotion category can each be any one of the four.
In the process of extracting the speech features and feeding them into the convolutional neural network to obtain the convolutional emotion category, the convolutional neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier then gives the probability of each emotion class, and the class with the maximum probability is selected as the convolutional emotion category.
In the process of feeding the speech features into the time delay neural network to obtain the time delay emotion category, the time delay neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier then gives the probability of each emotion class, and the class with the maximum probability is selected as the time delay emotion category.
Recognizing the speech as text and feeding the text into the pre-training model of the bidirectional encoder to obtain the text emotion category comprises the following steps:
recognizing the text corresponding to the target speech signal with a speech recognition technique to obtain the speech text;
mapping the characters in the speech text to corresponding labels to form a label sequence;
feeding the label sequence into the pre-training model of the bidirectional encoder to extract the emotion features contained in the text;
and obtaining the probability of each emotion class with a normalized exponential function classifier and selecting the class with the maximum probability as the text emotion category.
In the process of obtaining the final emotion category through model fusion, the normalized exponential function (softmax) probabilities produced by the convolutional, time delay and text models are linearly combined, and the emotion corresponding to the maximum combined value is selected as the final emotion category.
In the linear combination, the weights of the different models may be set to the same or different values.
Further, referring to fig. 2, the model architecture of the convolutional neural network CNN is as follows:
the speech signal is used as the input of the convolutional neural network based on the characteristics of a filter bank, the model is composed of 5 layers of two-dimensional convolutional neural network blocks, each two-dimensional convolutional neural network block is composed of 3 parts, namely a two-dimensional convolutional neural network, a batch normalization layer and a maximum pooling layer. And then connecting a global average pooling layer. And then connecting the full connection layer, obtaining the probability value belonging to each type of emotion by activating the function to be the normalized index function softmax, and then selecting the emotion corresponding to the maximum probability value as the emotion category of the voice.
The architecture of the time-delay neural network ECAPA-TDNN model is shown in FIG. 3:
the method comprises the steps of using the filter bank-based features of a voice signal as the input of a model, connecting a time delay neural network to the rear of the model, connecting a modified linear unit activation function and a batch standardization network to the rear of the model, connecting a 3-layer feature compression and excitation module, inputting the output of the first and second feature compression and excitation modules and the output of the third feature compression and excitation module into the time delay neural network, connecting the modified linear unit activation function to the model, obtaining a statistical attention pooling vector based on the features of the filter bank through attention pooling calculation, carrying out batch standardization, sending the statistical attention pooling vector to a full-connection network layer, carrying out batch standardization, obtaining probability values belonging to each emotion through an additional angle margin normalization index function, and selecting the maximum class as the emotion class of the voice.
For the Bert pre-training model:
The text corresponding to the speech is recognized with a speech recognition technique, and each character in the text is then mapped to a corresponding label according to a dictionary, with different characters mapped to different labels. The label sequence of the text is then input to the pre-training model of the bidirectional encoder (Bert).
The Bert pre-training model is a stack of multiple bidirectional encoder layers. The structure of a single-layer bidirectional encoder is shown in fig. 4. The input text is embedded, position encoding is added to the input information, and the result is fed into the encoder. The output of the previous layer is then combined with the features produced by the encoder and sent to a fully connected layer and a normalized exponential function (softmax) layer for classification, which yields the emotion category of the text.
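As an illustration of the text branch, the sketch below uses the Hugging Face Transformers library with a Chinese Bert checkpoint. The library, the checkpoint name and the example sentence are assumptions, not part of the original disclosure, and the classification head would have to be fine-tuned on emotion-labelled text before its probabilities are meaningful.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed checkpoint; the patent only says a Bert pre-training model is used.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=4)   # happy / sad / neutral / angry

text = "今天真是太开心了"                 # hypothetical ASR output ("I am so happy today")
inputs = tokenizer(text, return_tensors="pt")   # character-to-label (token id) sequence
with torch.no_grad():
    logits = model(**inputs).logits             # head is untrained here; fine-tune in practice
probs = torch.softmax(logits, dim=-1)           # probability of each emotion class
text_emotion = probs.argmax(dim=-1)             # text emotion category
```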
Further, in the process of obtaining the final emotion classification through model fusion:
referring to fig. 5, the fusion method: the probability value after the softmax of the weight 1 × CNN + the probability value after the softmax of the weight 2 × ECAPA-TDNN + the probability value after the softmax of the weight 3 × Bert is a new probability value, and then the emotion corresponding to the maximum value is selected as the final emotion category.
Wherein: weight 1+ weight 2+ weight 3 ═ 1
The invention also provides a specific embodiment illustrating the improvement in recognition accuracy.
Related terms: accuracy = number of correctly predicted samples / total number of samples.
Weighted accuracy WA: the per-class accuracies weighted by the proportion of each emotion class in the data set, i.e. the overall accuracy over all samples;
Unweighted accuracy UA: the average of the per-class accuracies, with every emotion class weighted equally.
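Under these definitions, both metrics can be computed directly from the reference and predicted labels; a small sketch with made-up labels is shown below.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 3])   # hypothetical reference labels
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 3])   # hypothetical predictions

wa = (y_true == y_pred).mean()                # weighted accuracy = overall accuracy
per_class = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
ua = float(np.mean(per_class))                # unweighted accuracy = mean per-class accuracy
print(f"WA={wa:.2%}, UA={ua:.2%}")
```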
Model 1: a convolutional neural network (CNN) on the filter-bank-based (FBank) features of the input speech; weighted accuracy WA 67%, unweighted accuracy UA 65%.
Model 2: a time delay neural network (ECAPA-TDNN) on the filter-bank-based (FBank) features of the input speech; weighted accuracy WA 67%, unweighted accuracy UA 66%.
Model 3: a bidirectional encoder (Bert) pre-training model on the input text; weighted accuracy WA 62%, unweighted accuracy UA 61%.
When the weights of the different models are set to the same value, the speech emotion recognition result is:
weighted accuracy WA 76%, unweighted accuracy UA 74%
fused probability = (1 × probability after softmax of model 1 + 1 × probability after softmax of model 2 + 1 × probability after softmax of model 3) / 3
When the weights are set to different values during model fusion, the performance improves considerably:
weighted accuracy WA 81%, unweighted accuracy UA 80%
fused probability = (0.5 × probability after softmax of model 1 + 2.1 × probability after softmax of model 2 + 0.4 × probability after softmax of model 3) / 3
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A speech emotion recognition method based on a neural network is characterized by comprising the following steps:
extracting speech features and feeding them into a convolutional neural network to obtain a convolutional emotion category;
feeding the speech features into a time delay neural network to obtain a time delay emotion category;
recognizing the speech as text and feeding the text into a pre-training model of a bidirectional encoder to obtain a text emotion category;
and fusing the models to obtain the final emotion category.
2. The method of claim 1, wherein the speech features are filter-bank-based features of a target speech signal.
3. The neural-network-based speech emotion recognition method of claim 2, wherein the emotion of the target speech signal is classified into four categories of happy, sad, neutral and angry, and the convolutional emotion category, the time delay emotion category, the text emotion category and the final emotion category are each any one of the four categories.
4. The neural-network-based speech emotion recognition method of claim 1, wherein, in the process of extracting the speech features and feeding them into the convolutional neural network to obtain the convolutional emotion category, the convolutional neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier is then used to obtain the probability of each emotion class, and the class with the maximum probability is selected as the convolutional emotion category.
5. The neural-network-based speech emotion recognition method of claim 1, wherein, in the process of feeding the speech features into the time delay neural network to obtain the time delay emotion category, the time delay neural network automatically extracts the emotion features contained in the speech features, a normalized exponential function classifier is then used to obtain the probability of each emotion class, and the class with the maximum probability is selected as the time delay emotion category.
6. The neural-network-based speech emotion recognition method of claim 2, wherein recognizing the speech as text and feeding the text into the pre-training model of the bidirectional encoder to obtain the text emotion category comprises the following steps:
recognizing the text corresponding to the target speech signal with a speech recognition technique to obtain the speech text;
mapping the characters in the speech text to corresponding labels to form a label sequence;
feeding the label sequence into the pre-training model of the bidirectional encoder to extract the emotion features contained in the text;
and obtaining the probability of each emotion class with a normalized exponential function classifier and selecting the class with the maximum probability as the text emotion category.
7. The neural-network-based speech emotion recognition method of claim 1, wherein, in the process of obtaining the final emotion category through model fusion, the normalized exponential function (softmax) probabilities of the convolutional emotion category, the time delay emotion category and the text emotion category are linearly combined, and the emotion corresponding to the maximum combined value is selected as the final emotion category.
8. The neural-network-based speech emotion recognition method of claim 7, wherein, in the linear combination, the weights of the different models are set to the same or different values.
CN202110990439.3A 2021-08-26 2021-08-26 Voice emotion recognition method based on neural network Active CN113903362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110990439.3A CN113903362B (en) 2021-08-26 2021-08-26 Voice emotion recognition method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110990439.3A CN113903362B (en) 2021-08-26 2021-08-26 Voice emotion recognition method based on neural network

Publications (2)

Publication Number Publication Date
CN113903362A true CN113903362A (en) 2022-01-07
CN113903362B CN113903362B (en) 2023-07-21

Family

ID=79188027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110990439.3A Active CN113903362B (en) 2021-08-26 2021-08-26 Voice emotion recognition method based on neural network

Country Status (1)

Country Link
CN (1) CN113903362B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847309A (en) * 2017-01-09 2017-06-13 华南理工大学 A kind of speech-emotion recognition method
CN107609572A (en) * 2017-08-15 2018-01-19 中国科学院自动化研究所 Multi-modal emotion identification method, system based on neutral net and transfer learning
CN108564942A (en) * 2018-04-04 2018-09-21 南京师范大学 One kind being based on the adjustable speech-emotion recognition method of susceptibility and system
CN110489521A (en) * 2019-07-15 2019-11-22 北京三快在线科技有限公司 Text categories detection method, device, electronic equipment and computer-readable medium
CN110534132A (en) * 2019-09-23 2019-12-03 河南工业大学 A kind of speech-emotion recognition method of the parallel-convolution Recognition with Recurrent Neural Network based on chromatogram characteristic
CN111081280A (en) * 2019-12-30 2020-04-28 苏州思必驰信息科技有限公司 Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method
US20200192927A1 (en) * 2018-12-18 2020-06-18 Adobe Inc. Detecting affective characteristics of text with gated convolutional encoder-decoder framework
CN111583964A (en) * 2020-04-14 2020-08-25 台州学院 Natural speech emotion recognition method based on multi-mode deep feature learning
CN112700796A (en) * 2020-12-21 2021-04-23 北京工业大学 Voice emotion recognition method based on interactive attention model

Also Published As

Publication number Publication date
CN113903362B (en) 2023-07-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant