CN112885378B - Speech emotion recognition method and device and storage medium - Google Patents

Speech emotion recognition method and device and storage medium

Info

Publication number
CN112885378B
Authority
CN
China
Prior art keywords
neural network
deep neural
training
voice emotion
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110086550.XA
Other languages
Chinese (zh)
Other versions
CN112885378A (en)
Inventor
刘振焘
吴保晗
佘锦华
吴敏
熊永华
周莉
赵兴旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202110086550.XA priority Critical patent/CN112885378B/en
Publication of CN112885378A publication Critical patent/CN112885378A/en
Application granted granted Critical
Publication of CN112885378B publication Critical patent/CN112885378B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The invention provides a speech emotion recognition method, a speech emotion recognition device and a storage medium, and relates to the technical field of artificial intelligence. Current speech emotion recognition methods fall mainly into two categories: methods based on deep neural networks and traditional machine learning methods, with deep learning remaining the mainstream. Deep-learning-based methods mainly extract features from preprocessed voice signals, feed them into a deep neural network for training, and then classify them with methods such as support vector machines and decision trees. These methods have their advantages, but in practical applications they require a large amount of labeled data; when sample data are insufficient, overfitting is likely to occur and accurate recognition becomes impossible. The provided method effectively alleviates the overfitting caused by an insufficient number of samples in the deep neural network and improves training efficiency and accuracy.

Description

Speech emotion recognition method and device and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a speech emotion recognition method, a speech emotion recognition device and a storage medium.
Background
Currently, the detection information applied to human emotion recognition research includes voice, facial expressions, physiological signals, body language, and the like. Voice is the fastest and most natural means of communication between people, so research on voice emotion recognition is significant for promoting harmonious human-computer interaction.
With the advent of large-scale labeled data sets and deep neural network structures, deep learning algorithms have achieved relatively high recognition accuracy in speech emotion recognition. However, these achievements all rely on training deep models on large amounts of labeled data to iteratively update model parameters. The real-world environment is often more complex than published experimental data sets; constructing a speech data set that covers the complete sample distribution requires a great deal of manpower and financial resources for data collection and labeling, and for some languages it is difficult to collect sufficient corpora.
Disclosure of Invention
The technical problem solved by the present disclosure is how to train a model with a small amount of sample data and how to mitigate the overfitting caused by an insufficient number of samples in a deep neural network.
According to one aspect of the present disclosure, there is provided a speech emotion recognition method including: acquiring a large-scale voice emotion data set; constructing a deep reinforcement learning model; inputting the large-scale voice emotion data set into the deep reinforcement learning model to pre-train a deep neural network; migrating the pre-trained deep neural network to a meta-learning model to form a deep neural network model; acquiring a small sample voice emotion data set; inputting the small sample voice emotion data set into the deep neural network model to meta-train the deep neural network; and performing a meta-test on the meta-trained deep neural network model and outputting a speech emotion recognition result.
In some embodiments, constructing the deep reinforcement learning model comprises: the deep reinforcement learning framework comprises an agent and an environment, and the deep neural network is embedded into the agent to form the deep reinforcement learning model.
in some embodiments, inputting the large-scale speech emotion data set into the deep reinforcement learning model to pre-train the deep neural network comprises: acquiring a first voice emotion signal in a large-scale voice emotion data set; preprocessing the first voice emotion signal; extracting a first voice emotion characteristic from the preprocessed first voice emotion signal; and inputting the first speech emotion characteristics into a deep reinforcement learning model for training.
In some embodiments, inputting a small sample speech emotion data set into the deep neural network model to meta-train the deep neural network model comprises: replacing the first classification layer of the deep neural network model with a second classification layer that conforms to the small sample data categories; dividing the small sample voice emotion data set into a training set and a test set; acquiring a second voice emotion signal in the training set and the test set; preprocessing the second voice emotion signal; extracting a second voice emotion feature from the preprocessed second voice emotion signal; and inputting the second speech emotion features extracted from the training set into the deep neural network model to meta-train the deep neural network.
In some embodiments, the recognition method comprises: dividing the training set into a first support set and a first query set; updating the parameters of the second classification layer by using the loss values obtained on the first support set, and updating the scaling and shifting parameters by using the loss values obtained on the first query set; dividing the test set into a second support set and a second query set, and fine-tuning the parameters of the second classification layer by using the second support set; and outputting a speech emotion recognition result by using the second query set and evaluating the deep neural network model.
According to another aspect of the present disclosure, there is provided a speech emotion recognition apparatus including: the deep reinforcement learning module is used for inputting a large-scale speech emotion data set and pre-training a deep neural network; the transfer learning module is used for transferring the pre-trained deep neural network to the meta-learning model to form a deep neural network model; and the deep neural network module is used for inputting a small sample voice emotion data set to perform meta-training on the deep neural network model, performing meta-testing on the meta-trained deep neural network model, and outputting a voice emotion recognition result.
In some embodiments, the deep reinforcement learning module mainly comprises an agent and an environment, wherein the deep neural network is embedded in the agent; the module obtains a first voice emotion signal in the large-scale voice emotion data set, preprocesses the first voice emotion signal, extracts a first voice emotion feature from the preprocessed first voice emotion signal, and inputs the first voice emotion feature into the deep reinforcement learning framework for training.
In some embodiments, the deep neural network module is configured to: replacing the first classification layer of the deep neural network model with a second classification layer which conforms to the small sample data category; dividing a small sample speech emotion data set into a training set and a test set; acquiring a second voice emotion signal in the training set and the test set; preprocessing the second voice emotion signal; extracting a second voice emotion characteristic from the preprocessed second voice emotion signal; and inputting the second speech emotion characteristics extracted from the training set into the deep neural network model to perform meta-training on the deep neural network model.
In some embodiments, the deep neural network module is configured to: dividing the training set into a first support set and a first query set; updating the parameters of the second classification layer by using the loss values obtained on the first support set, and updating the scaling and shifting parameters by using the loss values obtained on the first query set; dividing the test set into a second support set and a second query set, and fine-tuning the parameters of the second classification layer by using the second support set; and outputting a speech emotion recognition result by using the second query set and evaluating the deep neural network model.
According to another aspect of the disclosure, a storage medium having computer instructions stored thereon is provided, wherein the computer instructions, when executed by a processor, implement a speech emotion recognition method.
The technical scheme provided by the invention has the following beneficial effects:
1. The deep neural network is pre-trained with reinforcement learning on large-scale voice emotion data so that it better adapts to the sample environment, which improves training efficiency and accuracy.
2. The reinforcement-pre-trained neural network is migrated to small-sample learning, so fast convergence can be achieved with fewer tasks, effectively alleviating the overfitting caused by an insufficient number of samples in the deep neural network.
3. During meta-learning, only a small number of parameters of the pre-trained neural network are updated, which avoids the problem of "catastrophic forgetting" when learning a specific new task.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a speech emotion recognition method in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a deep reinforcement learning model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech emotion recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without undue experimentation, are within the scope of the present disclosure.
Some embodiments of the present disclosure are described in detail below in conjunction with fig. 1-3.
Fig. 1 illustrates the recognition method of some embodiments of the present disclosure.
As shown in fig. 1, the method comprises the steps of:
Step S101, constructing a deep reinforcement learning model, wherein:
The reinforcement learning framework mainly comprises an agent and an environment, and a deep neural network is embedded into the agent part of the reinforcement learning framework to form the deep reinforcement learning model.
the learning framework is represented by a Markov decision model, which can be represented as a quadruple:
(S,A,P,R)
wherein S represents a state space, A represents a motion space, P represents a state transition strategy, and R represents a reward function;
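For illustration, the quadruple can be held in a simple structure; the following is a minimal sketch in Python, where the concrete types and the emotion-label action space are assumptions, since the patent names only the four components:

    from dataclasses import dataclass
    from typing import Callable, Sequence

    @dataclass
    class MDP:
        states: Sequence        # S: state space
        actions: Sequence       # A: action space, e.g. candidate emotion labels
        transition: Callable    # P: state transition strategy P(s' | s, a)
        reward: Callable        # R: reward function R(s, a)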
the action decision is given by the "agent", which uses the following reward function to obtain the reward w t
Figure BDA0002910969800000041
Wherein x is t Is the basic truth value of the speech emotional characteristic, and the following formula is adopted to calculate the action selection probability n t
n t =Softmax(Q a gS t +p b )
Wherein n is t Is the action selection probability, Q a Is the weight, p, of the deep neural network b Is a heavy weight, S t Is the output of the previous hidden layer.
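A minimal sketch of this action-selection step, assuming Q_a, S_t, and p_b are NumPy arrays and that an action is sampled from n_t (the shapes and the sampling are illustrative assumptions, not given by the patent):

    import numpy as np

    def softmax(z):
        z = z - np.max(z)                  # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def select_action(Q_a, S_t, p_b, rng=np.random.default_rng()):
        """Sample an action from n_t = Softmax(Q_a . S_t + p_b)."""
        n_t = softmax(Q_a @ S_t + p_b)     # action selection probabilities
        action = rng.choice(len(n_t), p=n_t)
        return action, n_t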
Step S102, inputting the large-scale speech emotion data set into the deep reinforcement learning model to pre-train the deep neural network, wherein:
acquiring an input speech signal x(t) from the speech database, and preprocessing the speech signal x(t), wherein the preprocessing comprises pre-emphasis, framing and windowing;
performing a discrete Fourier transform on the preprocessed speech signal to obtain the discrete spectrum X(k), where the transform formula is:
X(k) = Σ_{n=0}^{N-1} x(n) · e^(-j2πnk/N)
wherein x(n) represents the preprocessed speech signal, X(k) represents the discretely transformed speech signal, N is the number of discrete points, n = 0, 1, …, N-1, and k = 0, 1, …, N-1;
inputting the spectrum obtained by the discrete Fourier transform into a Mel filter bank, and taking the logarithm to obtain the following log spectrum:
S(m) = ln( Σ_{k=0}^{N-1} |X(k)|² · H_m(k) ),  0 ≤ m < M
wherein S(m) is the log spectrum, H_m(k) is the transfer function of the m-th triangular filter, M is the number of triangular filters (generally 24 to 40), and f(m) is the center frequency of the m-th filter;
applying a discrete cosine transform to map the log spectrum into the cepstral domain, obtaining the Mel-frequency cepstral coefficient signal:
C(n) = Σ_{m=0}^{M-1} S(m) · cos( πn(m + 0.5) / M )
wherein C(n) represents the Mel-frequency cepstral coefficients;
and sending the obtained Mel-frequency cepstral coefficient speech emotion features into the deep reinforcement learning model for pre-training.
Step S103, migrating the pre-trained deep neural network to the meta-learning model to form the deep neural network model. The classification layer of the deep neural network model is determined by the number of sample classes in the large-scale voice emotion data set; since the classes of the large-scale voice emotion data set and of the small-sample voice emotion data set are not necessarily the same, the classification layer of the deep neural network model needs to be replaced with a classification layer that conforms to the small-sample classes.
Step S104, inputting the small sample speech emotion data set into the deep neural network model to meta-train the deep neural network model, wherein:
(1) Firstly, dividing the small-sample speech data into a training set and a test set in a set proportion, for example 7:1 or 3:1, wherein the training set is further divided into a support set and a query set. The specific division process is as follows:
Assume the small-sample emotion speech data set contains k1 classes of samples. First, n1 samples are randomly drawn from each of the k1 classes in the training set as the support set; then x1 samples are randomly drawn from the remaining samples as the query set. The support set and the query set of the test set are divided in the same way as in the training set. This constructs one k-way n-shot task, i.e., completes one episode of meta-training; several episodes can be constructed according to the amount of data in the training set, with each episode corresponding to one task (see the code sketch after step (3) below).
(2) Then, preprocessing and voice emotion feature extraction are carried out on the small sample voice emotion data set by using the method in the step S102;
(3) Inputting the small sample speech emotion data set into the deep neural network model to meta-train the deep neural network model, wherein:
the loss values obtained on the support set of the training set are used to update the parameters of the replaced classification layer with a gradient descent method, and the loss values obtained on the query set of the training set are used to update the scaling and shifting parameters, so that most of the parameters in the network remain unchanged;
Each task is meta-trained in the deep neural network, and the loss values of the support set and the query set are obtained according to the cross-entropy loss function:
L = -Σ_{c=1}^{O} y_ic · log(p_ic)
wherein O is the number of sample classes; y_ic is an indicator variable (0 or 1) that equals 1 if class c is the true class of sample i and 0 otherwise; and p_ic is the predicted probability that observation sample i belongs to class c;
Let the loss value of the support set be L_s and the loss value of the query set be L_q. L_s is used to update and optimize the parameters θ of the replaced classification layer with a gradient descent algorithm:
θ ← θ - β · ∇_θ L_s
wherein β is the learning rate;
L_q is used to update and optimize the scaling and shifting parameters Φ_1 and Φ_2 with a gradient descent algorithm, wherein the scaling parameter Φ_1 is initialized to 1 and the shifting parameter Φ_2 is initialized to 0:
Φ_i ← Φ_i - γ · ∇_{Φ_i} L_q,  i = 1, 2
wherein γ is the learning rate;
The scaling and shifting operation allows the parameters of the pre-trained neurons to be kept fixed, so that only the classification-layer parameters and the scaling/shifting parameters need to be updated in the whole meta-training process, and an update of all the network parameters is avoided:
SS(X; W, b) = (W ⊙ Φ_1) · X + (b + Φ_2)
where X is the input, W and b are the frozen pre-trained weights and bias, and ⊙ denotes element-wise multiplication of the array elements.
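The sketch below ties steps (1) and (3) together: episode construction and one meta-training episode, written in PyTorch. The single frozen linear layer, the ReLU, the layer sizes, and the learning rates β and γ are illustrative assumptions; only the update rule (L_s updates the classification layer, L_q updates Φ_1 and Φ_2 while the pre-trained W and b stay frozen) follows the description above.

    import random
    import torch
    import torch.nn.functional as F

    feat_dim, hid_dim, k_way = 39, 128, 5                 # illustrative sizes
    W = torch.randn(hid_dim, feat_dim)                    # frozen pre-trained weights
    b = torch.zeros(hid_dim)                              # frozen pre-trained bias
    Phi1 = torch.ones(hid_dim, feat_dim, requires_grad=True)    # scaling, init 1
    Phi2 = torch.zeros(hid_dim, requires_grad=True)             # shifting, init 0
    Wc = torch.randn(k_way, hid_dim, requires_grad=True)        # replaced classifier
    bc = torch.zeros(k_way, requires_grad=True)

    def forward(X):
        h = F.relu(X @ (W * Phi1).T + (b + Phi2))   # SS(X) = (W ⊙ Φ1)·X + (b + Φ2)
        return h @ Wc.T + bc                        # second classification layer

    def sample_episode(data_by_class, n_shot, n_query):
        """Step (1): build one k-way n-shot task with per-class support/query samples."""
        support, query = [], []
        for label, samples in data_by_class.items():
            drawn = random.sample(samples, n_shot + n_query)
            support += [(s, label) for s in drawn[:n_shot]]
            query += [(s, label) for s in drawn[n_shot:]]
        return support, query

    def meta_train_episode(sx, sy, qx, qy, beta=0.01, gamma=0.001):
        """Step (3): L_s updates the classifier; L_q updates Phi1 and Phi2."""
        L_s = F.cross_entropy(forward(sx), sy)            # support loss L_s
        g_Wc, g_bc = torch.autograd.grad(L_s, [Wc, bc])
        with torch.no_grad():                             # θ ← θ - β·∇θ L_s
            Wc.sub_(beta * g_Wc)
            bc.sub_(beta * g_bc)
        L_q = F.cross_entropy(forward(qx), qy)            # query loss L_q
        g1, g2 = torch.autograd.grad(L_q, [Phi1, Phi2])
        with torch.no_grad():                             # Φ ← Φ - γ·∇Φ L_q
            Phi1.sub_(gamma * g1)
            Phi2.sub_(gamma * g2)
        return L_s.item(), L_q.item()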
Step S105, performing a meta-test on the deep neural network model after meta-training, and outputting the speech emotion recognition result, including:
because several episodes optimize the whole neural network model during meta-training, the model's ability to adapt quickly to unknown tasks needs to be tested; the meta-test mainly comprises the following two parts:
1. fine-tuning the parameters of the second classification layer by using the support set of the test set, i.e., first computing the cross-entropy loss function on the support set and then updating a small number of parameters in the neural network with a gradient descent algorithm;
2. outputting the final recognition result by using the query set and evaluating the whole model.
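A matching sketch of this meta-test, reusing forward, Wc, and bc from the meta-training sketch above; the number of fine-tuning steps and the learning rate are assumptions:

    def meta_test(sx, sy, qx, qy, beta=0.01, steps=10):
        """Fine-tune the classification layer on the test support set, then
        recognize emotions on the test query set and report accuracy."""
        for _ in range(steps):
            loss = F.cross_entropy(forward(sx), sy)       # cross-entropy on support
            g_Wc, g_bc = torch.autograd.grad(loss, [Wc, bc])
            with torch.no_grad():                         # update only Wc, bc
                Wc.sub_(beta * g_Wc)
                bc.sub_(beta * g_bc)
        with torch.no_grad():
            pred = forward(qx).argmax(dim=1)              # predicted emotion labels
            accuracy = (pred == qy).float().mean().item()
        return pred, accuracy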
Some embodiments of the disclosed speech emotion recognition apparatus based on reinforced meta-transfer learning are described below with reference to fig. 3.

Fig. 3 is a schematic structural diagram of a speech emotion recognition apparatus based on reinforced meta-transfer learning according to some embodiments of the present disclosure.
The device 30 comprises:
the deep reinforcement learning module 301 is used for inputting a large-scale speech emotion data set and pre-training a deep neural network; a migration learning module 302, configured to migrate the pre-trained deep neural network to a meta learning model to form a deep neural network model; and the deep neural network module 303 is configured to input the small sample speech emotion data set to perform meta-training on the deep neural network model, perform meta-testing on the deep neural network model subjected to the meta-training, and output a speech emotion recognition result.
In some embodiments, the deep reinforcement learning module 301 mainly comprises an agent 304, in which a deep neural network is embedded, and an environment 305; the module acquires a first voice emotion signal in the large-scale voice emotion data set, preprocesses the first voice emotion signal, extracts a first voice emotion feature from the preprocessed first voice emotion signal, and inputs the first voice emotion feature into the deep reinforcement learning module for training.
In some embodiments, the deep neural network module 303 is configured to: replacing the first classification layer of the deep neural network model with a second classification layer that conforms to a small sample data category; dividing a small sample voice emotion data set into a training set and a test set; acquiring a second voice emotion signal in the training set and the test set; preprocessing the second voice emotion signal; extracting a second voice emotion characteristic from the preprocessed second voice emotion signal; and inputting the second speech emotion characteristics extracted from the training set into the deep neural network model to perform meta-training on the deep neural network model.
In some embodiments, the deep neural network module 303 is further configured to: dividing the training set into a first support set and a first query set; updating the parameters of the second classification layer by using the loss values obtained on the first support set, and updating the scaling and shifting parameters by using the loss values obtained on the first query set; dividing the test set into a second support set and a second query set, and fine-tuning the parameters of the second classification layer by using the second support set; and outputting a speech emotion recognition result by using the second query set and evaluating the deep neural network model.
The present disclosure further includes a computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the speech emotion recognition method based on reinforced meta-transfer learning of any of the foregoing embodiments.
The present disclosure is described in terms of flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description covers only the preferred embodiments of the present disclosure and is not intended to limit the present disclosure; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present disclosure are intended to fall within its scope of protection.

Claims (6)

1. A speech emotion recognition method includes:
constructing a deep reinforcement learning model;
inputting a large-scale voice emotion data set into a deep reinforcement learning model to pre-train a deep neural network; wherein the pre-training comprises acquiring a first speech emotion signal in a large-scale speech emotion data set;
preprocessing the first voice emotion signal;
extracting a first voice emotion feature from the preprocessed first voice emotion signal;
inputting the first voice emotion feature into the deep reinforcement learning model for training;
migrating the pre-trained deep neural network to a meta-learning model to form a deep neural network model;
inputting a small sample voice emotion data set into the deep neural network model to perform meta-training on the pre-trained deep neural network; wherein the meta-training comprises replacing a first classification layer of the deep neural network model with a second classification layer that conforms to a small sample data category;
dividing the small sample voice emotion data set into a training set and a test set;
acquiring a second voice emotion signal in the training set and the test set;
preprocessing the second voice emotion signal;
extracting a second voice emotion feature from the preprocessed second voice emotion signal;
inputting the second speech emotion characteristics extracted from the training set into the deep neural network model to perform meta-training on the deep neural network model;
and performing a meta-test on the deep neural network model after the meta-training, and outputting a speech emotion recognition result.
2. The method for recognizing speech emotion according to claim 1, wherein said constructing a deep reinforcement learning model includes:
the reinforcement learning framework mainly comprises an intelligent agent and an environment, and the deep neural network is embedded into the intelligent agent to form a deep reinforcement learning model.
3. A speech emotion recognition method as claimed in claim 1, comprising:
dividing the training set into a first support set and a first query set;
and updating the parameters of the second classification layer by using the loss values obtained by the first support set, and updating the scaling and shifting parameters by using the loss values obtained by the first query set.
4. A speech emotion recognition method as claimed in claim 3, wherein meta-testing said deep neural network model after said meta-training comprises:
dividing the test set into a second support set and a second query set;
fine-tuning parameters of the second classification layer by using the second support set;
and outputting a speech emotion recognition result by utilizing the second query set and evaluating the deep neural network model.
5. A speech emotion recognition device based on reinforced meta-transfer learning, comprising:
the deep reinforcement learning module is used for inputting a large-scale speech emotion data set and pre-training a deep neural network; wherein the deep reinforcement learning module is configured to:
the reinforcement learning module mainly comprises an intelligent agent and an environment, and the deep neural network is embedded into the intelligent agent to form a deep reinforcement learning module;
acquiring a first voice emotion signal in a large-scale voice emotion data set, preprocessing the first voice emotion signal, extracting a first voice emotion feature from the preprocessed first voice emotion signal, and inputting the first voice emotion feature into the deep reinforcement learning module for pre-training;
the transfer learning module is used for transferring the pre-trained deep neural network to a meta-learning model to form a deep neural network model;
the deep neural network module is used for inputting a small sample voice emotion data set to perform meta-training on the deep neural network, performing a meta-test on the deep neural network model after the meta-training, and outputting a voice emotion recognition result;
wherein the deep neural network module is configured to:
replacing the first classification layer of the deep neural network module with a second classification layer that conforms to a small sample data category;
dividing the small sample voice emotion data set into a training set and a test set;
acquiring a second voice emotion signal in the training set and the test set;
preprocessing the second voice emotion signal;
extracting a second voice emotion feature from the preprocessed second voice emotion signal;
inputting the second speech emotion features extracted from the training set into the deep neural network module to perform meta-training on the deep neural network module;
dividing the training set into a first support set and a first query set;
updating the parameters of the second classification layer by using the loss values obtained by the first support set, and updating the scaling and shifting parameters by using the loss values obtained by the first query set;
dividing the test set into a second support set and a second query set;
fine-tuning parameters of the second classification layer by using the second support set;
and outputting a speech emotion recognition result by utilizing the second query set and evaluating the deep neural network model.
6. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, are adapted to implement the steps of the method of any of claims 1-4.
CN202110086550.XA 2021-01-22 2021-01-22 Speech emotion recognition method and device and storage medium Active CN112885378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086550.XA CN112885378B (en) 2021-01-22 2021-01-22 Speech emotion recognition method and device and storage medium


Publications (2)

Publication Number Publication Date
CN112885378A CN112885378A (en) 2021-06-01
CN112885378B true CN112885378B (en) 2023-03-24

Family

ID=76050003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110086550.XA Active CN112885378B (en) 2021-01-22 2021-01-22 Speech emotion recognition method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112885378B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611326B (en) * 2021-08-26 2023-05-12 中国地质大学(武汉) Real-time voice emotion recognition method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210019642A1 (en) * 2019-07-17 2021-01-21 Wingman AI Agents Limited System for voice communication with ai agents in an environment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018236674A1 (en) * 2017-06-23 2018-12-27 Bonsai Al, Inc. For hiearchical decomposition deep reinforcement learning for an artificial intelligence model
CN111062491A (en) * 2019-12-13 2020-04-24 周世海 Intelligent agent unknown environment exploration method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Speech emotion recognition based on parameter transfer and convolutional recurrent neural networks; Miao Yuqing et al.; Computer Engineering and Applications; 2019-05-15 (No. 10); full text *

Also Published As

Publication number Publication date
CN112885378A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN109817246B (en) Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium
CN107239529B (en) Public opinion hotspot category classification method based on deep learning
Wang et al. Research on Web text classification algorithm based on improved CNN and SVM
CN109766277A (en) A kind of software fault diagnosis method based on transfer learning and DNN
CN109523994A (en) A kind of multitask method of speech classification based on capsule neural network
CN111354338B (en) Parkinson speech recognition system based on PSO convolution kernel optimization sparse transfer learning
CN113887643B (en) New dialogue intention recognition method based on pseudo tag self-training and source domain retraining
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN115689008A (en) CNN-BilSTM short-term photovoltaic power prediction method and system based on ensemble empirical mode decomposition
CN112580588A (en) Intelligent flutter signal identification method based on empirical mode decomposition
CN111461201A (en) Sensor data classification method based on phase space reconstruction
CN108009571A (en) A kind of semi-supervised data classification method of new direct-push and system
CN110853630A (en) Lightweight speech recognition method facing edge calculation
CN111950295A (en) Method and system for training natural language processing model
CN113673242A (en) Text classification method based on K-neighborhood node algorithm and comparative learning
CN112885378B (en) Speech emotion recognition method and device and storage medium
CN113516097B (en) Plant leaf disease identification method based on improved EfficentNet-V2
CN114329124A (en) Semi-supervised small sample classification method based on gradient re-optimization
CN110532380A (en) A kind of text sentiment classification method based on memory network
CN106448660A (en) Natural language fuzzy boundary determining method with introduction of big data analysis
CN116050419B (en) Unsupervised identification method and system oriented to scientific literature knowledge entity
CN114757310B (en) Emotion recognition model and training method, device, equipment and readable storage medium thereof
CN116013407A (en) Method for generating property decoupling protein based on language model
CN113297376A (en) Legal case risk point identification method and system based on meta-learning
CN112863549A (en) Voice emotion recognition method and device based on meta-multitask learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant