CN106782602B - Speech emotion recognition method based on deep neural network - Google Patents
- Publication number: CN106782602B (application CN201611093447.3A)
- Authority: CN (China)
- Prior art keywords: layer, convolution, output, neural network, convolutional neural
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L25/30—Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G06N3/045—Neural network architectures: combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084—Learning methods: backpropagation, e.g. using gradient descent
- G10L15/16—Speech classification or search using artificial neural networks
- G10L25/27—Speech or voice analysis techniques characterised by the analysis technique
- G10L25/63—Speech or voice analysis specially adapted for estimating an emotional state
Abstract
The invention discloses a speech emotion recognition method based on a long short-term memory (LSTM) network and a convolutional neural network (CNN). The method comprises: constructing an LSTM- and CNN-based speech emotion recognition system; taking a speech sequence as the input of the system; training the LSTM and CNN with a back-propagation algorithm to optimize the network parameters and obtain an optimized network model; and classifying the emotion of a newly input speech sequence with the trained model into one of six emotions, namely sadness, happiness, disgust, fear, fright and neutrality. The method combines the two network models, LSTM and CNN, avoids the complexity of manual feature selection and extraction, and improves the accuracy of emotion recognition.
Description
Technical Field
The invention relates to the field of image processing and pattern recognition, and in particular to a speech emotion recognition method based on a long short-term memory network and a convolutional neural network.
Background
In interpersonal communication, information is exchanged in a variety of ways, including voice, body language, facial expressions, and the like. Among them, speech is the fastest and most primitive means of communication, and researchers consider it one of the most effective channels for human-computer interaction. Over the last half century, scholars have studied speech recognition extensively, that is, how to convert speech sequences into text. Despite significant advances in speech recognition, there is still a long way to go before natural human-machine interaction is achieved, because machines cannot understand the emotional state of the speaker. This has motivated another line of research: how to identify the emotional state of the speaker from speech, i.e., speech emotion recognition.
As an important branch of human-machine interaction, speech emotion recognition can be widely applied in fields such as education, medicine, and transportation. In a vehicle-mounted system, it can be used to monitor the mental state of a driver and judge whether the driver is in a safe condition, so that a tired driver can be alerted and traffic accidents avoided. In telephone service, it can be used to identify users whose speech expresses intense emotion and transfer them to a human agent, optimizing the user experience and improving the overall service level. In clinical medicine, the emotional changes of patients with depression or of autistic children can be tracked by means of speech emotion recognition and used as a tool for disease diagnosis and adjuvant therapy. In robotics research, voice information helps a robot understand a person's emotion and make friendly, intelligent responses, realizing natural interaction.
Most current speech emotion recognition methods follow the traditional approach of extracting features and then classifying with a classifier. Common speech features include pitch, speech rate, and intensity (prosodic features), as well as linear prediction cepstral coefficients and mel-frequency cepstral coefficients (spectral features). Common classification methods include hidden Markov models, support vector machines, and Gaussian mixture models. Traditional emotion recognition methods are relatively mature but have certain shortcomings. For example, it is unclear which feature influences emotion recognition most, and most experiments select only one feature as the basis for judgment, which reduces the objectivity of emotion recognition. In addition, some existing features, such as pitch and speech rate, are strongly affected by speaker style, which increases the complexity of recognition.
With the recent development of deep learning, many researchers have chosen to train network models to perform emotion recognition. Existing methods mainly include speech emotion recognition based on deep belief networks, on long short-term memory networks, and on convolutional neural networks. All three share a major disadvantage: each exploits the advantages of only a single network model. For example, a deep belief network can take a one-dimensional sequence as input but cannot exploit the correlations between earlier and later parts of the sequence; a long short-term memory network can exploit these correlations, but the features it extracts are high-dimensional; a convolutional neural network cannot process a speech sequence directly, and requires a Fourier transform to convert the speech signal into a spectrum for input. Traditional speech emotion recognition methods offer limited room for further development in feature extraction and classification, while existing deep-learning-based speech emotion methods each rely on a single network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a speech emotion recognition method based on a long short-term memory network and a convolutional neural network, which avoids the complex process of manually extracting and screening features and obtains the best emotion recognition effect by adaptively adjusting the network parameters through training.
The invention adopts the following technical scheme for solving the technical problems:
the speech emotion recognition method based on the long short-term memory network and the convolutional neural network comprises the following steps:
step A, preprocessing voice samples in a voice emotion database to enable each voice sample to be represented by a sequence with equal length, and accordingly a preprocessed voice sequence is obtained;
step B, constructing a speech emotion recognition system based on the long short-term memory network (LSTM) and the convolutional neural network (CNN), wherein the system comprises two basic modules: an LSTM module and a CNN module;
step C, sequentially sending the preprocessed voice sequence into a voice emotion recognition system for multiple training, and adjusting parameters of LSTM and CNN by using a back propagation algorithm to obtain an optimized network model;
and D, carrying out emotion classification on the newly input voice sequence by using the network model obtained by training in the step C, wherein the emotion classification comprises six emotions, namely sadness, happiness, disgust, fear, fright and neutrality.
As a further optimization of the speech emotion recognition method based on the long short-term memory network and the convolutional neural network, the long short-term memory network module in step B is constructed by the following steps:
B1.1, set the length of the speech sample sequence to m, where m = n × n and n is a positive integer, and let the outputs of the forget gate unit and the input gate unit at the current time be f_t and i_t, satisfying:
f_t = σ(W_f · x_c + b_f)
i_t = σ(W_i · x_c + b_i)
wherein x_c = [h_{t-1}, x_t] is the new vector obtained by joining the two vectors h_{t-1} and x_t end to end, x_t is the input at the current time, h_{t-1} is the state of the hidden layer at the previous time, W_f and W_i are the weight matrices of the forget gate unit and the input gate unit respectively, b_f and b_i are the bias vectors of the forget gate unit and the input gate unit respectively, and σ(·) is the sigmoid excitation function;
B1.2, the value of the current cell state C_t is calculated by the following formulas:
C̃_t = tanh(W_C · x_c + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
wherein C_{t-1} is the cell state at the previous time, C̃_t is the candidate value of the cell state at the current time, W_C is the weight matrix of the cell state, b_C is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3, the output h_t of each hidden node is obtained according to the following formulas, and the h_t are connected in sequence to form an m-dimensional feature vector:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit.
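The gate equations of steps B1.1–B1.3 can be sketched in plain Python. This is a minimal scalar illustration of the standard LSTM update, not the patent's implementation; the dictionary-based parameter layout (`W`, `b` keyed by gate name) is chosen here purely for readability:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step on scalar inputs. W and b hold the parameters of the
    forget ('f'), input ('i'), candidate ('C') and output ('o') units;
    each unit sees the concatenated pair x_c = [h_prev, x_t]."""
    x_c = (h_prev, x_t)

    def affine(name):
        w1, w2 = W[name]
        return w1 * x_c[0] + w2 * x_c[1] + b[name]

    f_t = sigmoid(affine("f"))          # forget gate
    i_t = sigmoid(affine("i"))          # input gate
    C_tilde = math.tanh(affine("C"))    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde  # new cell state
    o_t = sigmoid(affine("o"))          # output gate
    h_t = o_t * math.tanh(C_t)          # hidden-state output
    return h_t, C_t
```

Because the sigmoid gates lie in (0, 1) and tanh in (−1, 1), each hidden output h_t is bounded in (−1, 1) regardless of the parameters.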
As a further optimization of the speech emotion recognition method based on the long short-term memory network and the convolutional neural network, the convolutional neural network module in step B is constructed by the following steps:
B2.1, the m-dimensional feature vector extracted in step B1.3 is converted into an n × n feature matrix as the input of the convolutional neural network;
B2.2, the first layer of the convolutional neural network is a convolution layer: m_1 convolution kernels of dimension k_1 × k_1 perform the convolution operation on the input data with convolution step s_1, and the convolved result is nonlinearly mapped by an excitation function to obtain the output of the layer, m_1 feature maps of dimension l_1 × l_1;
B2.3, the second layer of the convolutional neural network is a pooling layer: m_2 convolution kernels of dimension k_2 × k_2 pool the feature maps output by the first (convolution) layer with step s_2 to obtain the output of the layer, m_2 feature maps of dimension l_2 × l_2;
B2.4, the third layer of the convolutional neural network is a convolution layer: m_3 convolution kernels of dimension k_3 × k_3 perform the convolution operation on the feature maps output by the second (pooling) layer, and the convolved result is nonlinearly mapped by an excitation function to obtain the output of the layer, m_3 feature maps of dimension l_3 × l_3;
B2.5, the fourth layer of the convolutional neural network is a convolution layer: m_4 convolution kernels of dimension k_4 × k_4 perform the convolution operation on the feature maps output by the third (convolution) layer, and the convolved result is nonlinearly mapped by an excitation function to obtain the output of the layer, m_4 feature maps of dimension l_4 × l_4;
B2.6, the fifth layer of the convolutional neural network is a convolution layer: m_5 convolution kernels of dimension k_5 × k_5 perform the convolution operation on the feature maps output by the fourth (convolution) layer, and the convolved result is nonlinearly mapped by an excitation function to obtain the output of the layer, m_5 feature maps of dimension l_5 × l_5;
B2.7, the sixth layer of the convolutional neural network is a pooling layer: m_6 convolution kernels of dimension k_6 × k_6 pool the feature maps output by the fifth (convolution) layer with step s_6 to obtain the output of the layer, m_6 feature maps of dimension l_6 × l_6;
B2.8, the seventh, eighth and ninth layers of the convolutional neural network are fully connected layers. The seventh layer connects the feature maps output by the sixth (pooling) layer to its c nodes; the eighth layer applies a ReLU nonlinear transformation to the c nodes of the seventh layer and then controls the connection weights of the hidden-layer nodes with the dropout method, the total number of connections being c; the ninth fully connected layer has p output nodes, and its output is the softmax loss fused with the feature labels.
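The feature-map sizes l_1 … l_6 in steps B2.2–B2.7 follow the standard convolution output-size rule. A one-line sketch (the `pad` parameter is an assumption covering the padded layers; the patent only mentions "edge expansion" in the detailed embodiment):

```python
def conv_out_size(n, k, s, pad=0):
    """Output side length when an n x n map passes a k x k kernel with
    stride s and symmetric zero padding `pad`. Floor division matches the
    usual convention when the kernel does not tile the input exactly."""
    return (n + 2 * pad - k) // s + 1
```

With the concrete numbers given later in the embodiment, `conv_out_size(128, 11, 3)` yields 40 and `conv_out_size(40, 4, 3)` yields 13, matching the reported 40 × 40 and 13 × 13 feature maps.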
As a further optimization of the speech emotion recognition method based on the long short-term memory network and the convolutional neural network, the function J(θ) of the softmax loss of the convolutional neural network in step B is defined as follows:
J(θ) = −(1/q) Σ_{i=1..q} Σ_{j=1..p} 1{y^(i) = j} · log( e^(θ_j^T x^(i)) / Σ_{l=1..p} e^(θ_l^T x^(i)) )
wherein x^(i) is an input vector and y^(i) is the emotion category corresponding to it, i = 1, 2, …, q, with q the number of voice samples; θ_j, j = 1, 2, …, p, are the model parameters, with p the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, whose value is 1 when the expression in braces is true and 0 otherwise.
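The softmax loss J(θ) can be evaluated directly. A small list-based Python sketch (the max-subtraction for numerical stability is a standard implementation detail not specified in the patent):

```python
import math

def softmax_loss(theta, xs, ys):
    """J(theta) for a softmax classifier: theta is a list of p weight
    vectors, xs a list of q input vectors, ys their class indices."""
    q = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        # scores[j] = theta_j^T x
        scores = [sum(w * xk for w, xk in zip(t, x)) for t in theta]
        m = max(scores)  # subtract the max before exponentiating
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[y] - log_z  # log P(y | x)
    return -total / q
```

For p = 2 classes with identical parameter vectors, every class is equally likely and the loss equals log 2 per sample.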
As a further optimization of the speech emotion recognition method based on the long short-term memory network and the convolutional neural network, the tanh function is expressed as tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) and the sigmoid function as σ(x) = 1 / (1 + e^(−x)), where x is a variable.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
(1) the complex process of manually extracting and screening features is avoided, and the best emotion recognition effect is obtained by adaptively adjusting the network parameters through training;
(2) the speech emotion recognition method based on LSTM and CNN fuses two different network models: by means of the LSTM it can process a speech sequence directly and exploit the temporal correlations within the sequence, while by means of the CNN it reduces the interference of noise and learns more abstract features, improving the accuracy and robustness of emotion recognition.
Drawings
FIG. 1 is a flow chart of the speech emotion recognition method based on LSTM and CNN of the present invention.
FIG. 2 is a basic framework structure diagram of the constructed LSTM and CNN-based speech emotion recognition system.
FIG. 3 is a basic framework diagram of a long and short duration memory network module in the speech emotion recognition system.
Fig. 4 is a basic framework diagram of a convolutional neural network module in a speech emotion recognition system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
As shown in the flow chart of FIG. 1, the implementation of the LSTM- and CNN-based speech emotion recognition method of the present invention mainly includes the following steps:
step 1: selecting a proper voice emotion database, and collecting voice fragments in the proper voice emotion database;
in actual operation, an AFEW database is selected that provides the original video clips, all of which are cut from a movie work. Compared with a common laboratory database, the voice and emotion expression in the AFEW database is closer to the real life environment and has more generality. The sample ages were spread between 1 and 70 years, covering various age groups, including large samples of children and adolescents, which could be subsequently used for emotional identification in younger subjects. The samples in the database are divided into six categories, namely sadness, happiness, disgust, fear, fright and neutrality, which are marked by 1-6. And selecting a voice segment in the video as a sample set, wherein the sampling frequency is 48 kHz.
Step 2: reading voice sample data, and unifying the length of a sample sequence;
Because the durations of the speech samples differ, and because the useful information is mainly concentrated in the middle region of a speech sequence, 16384 sampling points around the midpoint of each speech sequence are selected to represent the entire utterance. The speech samples are then randomly divided, in proportion, into a training set and a verification set. The speech sequences and labels of each sample set are stored as a pkl file.
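The midpoint selection described above can be sketched as follows. The zero-padding of clips shorter than 16384 points is an assumption, since the text does not say how short samples are handled:

```python
def center_crop(seq, target=16384):
    """Keep `target` samples around the midpoint of a speech sequence;
    zero-pad (symmetrically) clips that are shorter than `target`."""
    if len(seq) < target:
        pad = target - len(seq)
        return [0] * (pad // 2) + list(seq) + [0] * (pad - pad // 2)
    start = (len(seq) - target) // 2
    return list(seq[start:start + target])
```

Applied to every sample, this yields the equal-length sequences required by step A.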
Step 3: a speech emotion recognition system is constructed; the speech sequence is taken as input and the long short-term memory network is trained to obtain the output of its hidden layer. FIG. 2 shows the basic framework of the constructed LSTM- and CNN-based speech emotion recognition system, illustrating the whole process of emotion classification of speech samples; the system mainly comprises the two basic modules, LSTM and CNN. FIG. 3 shows the basic framework of the long short-term memory network module, illustrating the internal structure of the LSTM network unit and the relationship between the hidden-layer state and each gate unit. FIG. 4 shows the basic framework of the convolutional neural network module, illustrating how the feature matrix, after convolution, pooling and fully connected operations, generates a vector containing the label information.
The input speech sequence is denoted by x_0, x_1, x_2, …, x_t, …, and the states of the hidden nodes by h_0, h_1, h_2, …, h_t, …. x_c = [h_{t-1}, x_t] denotes the vector obtained by concatenating the hidden-layer state at the previous time with the input at the current time. Let the outputs of the forget gate unit and the input gate unit at time t be f_t and i_t, calculated as:
f_t = σ(W_f · x_c + b_f) (1)
i_t = σ(W_i · x_c + b_i) (2)
The value of the cell state is calculated by:
C_t = f_t * C_{t-1} + i_t * tanh(W_C · x_c + b_C) (3)
The output of the network module is determined by the current cell state and is a filtered cell value. The cell state is passed through a tanh function, which bounds its output between −1 and 1, and the result is multiplied by the output value o_t of a sigmoid unit to determine the hidden-layer output h_t:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o) (4)
h_t = o_t * tanh(C_t) (5)
The output h_t of each hidden node is obtained; these are then concatenated in turn to form a feature vector of length 16384, which is converted into a 128 × 128 feature matrix.
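Converting the 16384-dimensional feature vector into the 128 × 128 feature matrix is a plain row-major reshape, which can be sketched as:

```python
def to_matrix(vec, n=128):
    """Row-major reshape of an n*n-element sequence into an n x n matrix."""
    if len(vec) != n * n:
        raise ValueError("expected %d elements, got %d" % (n * n, len(vec)))
    return [list(vec[i * n:(i + 1) * n]) for i in range(n)]
```

Row i of the matrix holds elements i·128 … i·128+127 of the hidden-state vector, so the temporal ordering produced by the LSTM is preserved along the rows.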
Step 4: the feature matrix is taken as input and the convolutional neural network is trained, with the following specific steps:
The first layer is a convolution layer: 96 convolution kernels of dimension 11 × 11 perform the convolution operation on the input data with a convolution step of 3; the convolution operation enhances the signal features and reduces noise. After convolution, 96 feature maps of dimension 40 × 40 are generated.
The second layer is a pooling layer, and the feature maps generated by the first layer of the convolutional layers are pooled by a step length of 3 by using a 4 x 4-dimensional convolutional kernel to generate 96 13 x 13-dimensional feature maps;
The third layer is a convolution layer: 256 convolution kernels of dimension 5 × 5 convolve the feature maps generated by the second layer; edge padding and grouping are adopted to prevent the feature maps from shrinking during convolution. After nonlinear transformation, 256 feature maps of dimension 13 × 13 are generated.
The fourth layer is a convolution layer: 384 convolution kernels of dimension 5 × 5 convolve the feature maps generated by the third layer; with the same edge padding and grouping, 384 feature maps of dimension 13 × 13 are generated after nonlinear transformation.
The fifth layer is a convolution layer: 256 convolution kernels of dimension 5 × 5 are selected, and after edge padding and nonlinear mapping of the feature maps generated by the fourth layer, 256 feature maps of dimension 13 × 13 are generated.
The sixth layer is a pooling layer, and the feature maps generated by the fifth layer are pooled by using a convolution kernel with 3 x 3 dimensions and with the step length of 2 to generate 256 feature maps with 6 x 6 dimensions;
The seventh, eighth and ninth layers are fully connected layers. The seventh layer fully connects the feature maps generated by the sixth layer to 4096 nodes. The eighth layer applies a ReLU nonlinear transformation to the nodes of the seventh layer and then controls the connection weights of the hidden-layer nodes with the dropout method: during each training pass, dropout randomly discards some hidden nodes, which can temporarily be regarded as removed from the network structure although their weights are retained, so only some of the parameters are adjusted each time. The number of connections in the eighth layer is 4096. The ninth fully connected layer has 6 output nodes, and its output is the softmax loss fused with the feature labels.
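The dropout behaviour described for the eighth layer can be sketched as follows. The rescaling of surviving nodes ("inverted dropout") is an assumption of this sketch; the patent only states that nodes are randomly discarded during each training pass while their weights are retained:

```python
import random

def dropout(values, p=0.5, rng=random):
    """Zero each hidden node's output with probability p during training
    and rescale the survivors by 1/(1-p) ("inverted dropout"). The node's
    weights are untouched; only its output is suppressed for this pass."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]
```

At test time the layer is simply used unchanged, since the rescaling already keeps the expected activation constant.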
Step 5: the parameters of the LSTM and CNN in the system are adjusted using the back-propagation algorithm, the optimal network model is selected, and its parameters are stored.
Step 6: the test set samples are sent into the optimal network model, and emotion recognition is performed on them by the trained network.
The function J(θ) of the softmax loss of the convolutional neural network is defined as follows:
J(θ) = −(1/q) Σ_{i=1..q} Σ_{j=1..p} 1{y^(i) = j} · log( e^(θ_j^T x^(i)) / Σ_{l=1..p} e^(θ_l^T x^(i)) )
wherein x^(i) is an input vector and y^(i) is the emotion category corresponding to it, i = 1, 2, …, q, with q the number of voice samples; θ_j, j = 1, 2, …, p, are the model parameters, with p the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, whose value is 1 when the expression in braces is true and 0 otherwise. As training proceeds, the value of the loss function decreases continuously; the θ_j obtained when the loss function stabilizes are the parameters of the optimized network model.
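Adjusting θ by back-propagating this loss reduces, for the softmax output layer alone, to the closed-form gradient ∂J/∂θ_j = −(1/q) Σ_i (1{y^(i)=j} − P(j|x^(i))) x^(i). A minimal gradient-descent sketch of that single step (an illustration, not the patent's full training procedure):

```python
import math

def softmax_grad_step(theta, xs, ys, lr=0.1):
    """One gradient-descent step on the softmax loss J(theta).
    theta: p weight vectors, xs: q input vectors, ys: class labels."""
    q, p = len(xs), len(theta)
    grads = [[0.0] * len(theta[0]) for _ in range(p)]
    for x, y in zip(xs, ys):
        scores = [sum(w * xk for w, xk in zip(t, x)) for t in theta]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for j in range(p):
            # d J / d theta_j accumulates -(1{y=j} - P(j|x)) * x / q
            coeff = (1.0 if y == j else 0.0) - exps[j] / z
            for k, xk in enumerate(x):
                grads[j][k] -= coeff * xk / q
    return [[w - lr * g for w, g in zip(t, gt)] for t, gt in zip(theta, grads)]
```

Starting from zero weights with a single class-0 sample, one step raises the class-0 score and lowers the class-1 score, as expected.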
Claims (3)
1. A speech emotion recognition method based on a long short-term memory network and a convolutional neural network, characterized by comprising the following steps:
step A, preprocessing voice samples in a voice emotion database to enable each voice sample to be represented by a sequence with equal length, and accordingly a preprocessed voice sequence is obtained;
step B, constructing a speech emotion recognition system based on the long short-term memory network (LSTM) and the convolutional neural network (CNN), wherein the system comprises two basic modules: an LSTM module and a CNN module;
step C, sequentially sending the preprocessed voice sequence into a voice emotion recognition system for multiple training, and adjusting parameters of LSTM and CNN by using a back propagation algorithm to obtain an optimized network model;
d, carrying out emotion classification on the newly input voice sequence by using the network model obtained by training in step C, the classification comprising six emotions, namely sadness, happiness, disgust, fear, fright and neutrality;
the long-time memory network module in the step B is specifically constructed by the following steps:
b1.1, setting the length of the speech sample sequence as m, where m is n × n, n is a positive integer, and setting the outputs of the forgetting gate unit and the input gate unit at the current time as ftAnd itAnd satisfies the following conditions:
ft=σ(Wf·xc+bf)
it=σ(Wi·xc+bi)
wherein x isc=[ht-1,xt]New vector xcIs to make two ht-1、xtObtained by joining vectors end to end, xtFor input at the current time, ht-1For hiding the state of the layer at the previous moment, xcFor the concatenated new vector, WfAnd WiWeight matrices for the forgetting gate unit and the input gate unit, respectively, bfAnd biBias vectors of a forgetting gate unit and an input gate unit respectively, wherein sigma (·) is a sigmoid excitation function;
B1.2, calculating the value of the current cell state C_t by the following formulas:
C_t = f_t * C_{t-1} + i_t * C̃_t
C̃_t = tanh(W_c · x_c + b_c)
wherein C_{t-1} is the cell state at the previous time, C̃_t is the candidate (reference) value of the cell state at the current time, W_c is the weight matrix of the cell state, b_c is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3, obtaining the output h_t of each hidden node according to the following formulas, and connecting the h_t in sequence to form an m-dimensional feature vector:
h_t = o_t * tanh(C_t)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
wherein W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit;
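The recurrence in steps B1.1–B1.3 is the standard LSTM cell and can be sketched in NumPy as follows (the dictionary keys `f`, `i`, `c`, `o` and the toy dimensions are illustrative choices, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step following B1.1-B1.3.
    W holds the gate weight matrices W_f, W_i, W_c, W_o;
    b holds the bias vectors b_f, b_i, b_c, b_o."""
    x_c = np.concatenate([h_prev, x_t])        # x_c = [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ x_c + b["f"])       # forget gate (B1.1)
    i_t = sigmoid(W["i"] @ x_c + b["i"])       # input gate (B1.1)
    C_tilde = np.tanh(W["c"] @ x_c + b["c"])   # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde         # new cell state (B1.2)
    o_t = sigmoid(W["o"] @ x_c + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                   # hidden output (B1.3)
    return h_t, C_t

# smoke test: hidden size 3, input size 2, so x_c has length 5
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 5)) * 0.1 for k in "fico"}
b = {k: np.zeros(3) for k in "fico"}
h, C = lstm_step(rng.standard_normal(2), np.zeros(3), np.zeros(3), W, b)
print(h.shape, C.shape)  # (3,) (3,)
```

Running this step over all m inputs and concatenating the h_t values yields the m-dimensional feature vector of step B1.3.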
the convolutional neural network module in step B is specifically constructed by the following steps:
B2.1, reshaping the m-dimensional feature vector extracted in step B1.3 into an n × n feature matrix as the input of the convolutional neural network;
B2.2, the first layer of the convolutional neural network is a convolution layer: m_1 convolution kernels of size k_1 × k_1 are selected to perform a convolution operation on the input data with convolution stride s_1, and the convolution result is nonlinearly mapped by an excitation function, yielding the output of this convolution layer: m_1 feature maps of size l_1 × l_1;
B2.3, the second layer of the convolutional neural network is a pooling layer: m_2 kernels of size k_2 × k_2 are selected to pool the feature maps output by the first convolution layer with stride s_2, yielding the output of this pooling layer: m_2 feature maps of size l_2 × l_2;
B2.4, the third layer of the convolutional neural network is a convolution layer: m_3 convolution kernels of size k_3 × k_3 are selected to convolve the feature maps output by the second-layer pooling layer, and the convolution result is nonlinearly mapped by an excitation function, yielding the output of this convolution layer: m_3 feature maps of size l_3 × l_3;
B2.5, the fourth layer of the convolutional neural network is a convolution layer: m_4 convolution kernels of size k_4 × k_4 are selected to convolve the feature maps output by the third convolution layer, and the convolution result is nonlinearly mapped by an excitation function, yielding the output of this convolution layer: m_4 feature maps of size l_4 × l_4;
B2.6, the fifth layer of the convolutional neural network is a convolution layer: m_5 convolution kernels of size k_5 × k_5 are selected to convolve the feature maps output by the fourth convolution layer, and the convolution result is nonlinearly mapped by an excitation function, yielding the output of this convolution layer: m_5 feature maps of size l_5 × l_5;
B2.7, the sixth layer of the convolutional neural network is a pooling layer: m_6 kernels of size k_6 × k_6 are selected to pool the feature maps output by the fifth convolution layer with stride s_6, yielding the output of this pooling layer: m_6 feature maps of size l_6 × l_6;
B2.8, the seventh, eighth and ninth layers of the convolutional neural network are fully connected layers. The seventh layer connects the feature maps output by the sixth-layer pooling layer to its c nodes; the eighth layer applies a ReLU nonlinear transformation to the c nodes of the seventh layer and then uses the dropout method to control the connection weights of the hidden-layer nodes, with c connections in total; the ninth fully connected layer has p output nodes, and its output is the softmax loss fused with the feature labels.
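The claims leave k_i, s_i and the padding behaviour unspecified. Assuming "valid" (no-padding) convolutions, the per-layer output side lengths l_i of steps B2.2–B2.7 follow the usual formula l = floor((size − k) / s) + 1; the kernel sizes and n = 32 below are hypothetical values chosen only to make the shape arithmetic concrete:

```python
def conv_out(size, k, s=1):
    """Output side length of a 'valid' convolution or pooling layer:
    floor((size - k) / s) + 1. Padding is assumed absent."""
    return (size - k) // s + 1

# hypothetical instantiation of the nine-layer stack with n = 32
n = 32
l1 = conv_out(n, k=5)         # B2.2 conv layer      -> 28
l2 = conv_out(l1, k=2, s=2)   # B2.3 pooling layer   -> 14
l3 = conv_out(l2, k=3)        # B2.4 conv layer      -> 12
l4 = conv_out(l3, k=3)        # B2.5 conv layer      -> 10
l5 = conv_out(l4, k=3)        # B2.6 conv layer      -> 8
l6 = conv_out(l5, k=2, s=2)   # B2.7 pooling layer   -> 4
print([l1, l2, l3, l4, l5, l6])  # [28, 14, 12, 10, 8, 4]
```

The m_6 feature maps of size l_6 × l_6 produced by the sixth layer are then flattened and fed to the c nodes of the seventh (fully connected) layer.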
2. The speech emotion recognition method based on the long short-term memory network and the convolutional neural network as claimed in claim 1, wherein the softmax loss function J(θ) of the convolutional neural network in step B is defined as follows:
J(θ) = -(1/q) Σ_{i=1}^{q} Σ_{j=1}^{p} 1{y^(i) = j} · log( e^{θ_j^T x^(i)} / Σ_{l=1}^{p} e^{θ_l^T x^(i)} )
wherein x^(i) is an input vector and y^(i) is the emotion category corresponding to that input vector, i = 1, 2, …, q, with q the number of voice samples; θ_j is the weight vector of the j-th category, j = 1, 2, …, p, with p the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, which takes the value 1 when the expression in braces is true and 0 otherwise.
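A hedged NumPy sketch of this loss (labels are 0-indexed here, whereas the claim indexes categories from 1; the max-subtraction is a standard numerical-stability trick not mentioned in the claim):

```python
import numpy as np

def softmax_loss(theta, X, y):
    """J(theta): mean negative log-likelihood over q samples.
    theta: (p, d) class weight vectors theta_j; X: (q, d) input vectors
    x^(i); y: (q,) emotion labels in 0..p-1."""
    logits = X @ theta.T                          # theta_j^T x^(i)
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax probabilities
    q = len(y)
    return -np.log(probs[np.arange(q), y]).mean()

# toy check: p = 6 emotion classes (as in step D), d = 4 features
rng = np.random.default_rng(1)
theta = rng.standard_normal((6, 4))
X = rng.standard_normal((3, 4))
y = np.array([0, 2, 5])
loss = softmax_loss(theta, X, y)
print(round(loss, 4))
```

With all-zero parameters the predicted distribution is uniform over the p = 6 classes, so the loss equals log 6, which is a convenient sanity check on the implementation.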
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611093447.3A CN106782602B (en) | 2016-12-01 | 2016-12-01 | Speech emotion recognition method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106782602A CN106782602A (en) | 2017-05-31 |
CN106782602B true CN106782602B (en) | 2020-03-17 |
Family
ID=58913860
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2816680C1 (en) * | 2023-03-31 | 2024-04-03 | Автономная некоммерческая организация высшего образования "Университет Иннополис" | Method of recognizing speech emotions using 3d convolutional neural network |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: Yuen Road, Qixia District of Nanjing City, Jiangsu Province, No. 9, 210023; Applicant after: Nanjing Post & Telecommunication Univ. Address before: No. 66, Xin Mofan Road, Nanjing, Jiangsu, 210003; Applicant before: Nanjing Post & Telecommunication Univ. |
| GR01 | Patent grant | |