CN106782602A - Speech emotion recognition method based on long short-term memory network and convolutional neural network - Google Patents
Speech emotion recognition method based on long short-term memory network and convolutional neural network
- Publication number
- CN106782602A (application CN201611093447.3A)
- Authority
- CN
- China
- Prior art keywords
- layer
- neural networks
- convolutional neural
- output
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention discloses a speech emotion recognition method based on a long short-term memory (LSTM) network and a convolutional neural network (CNN). The method builds a speech emotion recognition system based on LSTM and CNN, takes voice sequences as the system input, trains the LSTM and CNN with the back-propagation algorithm to optimize the network parameters, and obtains the optimized network model. The trained network model then classifies the emotion of newly input voice sequences into six categories: sadness, happiness, disgust, fear, surprise, and neutral. The method combines the LSTM and CNN network models, avoids the tedium of manually selecting and extracting features, and improves the accuracy of emotion recognition.
Description
Technical field
The present invention relates to the field of image processing and pattern recognition, and in particular to a speech emotion recognition method based on a long short-term memory network and a convolutional neural network.
Background technology
In human communication, information is exchanged in many ways, including speech, body language, and facial expressions. Among these, the speech signal is the fastest and most primitive means of exchange, and scholars regard it as one of the most effective ways to realize human-computer interaction. Over the past half century, scholars have studied a great many problems in speech recognition, i.e., how to convert a voice sequence into text. Although substantial progress has been made in speech recognition, a machine cannot understand the speaker's affective state, so there is still a long way to go before humans and machines can interact naturally. This has driven research in another direction: how to identify the speaker's affective state from speech, i.e., speech emotion recognition.
As an important branch of human-computer interaction, speech emotion recognition can be widely applied in fields such as education, medicine, and transportation. In onboard vehicle systems, it can monitor the driver's state of mind and judge whether the driver is in a safe state, so that a fatigued driver can be alerted and traffic accidents avoided. In telephone service, it can prioritize users whose speech expresses strong emotion and transfer them to a human agent, optimizing the user experience and improving the overall service level. In clinical medicine, speech emotion recognition can track emotional changes in patients with depression or in autistic children, serving as a tool for diagnosis and auxiliary treatment. In robotics research, it helps a robot understand human emotion from voice information, make friendly and intelligent responses, and realize interaction.
Most current speech emotion recognition methods follow the traditional approach of first extracting features and then classifying them. Commonly used speech features include pitch, speaking rate, and intensity (prosodic features), linear prediction cepstral coefficients, and mel-frequency cepstral coefficients (spectral features). Commonly used classifiers include hidden Markov models, support vector machines, and Gaussian mixture models. Traditional emotion recognition methods have matured, but certain deficiencies remain. For example, it is still unclear which kind of feature has the greatest influence on emotion recognition; most experiments judge from only a single kind of feature, which reduces the objectivity of emotion recognition. In addition, some existing features, such as prosodic features like pitch and speaking rate, are strongly influenced by the speaker's style, which increases the complexity of recognition.
With the recent development of deep learning, many researchers choose to complete emotion recognition by training network models. Existing speech emotion recognition methods mainly include those based on deep belief networks, those based on long short-term memory networks, and those based on convolutional neural networks. The main defect of these three kinds of methods is that they cannot combine the advantages of each network model. For example, a deep belief network can take a one-dimensional sequence as input, but cannot exploit the correlation between earlier and later parts of the sequence. A long short-term memory network can exploit that correlation, but the dimensionality of the extracted features is high. A convolutional neural network cannot process a voice sequence directly; the speech signal must first undergo a Fourier transform and be converted into a spectrum before serving as input. Traditional speech emotion recognition methods thus have limited prospects for development in feature extraction and classification, and the existing deep-learning-based speech emotion methods each rely on a single network.
Summary of the invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art and provide a speech emotion recognition method based on a long short-term memory network and a convolutional neural network that avoids the complex process of manually extracting and screening features, and obtains the best emotion recognition effect through adaptive parameter adjustment of the trained network.
The present invention adopts the following technical scheme to solve the above technical problem:
A speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to the present invention comprises the following steps:
Step A: pre-process the speech samples in a speech emotion database so that each speech sample can be represented by a sequence of equal length, thereby obtaining pre-processed voice sequences;
Step B: build a speech emotion recognition system based on a long short-term memory network (LSTM) and a convolutional neural network (CNN), comprising two basic modules: an LSTM module and a CNN module;
Step C: feed the pre-processed voice sequences into the speech emotion recognition system for repeated training, adjusting the parameters of the LSTM and CNN with the back-propagation algorithm to obtain the optimized network model;
Step D: use the network model trained in step C to classify the emotion of newly input voice sequences into six categories: sadness, happiness, disgust, fear, surprise, and neutral.
As a further optimization of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the LSTM module in step B is constructed as follows:
B1.1: set the length of the speech sample sequence to m, where m = n × n and n is a positive integer, and let the outputs of the forget gate unit and the input gate unit at the current time be f_t and i_t respectively, satisfying:
f_t = σ(W_f·x_c + b_f)
i_t = σ(W_i·x_c + b_i)
where x_c = [h_{t-1}, x_t] is the new vector obtained by joining the two vectors h_{t-1} and x_t end to end, x_t is the input at the current time, h_{t-1} is the state of the hidden layer at the previous time, W_f and W_i are the weight matrices of the forget gate unit and the input gate unit respectively, b_f and b_i are the bias vectors of the forget gate unit and the input gate unit respectively, and σ(·) is the sigmoid excitation function;
B1.2: compute the value of the current cell state C_t by the following formulas:
C_t = f_t*C_{t-1} + i_t*C̃_t
C̃_t = tanh(W_C·x_c + b_C)
where C_{t-1} is the cell state at the previous time, C̃_t is the reference (candidate) value of the cell state at the current time, W_C is the weight matrix of the cell state, b_C is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3: obtain the output h_t of each hidden node according to the following formulas, and join the h_t in order to form an m-dimensional feature vector:
h_t = o_t*tanh(C_t)
o_t = σ(W_o·[h_{t-1}, x_t] + b_o)
where W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit.
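As a numerical illustration of the two gate equations in step B1.1, the following minimal NumPy sketch computes f_t and i_t for a toy hidden state of size 2 and a scalar input; all weight values here are arbitrary placeholders, not trained parameters of the patented system:

```python
import numpy as np

def sigmoid(z):
    # the sigmoid excitation function sigma(.) of step B1.1
    return 1.0 / (1.0 + np.exp(-z))

h_prev = np.array([0.1, -0.2])        # h_{t-1}: previous hidden state (toy)
x_t = np.array([0.5])                 # x_t: current input sample (toy)
x_c = np.concatenate([h_prev, x_t])   # x_c = [h_{t-1}, x_t], joined end to end

W_f = np.array([[0.2, -0.1, 0.4],     # forget-gate weight matrix (placeholder)
                [0.0,  0.3, -0.2]])
W_i = np.array([[-0.3, 0.1, 0.2],     # input-gate weight matrix (placeholder)
                [ 0.5, 0.0, 0.1]])
b_f = np.zeros(2)                     # forget-gate bias vector
b_i = np.zeros(2)                     # input-gate bias vector

f_t = sigmoid(W_f @ x_c + b_f)        # forget-gate output, each entry in (0, 1)
i_t = sigmoid(W_i @ x_c + b_i)        # input-gate output, each entry in (0, 1)
```

Because both gates pass through the sigmoid, their outputs always lie strictly between 0 and 1, which is what lets them act as soft switches on the cell state in step B1.2.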
As a further optimization of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the CNN module in step B is constructed as follows:
B2.1: convert the m-dimensional feature vector extracted in step B1.3 into an n × n feature matrix as the input of the convolutional neural network;
B2.2: the first layer of the CNN is a convolutional layer; m1 convolution kernels of size k1 × k1 perform a convolution operation on the input data with stride s1, the result after convolution is passed through an excitation function for nonlinear mapping, and the output of the layer is m1 feature maps of size l1 × l1;
B2.3: the second layer of the CNN is a pooling layer; m2 kernels of size k2 × k2 pool the feature maps output by the first convolutional layer with stride s2, and the output of the pooling layer is m2 feature maps of size l2 × l2;
B2.4: the third layer of the CNN is a convolutional layer; m3 kernels of size k3 × k3 convolve the feature maps output by the second-layer pooling layer, the result after convolution is passed through the excitation function for nonlinear mapping, and the output of the layer is m3 feature maps of size l3 × l3;
B2.5: the fourth layer of the CNN is a convolutional layer; m4 kernels of size k4 × k4 convolve the feature maps output by the third convolutional layer, the result after convolution is passed through the excitation function for nonlinear mapping, and the output of the layer is m4 feature maps of size l4 × l4;
B2.6: the fifth layer of the CNN is a convolutional layer; m5 kernels of size k5 × k5 convolve the feature maps output by the fourth convolutional layer, the result after convolution is passed through the excitation function for nonlinear mapping, and the output of the layer is m5 feature maps of size l5 × l5;
B2.7: the sixth layer of the CNN is a pooling layer; m6 kernels of size k6 × k6 pool the feature maps output by the fifth convolutional layer with stride s6, and the output of the pooling layer is m6 feature maps of size l6 × l6;
B2.8: the seventh, eighth, and ninth layers of the CNN are fully connected layers. The seventh layer fully connects the feature maps output by the sixth-layer pooling layer to its c nodes. The eighth layer applies the ReLU nonlinear transformation to the c nodes of the seventh layer, then uses the dropout method to control the connection weights of the hidden nodes; its number of fully connected nodes is also c. The ninth fully connected layer has p output nodes and outputs the softmax loss fused with the feature labels.
As a further optimization of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the softmax loss function J(θ) of the CNN in step B is defined as follows:
J(θ) = -(1/q) Σ_{i=1..q} Σ_{j=1..p} 1{y^(i) = j} · log( e^{θ_j^T x^(i)} / Σ_{l=1..p} e^{θ_l^T x^(i)} )
where x^(i) is an input vector and y^(i) is the emotion category corresponding to that input vector, i = 1, 2, …, q, with q the number of speech samples; θ_j are the parameters to be trained, j = 1, 2, …, p, with p the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise.
As a further optimization of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the tanh function is expressed as tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}) and the sigmoid function as σ(x) = 1 / (1 + e^{-x}), where x is the variable.
Compared with the prior art, the above technical scheme of the present invention has the following technical effects:
(1) the complex process of manually extracting and screening features is avoided, and the best emotion recognition effect is obtained through adaptive parameter adjustment of the trained network;
(2) the speech emotion recognition method based on LSTM and CNN merges two different network models: the LSTM can process the voice sequence directly while exploiting the temporal correlation between earlier and later parts of the sequence, and the CNN reduces the interference of noise while learning more abstract features, improving the accuracy and robustness of emotion recognition.
Brief description of the drawings
Fig. 1 is the flow chart of the speech emotion recognition method based on LSTM and CNN of the invention.
Fig. 2 is the basic frame structure diagram of the speech emotion recognition system based on LSTM and CNN.
Fig. 3 is the basic framework diagram of the LSTM module in the speech emotion recognition system.
Fig. 4 is the basic framework diagram of the CNN module in the speech emotion recognition system.
Specific embodiment
In order that the object, technical solutions and advantages of the present invention are clearer, below in conjunction with the accompanying drawings and the specific embodiments
The present invention will be described in detail.
It is of the invention to be based on if Fig. 1 is the flow chart of the speech-emotion recognition method based on LSTM and CNN of the invention
The realization of the speech-emotion recognition method of LSTM and CNN is mainly comprised the steps of:
Step 1: select a suitable speech emotion database and gather the sound bites in it.
In actual operation, the AFEW database is selected. This database provides original video segments, all taken from cinematographic works. Compared with conventional laboratory databases, the voice and emotional expression in the AFEW database are closer to real-life situations and therefore more general. The sample ages range from 1 to 70 years, covering all age groups and including many samples of children and teenagers, so the database can subsequently be used for emotion recognition of young subjects. The samples in the database are divided into six classes, namely sadness, happiness, disgust, fear, surprise, and neutral, labeled 1 to 6. The sound bites in the videos are selected as the sample set; the sampling frequency is 48 kHz.
Step 2: read the voice sample data and unify the sample sequence length.
Because the durations of the speech samples differ, and considering that the useful information is concentrated mainly in the middle region of a voice sequence, in practice 16384 sampling points near the midpoint of each voice sequence are chosen to represent the whole voice. Speech samples are chosen arbitrarily in a 7:3 ratio as the training set and the validation set respectively. The voice sequences and labels of each sample set are stored as pkl files.
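The cropping and splitting described in step 2 can be sketched as follows. This is a minimal NumPy version under stated assumptions: the raw signals here are random stand-ins for decoded 48 kHz sound bites, and the function names (center_crop, train_val_split) are illustrative, not part of the patent:

```python
import numpy as np

SEQ_LEN = 16384  # fixed number of sampling points kept per sample

def center_crop(signal, length=SEQ_LEN):
    """Keep `length` samples around the midpoint; zero-pad signals that are too short."""
    if len(signal) < length:
        pad = length - len(signal)
        signal = np.pad(signal, (pad // 2, pad - pad // 2))
    mid = len(signal) // 2
    start = mid - length // 2
    return signal[start:start + length]

def train_val_split(samples, labels, ratio=0.7, seed=0):
    """Arbitrary 7:3 split into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    cut = int(ratio * len(samples))
    tr, va = idx[:cut], idx[cut:]
    return (samples[tr], labels[tr]), (samples[va], labels[va])

# toy stand-ins for sound bites of unequal duration (all longer than SEQ_LEN)
raw = [np.random.default_rng(i).standard_normal(20000 + 500 * i) for i in range(10)]
X = np.stack([center_crop(s) for s in raw])   # every sample now has equal length
y = np.arange(10) % 6 + 1                     # labels 1..6 for the six emotions
(train_X, train_y), (val_X, val_y) = train_val_split(X, y)
```

With ten samples and ratio 0.7, seven samples land in the training set and three in the validation set; serializing each set to a pkl file would be a separate pickle.dump call.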
Step 3: build the speech emotion recognition system, train the long short-term memory network with the voice sequence as input, and obtain the output of the hidden layer. Fig. 2 is the basic framework structure diagram of the speech emotion recognition system based on LSTM and CNN, illustrating the overall process by which a speech sample is emotionally classified; the system mainly contains the two basic modules LSTM and CNN. Fig. 3 is the basic framework diagram of the LSTM module in the speech emotion recognition system, illustrating the internal structure of an LSTM network unit and reflecting the relation between the hidden layer state and each gate unit. Fig. 4 is the basic framework diagram of the CNN module in the speech emotion recognition system, illustrating the process by which the feature matrix generates a vector containing label information after convolution, pooling, and fully connected operations.
Let x_0, x_1, x_2, …, x_t, … denote the input voice sequence and h_0, h_1, h_2, …, h_t, … the states of the hidden nodes. x_c = [h_{t-1}, x_t] denotes the vector obtained by joining the hidden layer state at the previous time and the input at the current time. Let the outputs of the forget gate unit and the input gate unit at time t be f_t and i_t; their values are computed as follows:
f_t = σ(W_f·x_c + b_f) (1)
i_t = σ(W_i·x_c + b_i) (2)
The value of the cell state is computed by the following formulas:
C̃_t = tanh(W_C·x_c + b_C)
C_t = f_t*C_{t-1} + i_t*C̃_t (3)
The output of the network module is determined by the current cell state, after filtering of the cell value. The cell state is first passed through a tanh function to guarantee that its output range lies between -1 and 1, then multiplied by the output value o_t of a sigmoid unit to determine the output h_t of the hidden layer:
o_t = σ(W_o·[h_{t-1}, x_t] + b_o) (4)
h_t = o_t*tanh(C_t) (5)
After the outputs h_t of all hidden nodes are obtained, they are joined in order to form a feature vector of length 16384, and this feature vector is converted into a 128 × 128 feature matrix.
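A compact sketch of the whole recurrence (1)-(5) in NumPy, under stated assumptions: toy sizes are used (a sequence of m = 16 scalar samples giving a 4 × 4 feature matrix rather than the patent's 16384 → 128 × 128), one scalar hidden unit per step so that joining the h_t yields a length-m vector, and the weights are random placeholders rather than trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n = 16, 4                 # toy sequence length, m = n * n
rng = np.random.default_rng(1)
# each weight acts on the 2-vector x_c = [h_{t-1}, x_t]
W_f, W_i, W_C, W_o = (rng.standard_normal(2) for _ in range(4))
b_f = b_i = b_C = b_o = 0.0

h_prev, C_prev = 0.0, 0.0
outputs = []
for x_t in rng.standard_normal(m):          # the voice sequence
    x_c = np.array([h_prev, x_t])           # [h_{t-1}, x_t] joined end to end
    f_t = sigmoid(W_f @ x_c + b_f)          # forget gate, eq. (1)
    i_t = sigmoid(W_i @ x_c + b_i)          # input gate, eq. (2)
    C_tilde = np.tanh(W_C @ x_c + b_C)      # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde      # cell state update, eq. (3)
    o_t = sigmoid(W_o @ x_c + b_o)          # output gate, eq. (4)
    h_t = o_t * np.tanh(C_t)                # hidden output, eq. (5)
    outputs.append(h_t)
    h_prev, C_prev = h_t, C_t

feature_vector = np.array(outputs)             # length-m feature vector
feature_matrix = feature_vector.reshape(n, n)  # n x n input for the CNN
```

Because h_t = o_t·tanh(C_t) with o_t in (0, 1), every entry of the feature vector is bounded in magnitude by 1, which keeps the CNN input well scaled.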
Step 4: take the feature matrix as input and train the convolutional neural network, as follows:
The first layer is a convolutional layer; 96 convolution kernels of size 11 × 11 perform a convolution operation on the input data with stride 3. The convolution strengthens the signal features and reduces noise, and generates 96 feature maps of size 40 × 40.
The second layer is a pooling layer; using kernels of size 4 × 4, the feature maps generated by the first convolutional layer are pooled with stride 3, generating 96 feature maps of size 13 × 13.
The third layer is a convolutional layer; 256 kernels of size 5 × 5 convolve the feature maps generated by the second layer, using edge expansion (zero-padding) to prevent the feature maps from shrinking during convolution. After the nonlinear transformation, 256 feature maps of size 13 × 13 are generated.
The fourth layer is a convolutional layer; 384 kernels of size 5 × 5 convolve the feature maps generated by the third layer, likewise using edge expansion; after the nonlinear transformation, 384 feature maps of size 13 × 13 are generated.
The fifth layer is a convolutional layer; 256 kernels of size 5 × 5, likewise with edge expansion, pass the generated feature maps through a nonlinear mapping, generating 256 feature maps of size 13 × 13.
The sixth layer is a pooling layer; using kernels of size 3 × 3, the feature maps generated by the fifth convolutional layer are pooled with stride 2, generating 256 feature maps of size 6 × 6.
The seventh, eighth, and ninth layers are fully connected layers. The seventh layer fully connects the feature maps generated by the sixth layer to 4096 nodes. The eighth layer applies the ReLU nonlinear transformation to the seventh-layer nodes and then uses the dropout method to control the working weights of the hidden nodes: in each training pass, dropout randomly discards some hidden nodes, which are temporarily treated as not being part of the network structure, but whose weights are retained, so that only part of the parameters are adjusted at a time. The number of fully connected nodes in the eighth layer is 4096. The ninth fully connected layer has 6 output nodes and outputs the softmax loss fused with the feature labels.
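The feature-map sizes quoted in the layer walkthrough follow from the standard output-size relation out = floor((in + 2·pad - kernel)/stride) + 1. The small sketch below checks them; the padding of 2 for the 5 × 5 convolutions of layers three to five is an assumption chosen so that the "edge expansion" preserves the 13 × 13 size, since the patent does not state the exact padding amount:

```python
def out_size(n_in, kernel, stride, pad=0):
    # standard conv/pool output size: floor((n + 2p - k) / s) + 1
    return (n_in + 2 * pad - kernel) // stride + 1

size = out_size(128, kernel=11, stride=3)    # layer 1 conv: 128 -> 40
size = out_size(size, kernel=4, stride=3)    # layer 2 pool:  40 -> 13
# layers 3-5: 5x5 convs with assumed pad=2, stride=1 keep the maps at 13x13
for _ in range(3):
    size = out_size(size, kernel=5, stride=1, pad=2)
final = out_size(size, kernel=3, stride=2)   # layer 6 pool:  13 -> 6
```

Running these four lines reproduces the 40, 13, and 6 spatial sizes stated above, so the layer dimensions in the embodiment are mutually consistent under the assumed padding.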
Step 5: adjust the parameters of the LSTM and CNN in the system using the back-propagation algorithm, choose the optimum network model, and save its parameters.
Step 6: feed the test set samples into the optimal network model and carry out emotion recognition on them using the trained network.
The softmax loss function J(θ) of the convolutional neural network is defined as follows:
J(θ) = -(1/q) Σ_{i=1..q} Σ_{j=1..p} 1{y^(i) = j} · log( e^{θ_j^T x^(i)} / Σ_{l=1..p} e^{θ_l^T x^(i)} )
where x^(i) is an input vector and y^(i) is the emotion category corresponding to that input vector, i = 1, 2, …, q, with q the number of speech samples; θ_j are the parameters to be trained, j = 1, 2, …, p, with p the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise. As the number of training samples increases, the value of the loss function keeps decreasing; when the loss function tends toward stability, the corresponding θ_j are the parameters of the optimized network model.
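The loss J(θ) above can be sketched directly in NumPy. This is a minimal, unvectorized-for-clarity version with toy data (q = 4 samples, p = 6 categories, 8 features); note that the labels here are 0-based, whereas the embodiment labels the emotions 1 to 6:

```python
import numpy as np

def softmax_loss(theta, X, y):
    """J(theta) for inputs X (q x d), 0-based labels y, parameters theta (p x d)."""
    scores = X @ theta.T                         # theta_j^T x^(i) for all i, j
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)
    q = X.shape[0]
    # the indicator 1{y^(i) = j} selects the log-probability of the true class
    return -np.log(probs[np.arange(q), y]).sum() / q

rng = np.random.default_rng(0)
theta = rng.standard_normal((6, 8))   # p = 6 emotion categories, d = 8 features
X = rng.standard_normal((4, 8))       # q = 4 input vectors (toy data)
y = np.array([0, 2, 5, 1])            # true emotion categories, 0-based
loss = softmax_loss(theta, X, y)
```

With all-zero parameters every class gets probability 1/6, so the loss equals log 6, which is a convenient sanity check before training starts.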
The tanh function (hyperbolic tangent) in the present invention is expressed as tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}); the ReLU function (rectified linear unit) is expressed as f(x) = max(0, x); and the sigmoid function (S-shaped growth curve) is expressed as σ(x) = 1 / (1 + e^{-x}), where x is the variable.
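The three activation functions just defined transcribe directly into NumPy one-liners; nothing here is patent-specific, it is simply the formulas above written as code:

```python
import numpy as np

def tanh(x):
    # hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # rectified linear unit: max(0, x)
    return np.maximum(0.0, x)

def sigmoid(x):
    # S-shaped growth curve: 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))
```

tanh squashes values into (-1, 1) and gates the cell state in the LSTM; sigmoid squashes into (0, 1) and drives the gates; ReLU is the nonlinearity used in the eighth fully connected layer of the CNN.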
Claims (5)
1. A speech emotion recognition method based on a long short-term memory network and a convolutional neural network, characterized by comprising the following steps:
Step A: pre-process the speech samples in a speech emotion database so that each speech sample can be represented by a sequence of equal length, thereby obtaining pre-processed voice sequences;
Step B: build a speech emotion recognition system based on a long short-term memory network (LSTM) and a convolutional neural network (CNN), comprising two basic modules: an LSTM module and a CNN module;
Step C: feed the pre-processed voice sequences into the speech emotion recognition system for repeated training, adjusting the parameters of the LSTM and CNN with the back-propagation algorithm to obtain the optimized network model;
Step D: use the network model trained in step C to classify the emotion of newly input voice sequences into six categories: sadness, happiness, disgust, fear, surprise, and neutral.
2. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 1, characterized in that the LSTM module in Step B is constructed as follows:
B1.1: set the length of the speech sample sequence to m, where m = n × n and n is a positive integer; let the outputs of the forget gate unit and the input gate unit at the current time be f_t and i_t respectively, satisfying:
f_t = σ(W_f · x_c + b_f)
i_t = σ(W_i · x_c + b_i)
where x_c = [h_{t-1}, x_t] is the new vector obtained by joining the two vectors h_{t-1} and x_t end to end, x_t is the input at the current time, h_{t-1} is the state of the hidden layer at the previous time, W_f and W_i are the weight matrices of the forget gate unit and the input gate unit respectively, b_f and b_i are the bias vectors of the forget gate unit and the input gate unit respectively, and σ(·) is the sigmoid activation function;
B1.2: compute the value of the current cell state C_t by the following formulas:
C_t = f_t * C_{t-1} + i_t * C̃_t
C̃_t = tanh(W_C · x_c + b_C)
where C_{t-1} is the cell state at the previous time, C̃_t is the candidate (reference) value of the cell state at the current time, W_C is the weight matrix of the cell state, b_C is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3: obtain the output h_t of each hidden node according to the following formulas, and join the h_t in order to form an m-dimensional feature vector:
h_t = o_t * tanh(C_t)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
where W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit.
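The gate equations of steps B1.1–B1.3 can be sketched as a single NumPy time step. The parameter shapes below (each weight matrix is d_h × (d_h + d_x), each bias has length d_h) are an assumption of this sketch; the claim does not fix the dimensions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following equations B1.1-B1.3."""
    x_c = np.concatenate([h_prev, x_t])                     # x_c = [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ x_c + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ x_c + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_C"] @ x_c + params["b_C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                      # new cell state
    o_t = sigmoid(params["W_o"] @ x_c + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                                # hidden output
    return h_t, C_t
```

Running this step over all m time positions and concatenating the h_t yields the m-dimensional feature vector of step B1.3.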
3. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 2, characterized in that the CNN module in Step B is constructed as follows:
B2.1: reshape the m-dimensional feature vector extracted in step B1.3 into an n × n feature matrix as the input of the convolutional neural network;
B2.2: the first layer of the CNN is a convolutional layer: convolve the input with m_1 convolution kernels of size k_1 × k_1 at stride s_1, then apply a nonlinear mapping with the activation function; the output of this layer is m_1 feature maps of size l_1 × l_1;
B2.3: the second layer of the CNN is a pooling layer: pool the feature maps output by the first-layer convolutional layer with m_2 kernels of size k_2 × k_2 at stride s_2; the output is m_2 feature maps of size l_2 × l_2;
B2.4: the third layer of the CNN is a convolutional layer: convolve the feature maps output by the second-layer pooling layer with m_3 kernels of size k_3 × k_3, then apply the nonlinear mapping with the activation function; the output is m_3 feature maps of size l_3 × l_3;
B2.5: the fourth layer of the CNN is a convolutional layer: convolve the feature maps output by the third-layer convolutional layer with m_4 kernels of size k_4 × k_4, then apply the nonlinear mapping with the activation function; the output is m_4 feature maps of size l_4 × l_4;
B2.6: the fifth layer of the CNN is a convolutional layer: convolve the feature maps output by the fourth-layer convolutional layer with m_5 kernels of size k_5 × k_5, then apply the nonlinear mapping with the activation function; the output is m_5 feature maps of size l_5 × l_5;
B2.7: the sixth layer of the CNN is a pooling layer: pool the feature maps output by the fifth-layer convolutional layer with m_6 kernels of size k_6 × k_6 at stride s_6; the output is m_6 feature maps of size l_6 × l_6;
B2.8: the seventh, eighth, and ninth layers of the CNN are fully connected layers; the seventh layer fully connects the feature maps output by the sixth-layer pooling layer to its c nodes; the eighth layer applies the ReLU nonlinear transformation to the c nodes of the seventh layer and then uses the dropout method to mask the connection weights of hidden-layer nodes, the number of full connections being c; the ninth fully connected layer has p output nodes and outputs the softmax loss that fuses the feature labels.
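The spatial sizes l_1 … l_6 in steps B2.2–B2.7 follow the usual valid-convolution formula l = (l_prev − k) / s + 1. The sketch below checks a candidate configuration; the kernel sizes, strides, and n = 32 are illustrative assumptions, not values fixed by the claim (the stride of the B2.4–B2.6 convolutional layers is taken as 1 since the claim does not specify it):

```python
def out_size(l_prev, k, s):
    """Valid convolution/pooling output size: l = (l_prev - k) / s + 1."""
    assert (l_prev - k) % s == 0, "configuration does not tile evenly"
    return (l_prev - k) // s + 1

# Illustrative (kernel k, stride s) per layer B2.2 .. B2.7, for an n x n input.
n = 32
layers = [
    (5, 1),  # B2.2 conv1
    (2, 2),  # B2.3 pool1
    (3, 1),  # B2.4 conv2
    (3, 1),  # B2.5 conv3
    (3, 1),  # B2.6 conv4
    (2, 2),  # B2.7 pool2
]
l = n
for k, s in layers:
    l = out_size(l, k, s)
print(l)  # final feature-map side length -> 4
```

The l_6 × l_6 maps produced here are what the seventh (fully connected) layer of step B2.8 flattens and connects to its c nodes.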
4. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 3, characterized in that the softmax loss function J(θ) of the CNN in Step B is defined as follows:
J(θ) = −(1/q) Σ_{i=1..q} Σ_{j=1..p} 1{y^(i) = j} · log( e^{θ_j^T x^(i)} / Σ_{l=1..p} e^{θ_l^T x^(i)} )
where x^(i) is an input vector and y^(i) is the emotion category corresponding to that input vector, i = 1, 2, …, q, q being the number of speech samples; θ_j are the parameters to be trained, j = 1, 2, …, p, p being the number of emotion categories; T denotes transposition and e is the natural base; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise.
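The loss of claim 4 can be sketched in NumPy as follows. The shapes (θ as a p × d matrix whose row j is θ_j, inputs stacked as a q × d matrix, labels as integers 0 … p−1) are assumptions of this sketch; the claim only names the symbols:

```python
import numpy as np

def softmax_loss(theta, X, y):
    """Softmax loss J(theta): mean negative log-probability of the true class.

    theta: (p, d) parameter matrix (row j is theta_j)
    X:     (q, d) input vectors x^(i) stacked row-wise
    y:     (q,)   integer emotion labels in 0..p-1
    """
    scores = X @ theta.T                                # theta_j^T x^(i) for all i, j
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    q = X.shape[0]
    # The indicator 1{y^(i) = j} selects the log-probability of the true class.
    return -log_probs[np.arange(q), y].mean()
```

With all-zero parameters every class is equally likely, so the loss equals log p, which is a quick sanity check for an implementation.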
5. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 3, characterized in that the tanh function is expressed as tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) and the sigmoid function is expressed as σ(x) = 1 / (1 + e^(−x)), where x is the variable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611093447.3A CN106782602B (en) | 2016-12-01 | 2016-12-01 | Speech emotion recognition method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611093447.3A CN106782602B (en) | 2016-12-01 | 2016-12-01 | Speech emotion recognition method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106782602A true CN106782602A (en) | 2017-05-31 |
CN106782602B CN106782602B (en) | 2020-03-17 |
Family
ID=58913860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611093447.3A Active CN106782602B (en) | 2016-12-01 | 2016-12-01 | Speech emotion recognition method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782602B (en) |
Cited By (91)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107274378A (en) * | 2017-07-25 | 2017-10-20 | 江西理工大学 | A kind of image blurring type identification and parameter tuning method for merging memory CNN |
CN107293288A (en) * | 2017-06-09 | 2017-10-24 | 清华大学 | A kind of residual error shot and long term remembers the acoustic model modeling method of Recognition with Recurrent Neural Network |
CN107293290A (en) * | 2017-07-31 | 2017-10-24 | 郑州云海信息技术有限公司 | The method and apparatus for setting up Speech acoustics model |
CN107392109A (en) * | 2017-06-27 | 2017-11-24 | 南京邮电大学 | A kind of neonatal pain expression recognition method based on deep neural network |
CN107464568A (en) * | 2017-09-25 | 2017-12-12 | 四川长虹电器股份有限公司 | Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system |
CN107506414A (en) * | 2017-08-11 | 2017-12-22 | 武汉大学 | A kind of code based on shot and long term memory network recommends method |
CN107562792A (en) * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning |
CN107562784A (en) * | 2017-07-25 | 2018-01-09 | 同济大学 | Short text classification method based on ResLCNN models |
CN107633842A (en) * | 2017-06-12 | 2018-01-26 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN107679199A (en) * | 2017-10-11 | 2018-02-09 | 北京邮电大学 | A kind of external the Chinese text readability analysis method based on depth local feature |
CN107703564A (en) * | 2017-10-13 | 2018-02-16 | 中国科学院深圳先进技术研究院 | A kind of precipitation predicting method, system and electronic equipment |
CN107785011A (en) * | 2017-09-15 | 2018-03-09 | 北京理工大学 | Word speed estimates training, word speed method of estimation, device, equipment and the medium of model |
CN107818307A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of multi-tag Video Events detection method based on LSTM networks |
CN107862331A (en) * | 2017-10-31 | 2018-03-30 | 华中科技大学 | It is a kind of based on time series and CNN unsafe acts recognition methods and system |
CN107885853A (en) * | 2017-11-14 | 2018-04-06 | 同济大学 | A kind of combined type file classification method based on deep learning |
CN107992938A (en) * | 2017-11-24 | 2018-05-04 | 清华大学 | Space-time big data Forecasting Methodology and system based on positive and negative convolutional neural networks |
CN108039181A (en) * | 2017-11-02 | 2018-05-15 | 北京捷通华声科技股份有限公司 | The emotion information analysis method and device of a kind of voice signal |
CN108280406A (en) * | 2017-12-30 | 2018-07-13 | 广州海昇计算机科技有限公司 | A kind of Activity recognition method, system and device based on segmentation double-stream digestion |
CN108304823A (en) * | 2018-02-24 | 2018-07-20 | 重庆邮电大学 | A kind of expression recognition method based on two-fold product CNN and long memory network in short-term |
CN108346436A (en) * | 2017-08-22 | 2018-07-31 | 腾讯科技(深圳)有限公司 | Speech emotional detection method, device, computer equipment and storage medium |
CN108520753A (en) * | 2018-02-26 | 2018-09-11 | 南京工程学院 | Voice lie detection method based on the two-way length of convolution memory network in short-term |
CN108550375A (en) * | 2018-03-14 | 2018-09-18 | 鲁东大学 | A kind of emotion identification method, device and computer equipment based on voice signal |
CN108564942A (en) * | 2018-04-04 | 2018-09-21 | 南京师范大学 | One kind being based on the adjustable speech-emotion recognition method of susceptibility and system |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108630199A (en) * | 2018-06-30 | 2018-10-09 | 中国人民解放军战略支援部队信息工程大学 | A kind of data processing method of acoustic model |
CN108717856A (en) * | 2018-06-16 | 2018-10-30 | 台州学院 | A kind of speech-emotion recognition method based on multiple dimensioned depth convolution loop neural network |
CN108766419A (en) * | 2018-05-04 | 2018-11-06 | 华南理工大学 | A kind of abnormal speech detection method based on deep learning |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN108806698A (en) * | 2018-03-15 | 2018-11-13 | 中山大学 | A kind of camouflage audio recognition method based on convolutional neural networks |
CN108831450A (en) * | 2018-03-30 | 2018-11-16 | 杭州鸟瞰智能科技股份有限公司 | A kind of virtual robot man-machine interaction method based on user emotion identification |
CN108922617A (en) * | 2018-06-26 | 2018-11-30 | 电子科技大学 | A kind of self-closing disease aided diagnosis method neural network based |
WO2018227169A1 (en) * | 2017-06-08 | 2018-12-13 | Newvoicemedia Us Inc. | Optimal human-machine conversations using emotion-enhanced natural speech |
CN109003625A (en) * | 2018-07-27 | 2018-12-14 | 中国科学院自动化研究所 | Speech-emotion recognition method and system based on ternary loss |
CN109034034A (en) * | 2018-07-12 | 2018-12-18 | 广州麦仑信息科技有限公司 | A kind of vein identification method based on nitrification enhancement optimization convolutional neural networks |
CN109036467A (en) * | 2018-10-26 | 2018-12-18 | 南京邮电大学 | CFFD extracting method, speech-emotion recognition method and system based on TF-LSTM |
CN109036459A (en) * | 2018-08-22 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Sound end detecting method, device, computer equipment, computer storage medium |
CN109087635A (en) * | 2018-08-30 | 2018-12-25 | 湖北工业大学 | A kind of speech-sound intelligent classification method and system |
CN109147826A (en) * | 2018-08-22 | 2019-01-04 | 平安科技(深圳)有限公司 | Music emotion recognition method, device, computer equipment and computer storage medium |
CN109146066A (en) * | 2018-11-01 | 2019-01-04 | 重庆邮电大学 | A kind of collaborative virtual learning environment natural interactive method based on speech emotion recognition |
CN109190514A (en) * | 2018-08-14 | 2019-01-11 | 电子科技大学 | Face character recognition methods and system based on two-way shot and long term memory network |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Based on the vagitus emotion identification method for improving long memory network in short-term |
CN109243494A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Childhood emotional recognition methods based on the long memory network in short-term of multiple attention mechanism |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109285562A (en) * | 2018-09-28 | 2019-01-29 | 东南大学 | Speech-emotion recognition method based on attention mechanism |
CN109282837A (en) * | 2018-10-24 | 2019-01-29 | 福州大学 | Bragg grating based on LSTM network interlocks the demodulation method of spectrum |
CN109346107A (en) * | 2018-10-10 | 2019-02-15 | 中山大学 | A method of independent speaker's sound pronunciation based on LSTM is inverse to be solved |
CN109389992A (en) * | 2018-10-18 | 2019-02-26 | 天津大学 | A kind of speech-emotion recognition method based on amplitude and phase information |
WO2019037382A1 (en) * | 2017-08-24 | 2019-02-28 | 平安科技(深圳)有限公司 | Emotion recognition-based voice quality inspection method and device, equipment and storage medium |
CN109426858A (en) * | 2017-08-29 | 2019-03-05 | 京东方科技集团股份有限公司 | Neural network, training method, image processing method and image processing apparatus |
CN109567793A (en) * | 2018-11-16 | 2019-04-05 | 西北工业大学 | A kind of ECG signal processing method towards cardiac arrhythmia classification |
CN109637545A (en) * | 2019-01-17 | 2019-04-16 | 哈尔滨工程大学 | Based on one-dimensional convolution asymmetric double to the method for recognizing sound-groove of long memory network in short-term |
CN109754790A (en) * | 2017-11-01 | 2019-05-14 | 中国科学院声学研究所 | A kind of speech recognition system and method based on mixing acoustic model |
CN110096587A (en) * | 2019-01-11 | 2019-08-06 | 杭州电子科技大学 | The fine granularity sentiment classification model of LSTM-CNN word insertion based on attention mechanism |
CN110179453A (en) * | 2018-06-01 | 2019-08-30 | 山东省计算中心(国家超级计算济南中心) | Electrocardiogram classification method based on convolutional neural networks and shot and long term memory network |
WO2019179036A1 (en) * | 2018-03-19 | 2019-09-26 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity authentication method, and storage medium |
CN110322900A (en) * | 2019-06-25 | 2019-10-11 | 深圳市壹鸽科技有限公司 | A kind of method of phonic signal character fusion |
CN110363751A (en) * | 2019-07-01 | 2019-10-22 | 浙江大学 | A kind of big enteroscope polyp detection method based on generation collaborative network |
WO2019232883A1 (en) * | 2018-06-07 | 2019-12-12 | 平安科技(深圳)有限公司 | Insurance product pushing method and device, computer device and storage medium |
CN110600018A (en) * | 2019-09-05 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Voice recognition method and device and neural network training method and device |
CN110738852A (en) * | 2019-10-23 | 2020-01-31 | 浙江大学 | intersection steering overflow detection method based on vehicle track and long and short memory neural network |
CN110929762A (en) * | 2019-10-30 | 2020-03-27 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Method and system for detecting body language and analyzing behavior based on deep learning |
CN110956953A (en) * | 2019-11-29 | 2020-04-03 | 中山大学 | Quarrel identification method based on audio analysis and deep learning |
CN111028859A (en) * | 2019-12-15 | 2020-04-17 | 中北大学 | Hybrid neural network vehicle type identification method based on audio feature fusion |
CN111179910A (en) * | 2019-12-17 | 2020-05-19 | 深圳追一科技有限公司 | Speed of speech recognition method and apparatus, server, computer readable storage medium |
CN111210844A (en) * | 2020-02-03 | 2020-05-29 | 北京达佳互联信息技术有限公司 | Method, device and equipment for determining speech emotion recognition model and storage medium |
CN111222624A (en) * | 2018-11-26 | 2020-06-02 | 深圳云天励飞技术有限公司 | Parallel computing method and device |
CN111310672A (en) * | 2020-02-19 | 2020-06-19 | 广州数锐智能科技有限公司 | Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling |
CN111326178A (en) * | 2020-02-27 | 2020-06-23 | 长沙理工大学 | Multi-mode speech emotion recognition system and method based on convolutional neural network |
CN111524535A (en) * | 2020-04-30 | 2020-08-11 | 杭州电子科技大学 | Feature fusion method for speech emotion recognition based on attention mechanism |
JP2020134719A (en) * | 2019-02-20 | 2020-08-31 | ソフトバンク株式会社 | Translation device, translation method, and translation program |
CN111709284A (en) * | 2020-05-07 | 2020-09-25 | 西安理工大学 | Dance emotion recognition method based on CNN-LSTM |
CN111883252A (en) * | 2020-07-29 | 2020-11-03 | 济南浪潮高新科技投资发展有限公司 | Auxiliary diagnosis method, device, equipment and storage medium for infantile autism |
CN112001482A (en) * | 2020-08-14 | 2020-11-27 | 佳都新太科技股份有限公司 | Vibration prediction and model training method and device, computer equipment and storage medium |
CN112037822A (en) * | 2020-07-30 | 2020-12-04 | 华南师范大学 | Voice emotion recognition method based on ICNN and Bi-LSTM |
CN112101095A (en) * | 2020-08-02 | 2020-12-18 | 华南理工大学 | Suicide and violence tendency emotion recognition method based on language and limb characteristics |
CN112187413A (en) * | 2020-08-28 | 2021-01-05 | 中国人民解放军海军航空大学航空作战勤务学院 | SFBC (Small form-factor Block code) identifying method and device based on CNN-LSTM (convolutional neural network-Link State transition technology) |
CN112259126A (en) * | 2020-09-24 | 2021-01-22 | 广州大学 | Robot and method for assisting in recognizing autism voice features |
CN112383369A (en) * | 2020-07-23 | 2021-02-19 | 哈尔滨工业大学 | Cognitive radio multi-channel spectrum sensing method based on CNN-LSTM network model |
CN112446266A (en) * | 2019-09-04 | 2021-03-05 | 北京君正集成电路股份有限公司 | Face recognition network structure suitable for front end |
CN112735434A (en) * | 2020-12-09 | 2021-04-30 | 中国人民解放军陆军工程大学 | Voice communication method and system with voiceprint cloning function |
CN112735479A (en) * | 2021-03-31 | 2021-04-30 | 南方电网数字电网研究院有限公司 | Speech emotion recognition method and device, computer equipment and storage medium |
CN112766292A (en) * | 2019-11-04 | 2021-05-07 | 中移(上海)信息通信科技有限公司 | Identity authentication method, device, equipment and storage medium |
CN112819133A (en) * | 2019-11-15 | 2021-05-18 | 北方工业大学 | Construction method of deep hybrid neural network emotion recognition model |
WO2021127982A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, smart device, and computer-readable storage medium |
WO2021147363A1 (en) * | 2020-01-20 | 2021-07-29 | 中国电子科技集团公司电子科学研究院 | Text-based major depressive disorder recognition method |
CN113221758A (en) * | 2021-05-16 | 2021-08-06 | 西北工业大学 | Underwater acoustic target identification method based on GRU-NIN model |
CN113255800A (en) * | 2021-06-02 | 2021-08-13 | 中国科学院自动化研究所 | Robust emotion modeling system based on audio and video |
WO2021164346A1 (en) * | 2020-02-21 | 2021-08-26 | 乐普(北京)医疗器械股份有限公司 | Method and device for predicting blood pressure |
CN114305418A (en) * | 2021-12-16 | 2022-04-12 | 广东工业大学 | Data acquisition system and method for depression state intelligent evaluation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105279495A (en) * | 2015-10-23 | 2016-01-27 | 天津大学 | Video description method based on deep learning and text summarization |
CN105469065A (en) * | 2015-12-07 | 2016-04-06 | 中国科学院自动化研究所 | Recurrent neural network-based discrete emotion recognition method |
US20160099010A1 (en) * | 2014-10-03 | 2016-04-07 | Google Inc. | Convolutional, long short-term memory, fully connected deep neural networks |
CN105844239A (en) * | 2016-03-23 | 2016-08-10 | 北京邮电大学 | Method for detecting riot and terror videos based on CNN and LSTM |
CN106096568A (en) * | 2016-06-21 | 2016-11-09 | 同济大学 | A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network |
Non-Patent Citations (1)
Title |
---|
PENG CHEN ET AL: "Clause Sentiment Identification Basedon Convolutional Neural Network With Context Embedding", 《2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY》 * |
Cited By (137)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018227169A1 (en) * | 2017-06-08 | 2018-12-13 | Newvoicemedia Us Inc. | Optimal human-machine conversations using emotion-enhanced natural speech |
CN107293288A (en) * | 2017-06-09 | 2017-10-24 | 清华大学 | A kind of residual error shot and long term remembers the acoustic model modeling method of Recognition with Recurrent Neural Network |
CN107293288B (en) * | 2017-06-09 | 2020-04-21 | 清华大学 | Acoustic model modeling method of residual long-short term memory recurrent neural network |
CN107633842B (en) * | 2017-06-12 | 2018-08-31 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN107633842A (en) * | 2017-06-12 | 2018-01-26 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN107392109A (en) * | 2017-06-27 | 2017-11-24 | 南京邮电大学 | A kind of neonatal pain expression recognition method based on deep neural network |
CN107274378A (en) * | 2017-07-25 | 2017-10-20 | 江西理工大学 | A kind of image blurring type identification and parameter tuning method for merging memory CNN |
CN107274378B (en) * | 2017-07-25 | 2020-04-03 | 江西理工大学 | Image fuzzy type identification and parameter setting method based on fusion memory CNN |
CN107562784A (en) * | 2017-07-25 | 2018-01-09 | 同济大学 | Short text classification method based on ResLCNN models |
CN107562792A (en) * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning |
CN107293290A (en) * | 2017-07-31 | 2017-10-24 | 郑州云海信息技术有限公司 | The method and apparatus for setting up Speech acoustics model |
CN107562792B (en) * | 2017-07-31 | 2020-01-31 | 同济大学 | question-answer matching method based on deep learning |
CN107506414A (en) * | 2017-08-11 | 2017-12-22 | 武汉大学 | A kind of code based on shot and long term memory network recommends method |
CN107506414B (en) * | 2017-08-11 | 2020-01-07 | 武汉大学 | Code recommendation method based on long-term and short-term memory network |
US11922969B2 (en) | 2017-08-22 | 2024-03-05 | Tencent Technology (Shenzhen) Company Limited | Speech emotion detection method and apparatus, computer device, and storage medium |
WO2019037700A1 (en) * | 2017-08-22 | 2019-02-28 | 腾讯科技(深圳)有限公司 | Speech emotion detection method and apparatus, computer device, and storage medium |
CN108346436A (en) * | 2017-08-22 | 2018-07-31 | 腾讯科技(深圳)有限公司 | Speech emotional detection method, device, computer equipment and storage medium |
US11189302B2 (en) | 2017-08-22 | 2021-11-30 | Tencent Technology (Shenzhen) Company Limited | Speech emotion detection method and apparatus, computer device, and storage medium |
WO2019037382A1 (en) * | 2017-08-24 | 2019-02-28 | 平安科技(深圳)有限公司 | Emotion recognition-based voice quality inspection method and device, equipment and storage medium |
CN109426858A (en) * | 2017-08-29 | 2019-03-05 | 京东方科技集团股份有限公司 | Neural network, training method, image processing method and image processing apparatus |
CN109426858B (en) * | 2017-08-29 | 2021-04-06 | 京东方科技集团股份有限公司 | Neural network, training method, image processing method, and image processing apparatus |
CN107785011A (en) * | 2017-09-15 | 2018-03-09 | 北京理工大学 | Word speed estimates training, word speed method of estimation, device, equipment and the medium of model |
CN107785011B (en) * | 2017-09-15 | 2020-07-03 | 北京理工大学 | Training method, device, equipment and medium of speech rate estimation model and speech rate estimation method, device and equipment |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN107464568A (en) * | 2017-09-25 | 2017-12-12 | 四川长虹电器股份有限公司 | Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system |
CN107464568B (en) * | 2017-09-25 | 2020-06-30 | 四川长虹电器股份有限公司 | Speaker identification method and system based on three-dimensional convolution neural network text independence |
CN107679199A (en) * | 2017-10-11 | 2018-02-09 | 北京邮电大学 | A kind of external the Chinese text readability analysis method based on depth local feature |
CN107703564A (en) * | 2017-10-13 | 2018-02-16 | 中国科学院深圳先进技术研究院 | A kind of precipitation predicting method, system and electronic equipment |
CN107703564B (en) * | 2017-10-13 | 2020-04-14 | 中国科学院深圳先进技术研究院 | Rainfall prediction method and system and electronic equipment |
CN107818307A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of multi-tag Video Events detection method based on LSTM networks |
CN107862331A (en) * | 2017-10-31 | 2018-03-30 | 华中科技大学 | It is a kind of based on time series and CNN unsafe acts recognition methods and system |
CN107818307B (en) * | 2017-10-31 | 2021-05-18 | 天津大学 | Multi-label video event detection method based on LSTM network |
CN109754790B (en) * | 2017-11-01 | 2020-11-06 | 中国科学院声学研究所 | Speech recognition system and method based on hybrid acoustic model |
CN109754790A (en) * | 2017-11-01 | 2019-05-14 | 中国科学院声学研究所 | A kind of speech recognition system and method based on mixing acoustic model |
CN108039181A (en) * | 2017-11-02 | 2018-05-15 | 北京捷通华声科技股份有限公司 | The emotion information analysis method and device of a kind of voice signal |
CN107885853A (en) * | 2017-11-14 | 2018-04-06 | 同济大学 | A kind of combined type file classification method based on deep learning |
CN107992938A (en) * | 2017-11-24 | 2018-05-04 | 清华大学 | Space-time big data Forecasting Methodology and system based on positive and negative convolutional neural networks |
CN107992938B (en) * | 2017-11-24 | 2019-05-14 | 清华大学 | Space-time big data prediction technique and system based on positive and negative convolutional neural networks |
CN108280406A (en) * | 2017-12-30 | 2018-07-13 | 广州海昇计算机科技有限公司 | A kind of Activity recognition method, system and device based on segmentation double-stream digestion |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108597539B (en) * | 2018-02-09 | 2021-09-03 | 桂林电子科技大学 | Speech emotion recognition method based on parameter migration and spectrogram |
CN108304823A (en) * | 2018-02-24 | 2018-07-20 | 重庆邮电大学 | A kind of expression recognition method based on two-fold product CNN and long memory network in short-term |
CN108304823B (en) * | 2018-02-24 | 2022-03-22 | 重庆邮电大学 | Expression recognition method based on double-convolution CNN and long-and-short-term memory network |
CN108520753A (en) * | 2018-02-26 | 2018-09-11 | 南京工程学院 | Voice lie detection method based on the two-way length of convolution memory network in short-term |
CN108520753B (en) * | 2018-02-26 | 2020-07-24 | 南京工程学院 | Voice lie detection method based on convolution bidirectional long-time and short-time memory network |
CN108550375A (en) * | 2018-03-14 | 2018-09-18 | 鲁东大学 | A kind of emotion identification method, device and computer equipment based on voice signal |
CN108806698A (en) * | 2018-03-15 | 2018-11-13 | 中山大学 | A kind of camouflage audio recognition method based on convolutional neural networks |
WO2019179036A1 (en) * | 2018-03-19 | 2019-09-26 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity authentication method, and storage medium |
CN108831450A (en) * | 2018-03-30 | 2018-11-16 | 杭州鸟瞰智能科技股份有限公司 | A kind of virtual robot man-machine interaction method based on user emotion identification |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
CN108564942A (en) * | 2018-04-04 | 2018-09-21 | 南京师范大学 | One kind being based on the adjustable speech-emotion recognition method of susceptibility and system |
CN108564942B (en) * | 2018-04-04 | 2021-01-26 | 南京师范大学 | Voice emotion recognition method and system based on adjustable sensitivity |
CN108766419A (en) * | 2018-05-04 | 2018-11-06 | 华南理工大学 | A kind of abnormal speech detection method based on deep learning |
CN108766419B (en) * | 2018-05-04 | 2020-10-27 | 华南理工大学 | Abnormal voice distinguishing method based on deep learning |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN110179453A (en) * | 2018-06-01 | 2019-08-30 | 山东省计算中心(国家超级计算济南中心) | Electrocardiogram classification method based on convolutional neural networks and shot and long term memory network |
CN110179453B (en) * | 2018-06-01 | 2020-01-03 | 山东省计算中心(国家超级计算济南中心) | Electrocardiogram classification method based on convolutional neural network and long-short term memory network |
WO2019232883A1 (en) * | 2018-06-07 | 2019-12-12 | 平安科技(深圳)有限公司 | Insurance product pushing method and device, computer device and storage medium |
CN108717856B (en) * | 2018-06-16 | 2022-03-08 | 台州学院 | Speech emotion recognition method based on multi-scale deep convolution cyclic neural network |
CN108717856A (en) * | 2018-06-16 | 2018-10-30 | 台州学院 | A kind of speech-emotion recognition method based on multiple dimensioned depth convolution loop neural network |
CN108922617A (en) * | 2018-06-26 | 2018-11-30 | 电子科技大学 | A kind of self-closing disease aided diagnosis method neural network based |
CN108630199A (en) * | 2018-06-30 | 2018-10-09 | 中国人民解放军战略支援部队信息工程大学 | A kind of data processing method of acoustic model |
CN109034034A (en) * | 2018-07-12 | 2018-12-18 | 广州麦仑信息科技有限公司 | A kind of vein identification method based on nitrification enhancement optimization convolutional neural networks |
CN109003625A (en) * | 2018-07-27 | 2018-12-14 | 中国科学院自动化研究所 | Speech-emotion recognition method and system based on ternary loss |
CN109190514A (en) * | 2018-08-14 | 2019-01-11 | 电子科技大学 | Face character recognition methods and system based on two-way shot and long term memory network |
CN109190514B (en) * | 2018-08-14 | 2021-10-01 | 电子科技大学 | Face attribute recognition method and system based on bidirectional long-short term memory network |
CN109147826A (en) * | 2018-08-22 | 2019-01-04 | 平安科技(深圳)有限公司 | Music emotion recognition method, device, computer equipment and computer storage medium |
CN109036459A (en) * | 2018-08-22 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Voice endpoint detection method and device, computer equipment, and computer storage medium |
CN109147826B (en) * | 2018-08-22 | 2022-12-27 | 平安科技(深圳)有限公司 | Music emotion recognition method and device, computer equipment and computer storage medium |
CN109087635A (en) * | 2018-08-30 | 2018-12-25 | 湖北工业大学 | Intelligent speech classification method and system |
CN109285562B (en) * | 2018-09-28 | 2022-09-23 | 东南大学 | Voice emotion recognition method based on attention mechanism |
CN109285562A (en) * | 2018-09-28 | 2019-01-29 | 东南大学 | Speech-emotion recognition method based on attention mechanism |
CN109346107A (en) * | 2018-10-10 | 2019-02-15 | 中山大学 | LSTM-based method for inversely solving pronunciation of independent speaker |
CN109346107B (en) * | 2018-10-10 | 2022-09-30 | 中山大学 | LSTM-based method for inversely solving pronunciation of independent speaker |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver emotion recognition method and terminal device |
CN109389992A (en) * | 2018-10-18 | 2019-02-26 | 天津大学 | Speech emotion recognition method based on amplitude and phase information |
CN109282837A (en) * | 2018-10-24 | 2019-01-29 | 福州大学 | Demodulation method for interleaved spectra of fiber Bragg gratings based on LSTM network |
CN109036467A (en) * | 2018-10-26 | 2018-12-18 | 南京邮电大学 | CFFD extraction method, speech emotion recognition method and system based on TF-LSTM |
CN109243494A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Children's emotion recognition method based on multi-attention long short-term memory network |
CN109243494B (en) * | 2018-10-30 | 2022-10-11 | 南京工程学院 | Children emotion recognition method based on multi-attention mechanism long-time memory network |
CN109243493B (en) * | 2018-10-30 | 2022-09-16 | 南京工程学院 | Infant crying emotion recognition method based on improved long-time and short-time memory network |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Infant cry emotion recognition method based on improved long short-term memory network |
CN109146066A (en) * | 2018-11-01 | 2019-01-04 | 重庆邮电大学 | Natural interaction method for collaborative virtual learning environments based on speech emotion recognition |
CN109567793A (en) * | 2018-11-16 | 2019-04-05 | 西北工业大学 | ECG signal processing method for arrhythmia classification |
CN109567793B (en) * | 2018-11-16 | 2021-11-23 | 西北工业大学 | Arrhythmia classification-oriented ECG signal processing method |
CN111222624A (en) * | 2018-11-26 | 2020-06-02 | 深圳云天励飞技术有限公司 | Parallel computing method and device |
CN111222624B (en) * | 2018-11-26 | 2022-04-29 | 深圳云天励飞技术股份有限公司 | Parallel computing method and device |
CN110096587B (en) * | 2019-01-11 | 2020-07-07 | 杭州电子科技大学 | Attention mechanism-based LSTM-CNN word embedded fine-grained emotion classification model |
CN110096587A (en) * | 2019-01-11 | 2019-08-06 | 杭州电子科技大学 | Fine-grained sentiment classification model with attention-based LSTM-CNN word embedding |
CN109637545B (en) * | 2019-01-17 | 2023-05-30 | 哈尔滨工程大学 | Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long-short-time memory network |
CN109637545A (en) * | 2019-01-17 | 2019-04-16 | 哈尔滨工程大学 | Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long short-term memory network |
JP2020134719A (en) * | 2019-02-20 | 2020-08-31 | ソフトバンク株式会社 | Translation device, translation method, and translation program |
CN110322900A (en) * | 2019-06-25 | 2019-10-11 | 深圳市壹鸽科技有限公司 | Method for speech signal feature fusion |
CN110363751A (en) * | 2019-07-01 | 2019-10-22 | 浙江大学 | Colonoscopy polyp detection method based on generative collaborative network |
CN112446266B (en) * | 2019-09-04 | 2024-03-29 | 北京君正集成电路股份有限公司 | Face recognition network structure suitable for front end |
CN112446266A (en) * | 2019-09-04 | 2021-03-05 | 北京君正集成电路股份有限公司 | Face recognition network structure suitable for front end |
CN110600018A (en) * | 2019-09-05 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Voice recognition method and device and neural network training method and device |
CN110738852B (en) * | 2019-10-23 | 2020-12-18 | 浙江大学 | Intersection turning overflow detection method based on vehicle trajectories and long short-term memory neural network |
CN110738852A (en) * | 2019-10-23 | 2020-01-31 | 浙江大学 | Intersection turning overflow detection method based on vehicle trajectories and long short-term memory neural network |
CN110929762A (en) * | 2019-10-30 | 2020-03-27 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Method and system for detecting body language and analyzing behavior based on deep learning |
CN110929762B (en) * | 2019-10-30 | 2023-05-12 | 中科南京人工智能创新研究院 | Body language detection and behavior analysis method and system based on deep learning |
CN112766292A (en) * | 2019-11-04 | 2021-05-07 | 中移(上海)信息通信科技有限公司 | Identity authentication method, device, equipment and storage medium |
CN112766292B (en) * | 2019-11-04 | 2024-07-26 | 中移(上海)信息通信科技有限公司 | Identity authentication method, device, equipment and storage medium |
CN112819133A (en) * | 2019-11-15 | 2021-05-18 | 北方工业大学 | Construction method of deep hybrid neural network emotion recognition model |
CN110956953A (en) * | 2019-11-29 | 2020-04-03 | 中山大学 | Quarrel identification method based on audio analysis and deep learning |
CN110956953B (en) * | 2019-11-29 | 2023-03-10 | 中山大学 | Quarrel recognition method based on audio analysis and deep learning |
CN111028859A (en) * | 2019-12-15 | 2020-04-17 | 中北大学 | Hybrid neural network vehicle type identification method based on audio feature fusion |
CN111179910A (en) * | 2019-12-17 | 2020-05-19 | 深圳追一科技有限公司 | Speech rate recognition method and apparatus, server, and computer-readable storage medium |
WO2021127982A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, smart device, and computer-readable storage medium |
WO2021147363A1 (en) * | 2020-01-20 | 2021-07-29 | 中国电子科技集团公司电子科学研究院 | Text-based major depressive disorder recognition method |
CN111210844A (en) * | 2020-02-03 | 2020-05-29 | 北京达佳互联信息技术有限公司 | Method, device and equipment for determining speech emotion recognition model and storage medium |
CN111310672A (en) * | 2020-02-19 | 2020-06-19 | 广州数锐智能科技有限公司 | Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling |
WO2021164346A1 (en) * | 2020-02-21 | 2021-08-26 | 乐普(北京)医疗器械股份有限公司 | Method and device for predicting blood pressure |
CN111326178A (en) * | 2020-02-27 | 2020-06-23 | 长沙理工大学 | Multi-mode speech emotion recognition system and method based on convolutional neural network |
CN111524535B (en) * | 2020-04-30 | 2022-06-21 | 杭州电子科技大学 | Feature fusion method for speech emotion recognition based on attention mechanism |
CN111524535A (en) * | 2020-04-30 | 2020-08-11 | 杭州电子科技大学 | Feature fusion method for speech emotion recognition based on attention mechanism |
CN111709284A (en) * | 2020-05-07 | 2020-09-25 | 西安理工大学 | Dance emotion recognition method based on CNN-LSTM |
CN112383369A (en) * | 2020-07-23 | 2021-02-19 | 哈尔滨工业大学 | Cognitive radio multi-channel spectrum sensing method based on CNN-LSTM network model |
CN111883252A (en) * | 2020-07-29 | 2020-11-03 | 济南浪潮高新科技投资发展有限公司 | Auxiliary diagnosis method, device, equipment and storage medium for infantile autism |
CN112037822B (en) * | 2020-07-30 | 2022-09-27 | 华南师范大学 | Voice emotion recognition method based on ICNN and Bi-LSTM |
CN112037822A (en) * | 2020-07-30 | 2020-12-04 | 华南师范大学 | Voice emotion recognition method based on ICNN and Bi-LSTM |
CN112101095A (en) * | 2020-08-02 | 2020-12-18 | 华南理工大学 | Suicide and violence tendency emotion recognition method based on language and limb characteristics |
CN112101095B (en) * | 2020-08-02 | 2023-08-29 | 华南理工大学 | Suicide and violence tendency emotion recognition method based on language and limb characteristics |
CN112001482A (en) * | 2020-08-14 | 2020-11-27 | 佳都新太科技股份有限公司 | Vibration prediction and model training method and device, computer equipment and storage medium |
CN112001482B (en) * | 2020-08-14 | 2024-05-24 | 佳都科技集团股份有限公司 | Vibration prediction and model training method, device, computer equipment and storage medium |
CN112187413A (en) * | 2020-08-28 | 2021-01-05 | 中国人民解放军海军航空大学航空作战勤务学院 | SFBC (space-frequency block coding) recognition method and device based on CNN-LSTM |
CN112187413B (en) * | 2020-08-28 | 2022-05-03 | 中国人民解放军海军航空大学航空作战勤务学院 | SFBC (space-frequency block coding) recognition method and device based on CNN-LSTM |
CN112259126A (en) * | 2020-09-24 | 2021-01-22 | 广州大学 | Robot and method for assisting in recognizing autism voice features |
CN112735434A (en) * | 2020-12-09 | 2021-04-30 | 中国人民解放军陆军工程大学 | Voice communication method and system with voiceprint cloning function |
CN112735479A (en) * | 2021-03-31 | 2021-04-30 | 南方电网数字电网研究院有限公司 | Speech emotion recognition method and device, computer equipment and storage medium |
CN112735479B (en) * | 2021-03-31 | 2021-07-06 | 南方电网数字电网研究院有限公司 | Speech emotion recognition method and device, computer equipment and storage medium |
CN113221758A (en) * | 2021-05-16 | 2021-08-06 | 西北工业大学 | Underwater acoustic target identification method based on GRU-NIN model |
CN113221758B (en) * | 2021-05-16 | 2023-07-14 | 西北工业大学 | GRU-NIN model-based underwater sound target identification method |
CN113255800B (en) * | 2021-06-02 | 2021-10-15 | 中国科学院自动化研究所 | Robust emotion modeling system based on audio and video |
CN113255800A (en) * | 2021-06-02 | 2021-08-13 | 中国科学院自动化研究所 | Robust emotion modeling system based on audio and video |
CN114305418B (en) * | 2021-12-16 | 2023-08-04 | 广东工业大学 | Data acquisition system and method for intelligent assessment of depression state |
CN114305418A (en) * | 2021-12-16 | 2022-04-12 | 广东工业大学 | Data acquisition system and method for intelligent assessment of depression state |
Also Published As
Publication number | Publication date |
---|---|
CN106782602B (en) | 2020-03-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782602A (en) | Speech emotion recognition method based on long short-term memory network and convolutional neural network | |
CN109036465B (en) | Speech emotion recognition method | |
CN106878677B (en) | Student classroom mastery degree evaluation system and method based on multiple sensors | |
Sun et al. | Speech emotion recognition based on DNN-decision tree SVM model | |
EP4002362A1 (en) | Method and apparatus for training speech separation model, storage medium, and computer device | |
CN108664632B (en) | Text emotion classification algorithm based on convolutional neural network and attention mechanism | |
CN109637522B (en) | Speech emotion recognition method for extracting depth space attention features based on spectrogram | |
Wang et al. | Research on Web text classification algorithm based on improved CNN and SVM | |
CN109146066A (en) | Natural interaction method for collaborative virtual learning environments based on speech emotion recognition | |
CN107256393A (en) | Feature extraction and state recognition of one-dimensional physiological signals based on deep learning | |
CN102890930B (en) | Speech emotion recognizing method based on hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model | |
CN106952649A (en) | Speaker recognition method based on convolutional neural network and spectrogram | |
CN103544963A (en) | Speech emotion recognition method based on kernel semi-supervised discriminant analysis | |
CN107785015A (en) | Speech recognition method and device | |
CN110853656B (en) | Audio tampering identification method based on improved neural network | |
CN107705806A (en) | Method for speech emotion recognition using spectrograms and deep convolutional neural networks | |
Zhou et al. | Deep learning based affective model for speech emotion recognition | |
CN106897254A (en) | Network representation learning method | |
CN107039036A (en) | High-quality speaker recognition method based on autoencoder deep belief network | |
CN109558935A (en) | Emotion recognition and interaction method and system based on deep learning | |
CN106339718A (en) | Classification method based on neural network and classification device thereof | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
CN113033180B (en) | Automatic generation service system for primary-school Tibetan reading comprehension questions | |
CN110046709A (en) | Multi-task learning model based on bidirectional LSTM | |
Dong et al. | Environmental sound classification based on improved compact bilinear attention network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: No. 9, Wenyuan Road, Qixia District, Nanjing City, Jiangsu Province, 210023 Applicant after: Nanjing Post & Telecommunication Univ. Address before: No. 66, Xinmofan Road, Nanjing, Jiangsu, 210003 Applicant before: Nanjing Post & Telecommunication Univ. |
GR01 | Patent grant | ||