CN106782602A - Speech emotion recognition method based on long short-term memory network and convolutional neural network - Google Patents

Speech emotion recognition method based on long short-term memory network and convolutional neural network

Info

Publication number
CN106782602A
CN106782602A (Application CN201611093447.3A)
Authority
CN
China
Prior art keywords
layer
neural networks
convolutional neural
output
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611093447.3A
Other languages
Chinese (zh)
Other versions
CN106782602B (en)
Inventor
袁亮
卢官明
闫静杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201611093447.3A
Publication of CN106782602A
Application granted
Publication of CN106782602B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a speech emotion recognition method based on a long short-term memory (LSTM) network and a convolutional neural network (CNN). The method builds a speech emotion recognition system based on LSTM and CNN, takes speech sequences as the system input, trains the LSTM and CNN with the back-propagation algorithm to optimize the network parameters, and obtains the optimized network model. The trained network model then classifies newly input speech sequences into six emotions: sadness, happiness, disgust, fear, surprise and neutral. The method combines the LSTM and CNN network models, avoids the tedious manual selection and extraction of features, and improves the accuracy of emotion recognition.

Description

Speech emotion recognition method based on long short-term memory network and convolutional neural network
Technical field
The present invention relates to the field of image processing and pattern recognition, and in particular to a speech emotion recognition method based on a long short-term memory network and a convolutional neural network.
Background technology
In human communication, information is exchanged in many ways, including speech, body language and facial expressions. Among these, the speech signal is the fastest and most primitive means of communication, and researchers regard it as one of the most effective ways to realize human-computer interaction. Over the past half century, scholars have studied a large number of problems in speech recognition, i.e., how to convert a speech sequence into text. Although considerable progress has been made in speech recognition, machines still cannot understand the emotional state of the speaker, so there is still a long way to go before humans and machines can interact naturally. This has driven research in another direction: how to identify the emotional state of the speaker from speech, i.e., speech emotion recognition.
As an important branch of human-computer interaction, speech emotion recognition can be widely applied in education, medical care, transportation and many other fields. In vehicle-mounted systems, it can monitor the mental state of the driver and judge whether the driver is in a safe state, so that the driver can be alerted when fatigued and traffic accidents can be avoided. In telephone customer service, calls from users who express themselves aggressively can be prioritized and transferred to human agents, optimizing the user experience and improving the overall service level. In clinical medicine, speech emotion recognition can track the emotional changes of patients with depression or of autistic children, serving as a tool for diagnosis and auxiliary treatment. In robotics research, speech information helps a robot understand human emotions and respond in a friendly and intelligent way, realizing natural interaction.
Most current speech emotion recognition methods follow the traditional approach of extracting features and then classifying them with a classifier. Commonly used speech features include pitch, speaking rate and intensity (prosodic features), linear prediction cepstral coefficients and Mel-frequency cepstral coefficients (spectral features). Commonly used classification methods include hidden Markov models, support vector machines and Gaussian mixture models. Traditional emotion recognition methods have matured, but certain shortcomings remain. For example, it is still unclear which kind of feature has the greatest influence on emotion recognition, and most experiments rely on only one kind of feature as the basis for judgment, which reduces the objectivity of emotion recognition. In addition, some existing features, such as prosodic features like pitch and speaking rate, are strongly affected by the speaker's style, which increases the complexity of recognition.
With the recent development of deep learning, many researchers have chosen to train network models to perform emotion recognition. Existing speech emotion recognition methods of this kind mainly include methods based on deep belief networks, methods based on long short-term memory networks, and methods based on convolutional neural networks. The main drawback of these three kinds of methods is that they cannot combine the advantages of the different network models. For example, a deep belief network can take a one-dimensional sequence as input, but cannot exploit the correlation between earlier and later parts of the sequence; a long short-term memory network can exploit this temporal correlation, but the extracted features have a high dimensionality; a convolutional neural network cannot process a speech sequence directly, and the speech signal must first be Fourier-transformed into a spectrum before being used as input. Traditional speech emotion recognition methods have limited prospects for further development in feature extraction and classification, while existing deep-learning-based speech emotion methods each rely on a single type of network.
The content of the invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art and to provide a speech emotion recognition method based on a long short-term memory network and a convolutional neural network, which avoids the complex process of manually extracting and screening features and obtains the best emotion recognition result by adaptively adjusting the network parameters through training.
The present invention adopts the following technical scheme to solve the above technical problem:
A speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to the present invention comprises the following steps:
Step A: preprocess the speech samples in the speech emotion database so that every speech sample can be represented by a sequence of equal length, thereby obtaining preprocessed speech sequences;
Step B: build a speech emotion recognition system based on a long short-term memory network (LSTM) and a convolutional neural network (CNN), which includes two basic modules: an LSTM module and a CNN module;
Step C: feed the preprocessed speech sequences into the speech emotion recognition system and train it repeatedly, adjusting the parameters of the LSTM and CNN with the back-propagation algorithm to obtain the optimized network model;
Step D: use the network model trained in step C to classify the emotion of newly input speech sequences into six emotions: sadness, happiness, disgust, fear, surprise and neutral.
As a further refinement of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the LSTM module in step B is constructed as follows:
B1.1: let the length of the speech sample sequence be m, where m = n × n and n is a positive integer; let the outputs of the forget gate unit and the input gate unit at the current time step be ft and it respectively, satisfying:
ft=σ (Wf·xc+bf)
it=σ (Wi·xc+bi)
where xc=[ht-1,xt] is the new vector obtained by concatenating the two vectors ht-1 and xt end to end, xt is the input at the current time step, ht-1 is the state of the hidden layer at the previous time step, Wf and Wi are the weight matrices of the forget gate unit and the input gate unit respectively, bf and bi are the bias vectors of the forget gate unit and the input gate unit respectively, and σ(·) is the sigmoid activation function;
B1.2: compute the value of the current cell state Ct by the following formula:
Ct=ft*Ct-1+it*C̃t
where Ct-1 is the cell state at the previous time step, C̃t=tanh(WC·xc+bC) is the candidate value of the cell state at the current time step, WC is the weight matrix of the cell state, bC is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3: obtain the output ht of each hidden node according to the following formulas, and connect the ht in sequence to form an m-dimensional feature vector;
ht=ot*tanh(Ct)
ot=σ (Wo·[ht-1,xt]+bo)
where Wo is the weight matrix of the output gate unit, bo is the bias vector of the output gate unit, and ot is the output of the output gate unit.
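The gate equations in steps B1.1 to B1.3 can be illustrated with a minimal NumPy sketch of a single LSTM time step; the array shapes, the function name lstm_step and the layout of the parameters are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM time step following steps B1.1-B1.3 (shapes and names are assumptions)."""
    x_c = np.concatenate([h_prev, x_t])      # x_c = [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ x_c + b_f)           # forget gate output
    i_t = sigmoid(W_i @ x_c + b_i)           # input gate output
    C_tilde = np.tanh(W_C @ x_c + b_C)       # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde       # new cell state
    o_t = sigmoid(W_o @ x_c + b_o)           # output gate output
    h_t = o_t * np.tanh(C_t)                 # hidden-layer output
    return h_t, C_t
```

With a hidden size of 1, running this step over the m sampling points of an utterance and concatenating the outputs yields the m-dimensional feature vector described in step B1.3.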
As a further refinement of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the CNN module in step B is constructed as follows:
B2.1: convert the m-dimensional feature vector extracted in step B1.3 into an n × n feature matrix as the input of the convolutional neural network;
B2.2: the first layer of the convolutional neural network is a convolutional layer; m1 convolution kernels of size k1×k1 are applied to the input data with a convolution stride of s1, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m1 feature maps of size l1×l1;
B2.3: the second layer of the convolutional neural network is a pooling layer; m2 kernels of size k2×k2 are used to pool, with stride s2, the feature maps output by the first convolutional layer, giving the output of the pooling layer, i.e., m2 feature maps of size l2×l2;
B2.4: the third layer of the convolutional neural network is a convolutional layer; m3 convolution kernels of size k3×k3 are applied to the feature maps output by the second (pooling) layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m3 feature maps of size l3×l3;
B2.5: the fourth layer of the convolutional neural network is a convolutional layer; m4 convolution kernels of size k4×k4 are applied to the feature maps output by the third convolutional layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m4 feature maps of size l4×l4;
B2.6: the fifth layer of the convolutional neural network is a convolutional layer; m5 convolution kernels of size k5×k5 are applied to the feature maps output by the fourth convolutional layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m5 feature maps of size l5×l5;
B2.7: the sixth layer of the convolutional neural network is a pooling layer; m6 kernels of size k6×k6 are used to pool, with stride s6, the feature maps output by the fifth convolutional layer, giving the output of the pooling layer, i.e., m6 feature maps of size l6×l6;
B2.8: the seventh, eighth and ninth layers of the convolutional neural network are fully connected layers; the seventh layer fully connects the feature maps output by the sixth (pooling) layer to the c nodes of that layer; the eighth layer applies a ReLU nonlinear transformation to the c nodes of the seventh layer and then uses the dropout method to control the connection weights of the hidden nodes, with c fully connected nodes; the ninth fully connected layer has p output nodes, and its output is the softmax loss incorporating the feature labels. A size-bookkeeping sketch for these layers is given below.
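The relationship between the kernel sizes ki, strides si and output sizes li in steps B2.2 to B2.7 is not written out in the text. A small helper (conv_out_size is an assumed name) reproduces the sizes quoted in the concrete embodiment described later, assuming the standard formula l = floor((n + 2·pad - k)/s) + 1 and a padding of 2 for the 5×5 convolutions with edge expansion.

```python
def conv_out_size(n, k, s, pad=0):
    """Output side length of a square conv/pool layer: floor((n + 2*pad - k)/s) + 1."""
    return (n + 2 * pad - k) // s + 1

# Sizes from the concrete embodiment (128 x 128 input matrix):
assert conv_out_size(128, k=11, s=3) == 40        # layer 1: 96 kernels, 11x11, stride 3
assert conv_out_size(40, k=4, s=3) == 13          # layer 2: pooling, 4x4, stride 3
assert conv_out_size(13, k=5, s=1, pad=2) == 13   # layers 3-5: 5x5 convs with edge padding
assert conv_out_size(13, k=3, s=2) == 6           # layer 6: pooling, 3x3, stride 2
```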
As a further refinement of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the softmax loss function J(θ) of the convolutional neural network in step B is defined as follows:
J(θ) = -(1/q) [ Σ_{i=1..q} Σ_{j=1..p} 1{y(i)=j} · log( e^(θj^T x(i)) / Σ_{l=1..p} e^(θl^T x(i)) ) ]
where x(i) is an input vector, y(i) is the emotion category corresponding to the input vector, i=1,2,…,q, q is the number of speech samples; θj are the parameters to be trained, j=1,2,…,p, p is the number of emotion categories, T denotes transposition, and e is the base of the natural logarithm; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise.
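A minimal NumPy sketch of J(θ) as defined above; the variable names and the max-shift used for numerical stability are implementation assumptions.

```python
import numpy as np

def softmax_loss(theta, X, y):
    """J(theta) for q samples X (q x d), labels y in {0..p-1}, parameters theta (p x d)."""
    scores = X @ theta.T                              # theta_j^T x^(i) for all i, j
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability (assumption)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    q = X.shape[0]
    # The indicator 1{y(i)=j} picks out the log-probability of the true class of each sample.
    return -log_probs[np.arange(q), y].mean()
```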
As a further refinement of the speech emotion recognition method based on a long short-term memory network and a convolutional neural network of the present invention, the tanh function is expressed as tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)), and the sigmoid function is expressed as σ(x) = 1/(1 + e^(-x)), where x is a variable.
Compared with the prior art, the present invention, by adopting the above technical scheme, has the following technical effects:
(1) the complex process of manually extracting and screening features is avoided, and the best emotion recognition result is obtained by adaptively adjusting the network parameters through training;
(2) the speech emotion recognition method based on LSTM and CNN fuses two different network models: the LSTM can process the speech sequence directly while exploiting the temporal correlation between earlier and later parts of the sequence, and the CNN reduces the interference of noise while learning more abstract features, improving the accuracy and robustness of emotion recognition.
Brief description of the drawings
Fig. 1 is the flow chart of the speech emotion recognition method based on LSTM and CNN of the invention.
Fig. 2 is the basic framework diagram of the constructed speech emotion recognition system based on LSTM and CNN.
Fig. 3 is the basic framework diagram of the LSTM module in the speech emotion recognition system.
Fig. 4 is the basic framework diagram of the CNN module in the speech emotion recognition system.
Specific embodiment
In order to make the object, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the flow chart of the speech emotion recognition method based on LSTM and CNN of the invention, the implementation of the method mainly comprises the following steps:
Step 1: select a suitable speech emotion database and collect the speech segments in it.
In practice, the AFEW database is selected. This database provides raw video clips taken from films. Compared with conventional laboratory databases, the speech and emotional expression in the AFEW database are closer to real-life situations and therefore more general. The ages of the subjects range from 1 to 70 years, covering all age groups and including a large number of children and adolescents, so the database can subsequently also be used for emotion recognition of young subjects. The samples in the database are divided into six classes: sadness, happiness, disgust, fear, surprise and neutral, labelled 1 to 6. The speech segments in the videos are taken as the sample set, with a sampling rate of 48 kHz.
Step 2: read the speech sample data and unify the sample sequence length.
Because the speech samples differ in duration, and considering that the useful information is concentrated mainly in the middle region of the speech sequence, in practice 16384 sampling points around the midpoint of each speech sequence are chosen to represent the whole utterance. The speech samples are then randomly divided into a training set and a validation set in a 7:3 ratio, and the speech sequences and labels of each sample set are stored as pkl files. A minimal preprocessing sketch is given below.
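A minimal sketch of this preprocessing step; the file names, the zero-padding of short signals and the function names (center_crop, split_and_save) are assumptions, not specified by the patent.

```python
import pickle
import numpy as np

SEQ_LEN = 16384  # sampling points kept around the midpoint of each utterance

def center_crop(signal, length=SEQ_LEN):
    """Take `length` samples centred on the midpoint; zero-pad short signals (assumption)."""
    if len(signal) < length:
        signal = np.pad(signal, (0, length - len(signal)))
    start = (len(signal) - length) // 2
    return signal[start:start + length]

def split_and_save(sequences, labels, train_path="train.pkl", valid_path="valid.pkl"):
    """Random 7:3 split into training and validation sets, stored as pkl files."""
    idx = np.random.permutation(len(sequences))
    cut = int(0.7 * len(sequences))
    with open(train_path, "wb") as f:
        pickle.dump((sequences[idx[:cut]], labels[idx[:cut]]), f)
    with open(valid_path, "wb") as f:
        pickle.dump((sequences[idx[cut:]], labels[idx[cut:]]), f)
```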
Step 3: build the speech emotion recognition system, train the long short-term memory network with the speech sequences as input, and obtain the outputs of the hidden layer. Fig. 2 is the basic framework diagram of the constructed speech emotion recognition system based on LSTM and CNN, showing the whole process by which a speech sample is classified into an emotion; the system mainly contains the two basic modules, LSTM and CNN. Fig. 3 is the basic framework diagram of the LSTM module in the speech emotion recognition system, showing the internal structure of the LSTM network unit and the relations between the hidden-layer state and the gate units. Fig. 4 is the basic framework diagram of the CNN module in the speech emotion recognition system, showing the process by which the feature matrix is turned, through convolution, pooling and fully connected operations, into a vector containing label information.
Let x0, x1, x2, …, xt, … denote the input speech sequence and h0, h1, h2, …, ht, … denote the states of the hidden nodes. xc=[ht-1,xt] denotes the vector obtained by concatenating the hidden-layer state of the previous time step with the input of the current time step. Let the outputs of the forget gate unit and the input gate unit at time t be ft and it respectively; their values are computed as follows:
ft=σ (Wf·xc+bf) (1)
it=σ (Wi·xc+bi) (2)
The value of the cell state is calculated by the following formula:
Ct=ft*Ct-1+it*C̃t, with C̃t=tanh(WC·xc+bC) (3)
The output of the network module is determined by the current cell state; it is a filtered version of the cell value. The cell state is first passed through a tanh function so that its range is kept between -1 and 1, and the result is then multiplied by the output value ot of a sigmoid unit, which determines the output ht of the hidden layer:
ot=σ (Wo·[ht-1,xt]+bo) (4)
ht=ot*tanh(Ct) (5)
After the output ht of each hidden node is obtained, the outputs are connected in sequence to form a feature vector of length 16384, which is then converted into a 128 × 128 feature matrix. A sketch of this step is given below.
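Reusing the lstm_step function from the sketch after step B1.3, the following illustrates how the 16384 hidden outputs could be collected and reshaped into the 128 × 128 feature matrix; treating each hidden output as a scalar (hidden size 1) is an assumption consistent with a 16384-dimensional feature vector.

```python
import numpy as np

def lstm_features(x_seq, params, d_h=1):
    """Run lstm_step over the whole sequence and reshape the outputs to 128 x 128."""
    h = np.zeros(d_h)
    C = np.zeros(d_h)
    outputs = []
    for x_t in x_seq:                       # x_seq contains 16384 sampling points
        h, C = lstm_step(np.atleast_1d(x_t), h, C, *params)
        outputs.append(h[0])                # one scalar hidden output per time step (assumption)
    feat = np.asarray(outputs)              # 16384-dimensional feature vector
    return feat.reshape(128, 128)           # 128 x 128 feature matrix fed to the CNN
```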
Step 4: train the convolutional neural network with the feature matrix as input. The specific steps are as follows:
The first layer is a convolutional layer: 96 convolution kernels of size 11 × 11 are applied to the input data with a convolution stride of 3; the convolution operation strengthens the signal features and reduces noise. After convolution, 96 feature maps of size 40 × 40 are generated.
The second layer is a pooling layer: using kernels of size 4 × 4, the feature maps generated by the first convolutional layer are pooled with a stride of 3, generating 96 feature maps of size 13 × 13.
The third layer is a convolutional layer: 256 convolution kernels of size 5 × 5 are applied to the feature maps generated by the second layer; the edges are expanded by padding so that the feature maps do not shrink during convolution. After the nonlinear transformation, 256 feature maps of size 13 × 13 are generated.
The fourth layer is a convolutional layer: 384 convolution kernels of size 5 × 5 are applied to the feature maps generated by the third layer, again with edge padding; after the nonlinear transformation, 384 feature maps of size 13 × 13 are generated.
The fifth layer is a convolutional layer: 256 convolution kernels of size 5 × 5 are used, again with edge padding; after the nonlinear mapping, 256 feature maps of size 13 × 13 are generated.
The sixth layer is a pooling layer: using kernels of size 3 × 3, the feature maps generated by the fifth convolutional layer are pooled with a stride of 2, generating 256 feature maps of size 6 × 6.
The seventh, eighth and ninth layers are fully connected layers. The seventh layer fully connects the feature maps generated by the sixth layer to 4096 nodes. The eighth layer applies a ReLU nonlinear transformation to the nodes of the seventh layer and then uses the dropout method to control the working weights of the hidden nodes: in each training iteration the dropout method randomly drops some hidden nodes, which are temporarily treated as not being part of the network structure, but their weights are retained, so that only part of the parameters are adjusted each time. The eighth layer has 4096 fully connected nodes. The ninth fully connected layer has 6 output nodes, and its output is the softmax loss incorporating the feature labels. A sketch of this architecture is given below.
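A hedged PyTorch sketch of the nine-layer structure described above; the patent does not name a framework, the dropout probability of 0.5 and the padding of 2 are assumptions, and layer 8 is read here as ReLU plus dropout over the 4096 nodes of layer 7.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=3), nn.ReLU(),    # layer 1: 96 maps, 40x40
            nn.MaxPool2d(kernel_size=4, stride=3),                    # layer 2: 96 maps, 13x13
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),  # layer 3: 256 maps, 13x13
            nn.Conv2d(256, 384, kernel_size=5, padding=2), nn.ReLU(), # layer 4: 384 maps, 13x13
            nn.Conv2d(384, 256, kernel_size=5, padding=2), nn.ReLU(), # layer 5: 256 maps, 13x13
            nn.MaxPool2d(kernel_size=3, stride=2),                    # layer 6: 256 maps, 6x6
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096),                             # layer 7: 4096 nodes
            nn.ReLU(), nn.Dropout(p=0.5),                             # layer 8: ReLU + dropout
            nn.Linear(4096, num_classes),                             # layer 9: 6 output nodes
        )

    def forward(self, x):                      # x: (batch, 1, 128, 128) feature matrices
        x = self.features(x)
        return self.classifier(x.flatten(1))   # logits; softmax loss via nn.CrossEntropyLoss
```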
Step 5: adjust the parameters of the LSTM and CNN in the system with the back-propagation algorithm, select the best network model and save its parameters.
Step 6: feed the test-set samples into the optimal network model and perform emotion recognition on them with the trained network.
The softmax loss function J(θ) of the convolutional neural network is defined as follows:
J(θ) = -(1/q) [ Σ_{i=1..q} Σ_{j=1..p} 1{y(i)=j} · log( e^(θj^T x(i)) / Σ_{l=1..p} e^(θl^T x(i)) ) ]
where x(i) is an input vector, y(i) is the emotion category corresponding to the input vector, i=1,2,…,q, q is the number of speech samples; θj are the parameters to be trained, j=1,2,…,p, p is the number of emotion categories, T denotes transposition, and e is the base of the natural logarithm; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise. As the number of training iterations increases, the value of the loss function keeps decreasing; when the loss function tends to stability, the corresponding θj are the parameters of the optimized network model. A minimal training and evaluation sketch follows.
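A minimal training and evaluation loop consistent with steps 5 and 6 (back-propagation training, keeping the best model, then emotion recognition on a held-out set); the optimizer choice, learning rate, number of epochs and DataLoader usage are assumptions, and EmotionCNN refers to the sketch above.

```python
import torch
import torch.nn as nn

def train_and_evaluate(model, train_loader, valid_loader, epochs=30, lr=1e-3):
    """Back-propagation training; keeps the parameters of the best validation model."""
    criterion = nn.CrossEntropyLoss()                  # softmax loss J(theta)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    best_acc, best_state = 0.0, None
    for epoch in range(epochs):
        model.train()
        for feats, labels in train_loader:             # feats: (batch, 1, 128, 128)
            optimizer.zero_grad()
            loss = criterion(model(feats), labels)
            loss.backward()                            # back-propagation
            optimizer.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for feats, labels in valid_loader:
                preds = model(feats).argmax(dim=1)     # predicted emotion class
                correct += (preds == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        if acc > best_acc:                             # keep the best network parameters
            best_acc = acc
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_acc
```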
In the present invention, the tanh function (hyperbolic tangent function) is expressed as tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)), the ReLU function (rectified linear unit function) is expressed as f(x) = max(0, x), and the sigmoid function (S-shaped growth curve) is expressed as σ(x) = 1/(1 + e^(-x)), where x is a variable.

Claims (5)

1. A speech emotion recognition method based on a long short-term memory network and a convolutional neural network, characterised by comprising the following steps:
Step A: preprocessing the speech samples in a speech emotion database so that every speech sample can be represented by a sequence of equal length, thereby obtaining preprocessed speech sequences;
Step B: building a speech emotion recognition system based on a long short-term memory network (LSTM) and a convolutional neural network (CNN), which comprises two basic modules: an LSTM module and a CNN module;
Step C: feeding the preprocessed speech sequences into the speech emotion recognition system and training it repeatedly, adjusting the parameters of the LSTM and CNN with the back-propagation algorithm to obtain the optimized network model;
Step D: using the network model trained in step C to classify the emotion of newly input speech sequences into six emotions: sadness, happiness, disgust, fear, surprise and neutral.
2. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 1, characterised in that the LSTM module in step B is constructed as follows:
B1.1: let the length of the speech sample sequence be m, where m = n × n and n is a positive integer; let the outputs of the forget gate unit and the input gate unit at the current time step be ft and it respectively, satisfying:
ft=σ(Wf·xc+bf)
it=σ(Wi·xc+bi)
where xc=[ht-1,xt] is the new vector obtained by concatenating the two vectors ht-1 and xt end to end, xt is the input at the current time step, ht-1 is the state of the hidden layer at the previous time step, Wf and Wi are the weight matrices of the forget gate unit and the input gate unit respectively, bf and bi are the bias vectors of the forget gate unit and the input gate unit respectively, and σ(·) is the sigmoid activation function;
B1.2: compute the value of the current cell state Ct by the following formula:
Ct=ft*Ct-1+it*C̃t
where Ct-1 is the cell state at the previous time step, C̃t=tanh(WC·xc+bC) is the candidate value of the cell state at the current time step, WC is the weight matrix of the cell state, bC is the bias vector of the cell state, and tanh(·) is the hyperbolic tangent function;
B1.3: obtain the output ht of each hidden node according to the following formulas, and connect the ht in sequence to form an m-dimensional feature vector;
ht=ot*tanh(Ct)
ot=σ(Wo·[ht-1,xt]+bo)
where Wo is the weight matrix of the output gate unit, bo is the bias vector of the output gate unit, and ot is the output of the output gate unit.
3. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 2, characterised in that the CNN module in step B is constructed as follows:
B2.1: convert the m-dimensional feature vector extracted in step B1.3 into an n × n feature matrix as the input of the convolutional neural network;
B2.2: the first layer of the convolutional neural network is a convolutional layer; m1 convolution kernels of size k1×k1 are applied to the input data with a convolution stride of s1, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m1 feature maps of size l1×l1;
B2.3: the second layer of the convolutional neural network is a pooling layer; m2 kernels of size k2×k2 are used to pool, with stride s2, the feature maps output by the first convolutional layer, giving the output of the pooling layer, i.e., m2 feature maps of size l2×l2;
B2.4: the third layer of the convolutional neural network is a convolutional layer; m3 convolution kernels of size k3×k3 are applied to the feature maps output by the second (pooling) layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m3 feature maps of size l3×l3;
B2.5: the fourth layer of the convolutional neural network is a convolutional layer; m4 convolution kernels of size k4×k4 are applied to the feature maps output by the third convolutional layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m4 feature maps of size l4×l4;
B2.6: the fifth layer of the convolutional neural network is a convolutional layer; m5 convolution kernels of size k5×k5 are applied to the feature maps output by the fourth convolutional layer, and the result of the convolution is passed through an activation function for nonlinear mapping, giving the output of the convolutional layer, i.e., m5 feature maps of size l5×l5;
B2.7: the sixth layer of the convolutional neural network is a pooling layer; m6 kernels of size k6×k6 are used to pool, with stride s6, the feature maps output by the fifth convolutional layer, giving the output of the pooling layer, i.e., m6 feature maps of size l6×l6;
B2.8: the seventh, eighth and ninth layers of the convolutional neural network are fully connected layers; the seventh layer fully connects the feature maps output by the sixth (pooling) layer to the c nodes of that layer; the eighth layer applies a ReLU nonlinear transformation to the c nodes of the seventh layer and then uses the dropout method to control the connection weights of the hidden nodes, with c fully connected nodes; the ninth fully connected layer has p output nodes, and its output is the softmax loss incorporating the feature labels.
4. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 3, characterised in that the softmax loss function J(θ) of the convolutional neural network in step B is defined as follows:
J(θ) = -(1/q) [ Σ_{i=1..q} Σ_{j=1..p} 1{y(i)=j} · log( e^(θj^T x(i)) / Σ_{l=1..p} e^(θl^T x(i)) ) ]
where x(i) is an input vector, y(i) is the emotion category corresponding to the input vector, i=1,2,…,q, q is the number of speech samples; θj are the parameters to be trained, j=1,2,…,p, p is the number of emotion categories, T denotes transposition, and e is the base of the natural logarithm; 1{·} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise.
5. The speech emotion recognition method based on a long short-term memory network and a convolutional neural network according to claim 3, characterised in that the tanh function is expressed as tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)) and the sigmoid function is expressed as σ(x) = 1/(1 + e^(-x)), where x is a variable.
CN201611093447.3A 2016-12-01 2016-12-01 Speech emotion recognition method based on deep neural network Active CN106782602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611093447.3A CN106782602B (en) 2016-12-01 2016-12-01 Speech emotion recognition method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611093447.3A CN106782602B (en) 2016-12-01 2016-12-01 Speech emotion recognition method based on deep neural network

Publications (2)

Publication Number Publication Date
CN106782602A true CN106782602A (en) 2017-05-31
CN106782602B CN106782602B (en) 2020-03-17

Family

ID=58913860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611093447.3A Active CN106782602B (en) 2016-12-01 2016-12-01 Speech emotion recognition method based on deep neural network

Country Status (1)

Country Link
CN (1) CN106782602B (en)

Cited By (91)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274378A (en) * 2017-07-25 2017-10-20 江西理工大学 A kind of image blurring type identification and parameter tuning method for merging memory CNN
CN107293288A (en) * 2017-06-09 2017-10-24 清华大学 A kind of residual error shot and long term remembers the acoustic model modeling method of Recognition with Recurrent Neural Network
CN107293290A (en) * 2017-07-31 2017-10-24 郑州云海信息技术有限公司 The method and apparatus for setting up Speech acoustics model
CN107392109A (en) * 2017-06-27 2017-11-24 南京邮电大学 A kind of neonatal pain expression recognition method based on deep neural network
CN107464568A (en) * 2017-09-25 2017-12-12 四川长虹电器股份有限公司 Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system
CN107506414A (en) * 2017-08-11 2017-12-22 武汉大学 A kind of code based on shot and long term memory network recommends method
CN107562792A (en) * 2017-07-31 2018-01-09 同济大学 A kind of question and answer matching process based on deep learning
CN107562784A (en) * 2017-07-25 2018-01-09 同济大学 Short text classification method based on ResLCNN models
CN107633842A (en) * 2017-06-12 2018-01-26 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN107679199A (en) * 2017-10-11 2018-02-09 北京邮电大学 A kind of external the Chinese text readability analysis method based on depth local feature
CN107703564A (en) * 2017-10-13 2018-02-16 中国科学院深圳先进技术研究院 A kind of precipitation predicting method, system and electronic equipment
CN107785011A (en) * 2017-09-15 2018-03-09 北京理工大学 Word speed estimates training, word speed method of estimation, device, equipment and the medium of model
CN107818307A (en) * 2017-10-31 2018-03-20 天津大学 A kind of multi-tag Video Events detection method based on LSTM networks
CN107862331A (en) * 2017-10-31 2018-03-30 华中科技大学 It is a kind of based on time series and CNN unsafe acts recognition methods and system
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN107992938A (en) * 2017-11-24 2018-05-04 清华大学 Space-time big data Forecasting Methodology and system based on positive and negative convolutional neural networks
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108304823A (en) * 2018-02-24 2018-07-20 重庆邮电大学 A kind of expression recognition method based on two-fold product CNN and long memory network in short-term
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108520753A (en) * 2018-02-26 2018-09-11 南京工程学院 Voice lie detection method based on the two-way length of convolution memory network in short-term
CN108550375A (en) * 2018-03-14 2018-09-18 鲁东大学 A kind of emotion identification method, device and computer equipment based on voice signal
CN108564942A (en) * 2018-04-04 2018-09-21 南京师范大学 One kind being based on the adjustable speech-emotion recognition method of susceptibility and system
CN108597539A (en) * 2018-02-09 2018-09-28 桂林电子科技大学 Speech-emotion recognition method based on parameter migration and sound spectrograph
CN108630199A (en) * 2018-06-30 2018-10-09 中国人民解放军战略支援部队信息工程大学 A kind of data processing method of acoustic model
CN108717856A (en) * 2018-06-16 2018-10-30 台州学院 A kind of speech-emotion recognition method based on multiple dimensioned depth convolution loop neural network
CN108766419A (en) * 2018-05-04 2018-11-06 华南理工大学 A kind of abnormal speech detection method based on deep learning
CN108764268A (en) * 2018-04-02 2018-11-06 华南理工大学 A kind of multi-modal emotion identification method of picture and text based on deep learning
CN108806667A (en) * 2018-05-29 2018-11-13 重庆大学 The method for synchronously recognizing of voice and mood based on neural network
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 A kind of virtual robot man-machine interaction method based on user emotion identification
CN108922617A (en) * 2018-06-26 2018-11-30 电子科技大学 A kind of self-closing disease aided diagnosis method neural network based
WO2018227169A1 (en) * 2017-06-08 2018-12-13 Newvoicemedia Us Inc. Optimal human-machine conversations using emotion-enhanced natural speech
CN109003625A (en) * 2018-07-27 2018-12-14 中国科学院自动化研究所 Speech-emotion recognition method and system based on ternary loss
CN109034034A (en) * 2018-07-12 2018-12-18 广州麦仑信息科技有限公司 A kind of vein identification method based on nitrification enhancement optimization convolutional neural networks
CN109036467A (en) * 2018-10-26 2018-12-18 南京邮电大学 CFFD extracting method, speech-emotion recognition method and system based on TF-LSTM
CN109036459A (en) * 2018-08-22 2018-12-18 百度在线网络技术(北京)有限公司 Sound end detecting method, device, computer equipment, computer storage medium
CN109087635A (en) * 2018-08-30 2018-12-25 湖北工业大学 A kind of speech-sound intelligent classification method and system
CN109147826A (en) * 2018-08-22 2019-01-04 平安科技(深圳)有限公司 Music emotion recognition method, device, computer equipment and computer storage medium
CN109146066A (en) * 2018-11-01 2019-01-04 重庆邮电大学 A kind of collaborative virtual learning environment natural interactive method based on speech emotion recognition
CN109190514A (en) * 2018-08-14 2019-01-11 电子科技大学 Face character recognition methods and system based on two-way shot and long term memory network
CN109243493A (en) * 2018-10-30 2019-01-18 南京工程学院 Based on the vagitus emotion identification method for improving long memory network in short-term
CN109243494A (en) * 2018-10-30 2019-01-18 南京工程学院 Childhood emotional recognition methods based on the long memory network in short-term of multiple attention mechanism
CN109243490A (en) * 2018-10-11 2019-01-18 平安科技(深圳)有限公司 Driver's Emotion identification method and terminal device
CN109285562A (en) * 2018-09-28 2019-01-29 东南大学 Speech-emotion recognition method based on attention mechanism
CN109282837A (en) * 2018-10-24 2019-01-29 福州大学 Bragg grating based on LSTM network interlocks the demodulation method of spectrum
CN109346107A (en) * 2018-10-10 2019-02-15 中山大学 A method of independent speaker's sound pronunciation based on LSTM is inverse to be solved
CN109389992A (en) * 2018-10-18 2019-02-26 天津大学 A kind of speech-emotion recognition method based on amplitude and phase information
WO2019037382A1 (en) * 2017-08-24 2019-02-28 平安科技(深圳)有限公司 Emotion recognition-based voice quality inspection method and device, equipment and storage medium
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
CN109567793A (en) * 2018-11-16 2019-04-05 西北工业大学 A kind of ECG signal processing method towards cardiac arrhythmia classification
CN109637545A (en) * 2019-01-17 2019-04-16 哈尔滨工程大学 Based on one-dimensional convolution asymmetric double to the method for recognizing sound-groove of long memory network in short-term
CN109754790A (en) * 2017-11-01 2019-05-14 中国科学院声学研究所 A kind of speech recognition system and method based on mixing acoustic model
CN110096587A (en) * 2019-01-11 2019-08-06 杭州电子科技大学 The fine granularity sentiment classification model of LSTM-CNN word insertion based on attention mechanism
CN110179453A (en) * 2018-06-01 2019-08-30 山东省计算中心(国家超级计算济南中心) Electrocardiogram classification method based on convolutional neural networks and shot and long term memory network
WO2019179036A1 (en) * 2018-03-19 2019-09-26 平安科技(深圳)有限公司 Deep neural network model, electronic device, identity authentication method, and storage medium
CN110322900A (en) * 2019-06-25 2019-10-11 深圳市壹鸽科技有限公司 A kind of method of phonic signal character fusion
CN110363751A (en) * 2019-07-01 2019-10-22 浙江大学 A kind of big enteroscope polyp detection method based on generation collaborative network
WO2019232883A1 (en) * 2018-06-07 2019-12-12 平安科技(深圳)有限公司 Insurance product pushing method and device, computer device and storage medium
CN110600018A (en) * 2019-09-05 2019-12-20 腾讯科技(深圳)有限公司 Voice recognition method and device and neural network training method and device
CN110738852A (en) * 2019-10-23 2020-01-31 浙江大学 intersection steering overflow detection method based on vehicle track and long and short memory neural network
CN110929762A (en) * 2019-10-30 2020-03-27 中国科学院自动化研究所南京人工智能芯片创新研究院 Method and system for detecting body language and analyzing behavior based on deep learning
CN110956953A (en) * 2019-11-29 2020-04-03 中山大学 Quarrel identification method based on audio analysis and deep learning
CN111028859A (en) * 2019-12-15 2020-04-17 中北大学 Hybrid neural network vehicle type identification method based on audio feature fusion
CN111179910A (en) * 2019-12-17 2020-05-19 深圳追一科技有限公司 Speed of speech recognition method and apparatus, server, computer readable storage medium
CN111210844A (en) * 2020-02-03 2020-05-29 北京达佳互联信息技术有限公司 Method, device and equipment for determining speech emotion recognition model and storage medium
CN111222624A (en) * 2018-11-26 2020-06-02 深圳云天励飞技术有限公司 Parallel computing method and device
CN111310672A (en) * 2020-02-19 2020-06-19 广州数锐智能科技有限公司 Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling
CN111326178A (en) * 2020-02-27 2020-06-23 长沙理工大学 Multi-mode speech emotion recognition system and method based on convolutional neural network
CN111524535A (en) * 2020-04-30 2020-08-11 杭州电子科技大学 Feature fusion method for speech emotion recognition based on attention mechanism
JP2020134719A (en) * 2019-02-20 2020-08-31 ソフトバンク株式会社 Translation device, translation method, and translation program
CN111709284A (en) * 2020-05-07 2020-09-25 西安理工大学 Dance emotion recognition method based on CNN-LSTM
CN111883252A (en) * 2020-07-29 2020-11-03 济南浪潮高新科技投资发展有限公司 Auxiliary diagnosis method, device, equipment and storage medium for infantile autism
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112037822A (en) * 2020-07-30 2020-12-04 华南师范大学 Voice emotion recognition method based on ICNN and Bi-LSTM
CN112101095A (en) * 2020-08-02 2020-12-18 华南理工大学 Suicide and violence tendency emotion recognition method based on language and limb characteristics
CN112187413A (en) * 2020-08-28 2021-01-05 中国人民解放军海军航空大学航空作战勤务学院 SFBC (Small form-factor Block code) identifying method and device based on CNN-LSTM (convolutional neural network-Link State transition technology)
CN112259126A (en) * 2020-09-24 2021-01-22 广州大学 Robot and method for assisting in recognizing autism voice features
CN112383369A (en) * 2020-07-23 2021-02-19 哈尔滨工业大学 Cognitive radio multi-channel spectrum sensing method based on CNN-LSTM network model
CN112446266A (en) * 2019-09-04 2021-03-05 北京君正集成电路股份有限公司 Face recognition network structure suitable for front end
CN112735434A (en) * 2020-12-09 2021-04-30 中国人民解放军陆军工程大学 Voice communication method and system with voiceprint cloning function
CN112735479A (en) * 2021-03-31 2021-04-30 南方电网数字电网研究院有限公司 Speech emotion recognition method and device, computer equipment and storage medium
CN112766292A (en) * 2019-11-04 2021-05-07 中移(上海)信息通信科技有限公司 Identity authentication method, device, equipment and storage medium
CN112819133A (en) * 2019-11-15 2021-05-18 北方工业大学 Construction method of deep hybrid neural network emotion recognition model
WO2021127982A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech emotion recognition method, smart device, and computer-readable storage medium
WO2021147363A1 (en) * 2020-01-20 2021-07-29 中国电子科技集团公司电子科学研究院 Text-based major depressive disorder recognition method
CN113221758A (en) * 2021-05-16 2021-08-06 西北工业大学 Underwater acoustic target identification method based on GRU-NIN model
CN113255800A (en) * 2021-06-02 2021-08-13 中国科学院自动化研究所 Robust emotion modeling system based on audio and video
WO2021164346A1 (en) * 2020-02-21 2021-08-26 乐普(北京)医疗器械股份有限公司 Method and device for predicting blood pressure
CN114305418A (en) * 2021-12-16 2022-04-12 广东工业大学 Data acquisition system and method for depression state intelligent evaluation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279495A (en) * 2015-10-23 2016-01-27 天津大学 Video description method based on deep learning and text summarization
CN105469065A (en) * 2015-12-07 2016-04-06 中国科学院自动化研究所 Recurrent neural network-based discrete emotion recognition method
US20160099010A1 (en) * 2014-10-03 2016-04-07 Google Inc. Convolutional, long short-term memory, fully connected deep neural networks
CN105844239A (en) * 2016-03-23 2016-08-10 北京邮电大学 Method for detecting riot and terror videos based on CNN and LSTM
CN106096568A (en) * 2016-06-21 2016-11-09 同济大学 A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160099010A1 (en) * 2014-10-03 2016-04-07 Google Inc. Convolutional, long short-term memory, fully connected deep neural networks
CN105279495A (en) * 2015-10-23 2016-01-27 天津大学 Video description method based on deep learning and text summarization
CN105469065A (en) * 2015-12-07 2016-04-06 中国科学院自动化研究所 Recurrent neural network-based discrete emotion recognition method
CN105844239A (en) * 2016-03-23 2016-08-10 北京邮电大学 Method for detecting riot and terror videos based on CNN and LSTM
CN106096568A (en) * 2016-06-21 2016-11-09 同济大学 A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PENG CHEN ET AL: "Clause Sentiment Identification Basedon Convolutional Neural Network With Context Embedding", 《2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY》 *

Cited By (137)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018227169A1 (en) * 2017-06-08 2018-12-13 Newvoicemedia Us Inc. Optimal human-machine conversations using emotion-enhanced natural speech
CN107293288A (en) * 2017-06-09 2017-10-24 清华大学 A kind of residual error shot and long term remembers the acoustic model modeling method of Recognition with Recurrent Neural Network
CN107293288B (en) * 2017-06-09 2020-04-21 清华大学 Acoustic model modeling method of residual long-short term memory recurrent neural network
CN107633842B (en) * 2017-06-12 2018-08-31 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN107633842A (en) * 2017-06-12 2018-01-26 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN107392109A (en) * 2017-06-27 2017-11-24 南京邮电大学 A kind of neonatal pain expression recognition method based on deep neural network
CN107274378A (en) * 2017-07-25 2017-10-20 江西理工大学 A kind of image blurring type identification and parameter tuning method for merging memory CNN
CN107274378B (en) * 2017-07-25 2020-04-03 江西理工大学 Image fuzzy type identification and parameter setting method based on fusion memory CNN
CN107562784A (en) * 2017-07-25 2018-01-09 同济大学 Short text classification method based on ResLCNN models
CN107562792A (en) * 2017-07-31 2018-01-09 同济大学 A kind of question and answer matching process based on deep learning
CN107293290A (en) * 2017-07-31 2017-10-24 郑州云海信息技术有限公司 The method and apparatus for setting up Speech acoustics model
CN107562792B (en) * 2017-07-31 2020-01-31 同济大学 question-answer matching method based on deep learning
CN107506414A (en) * 2017-08-11 2017-12-22 武汉大学 A kind of code based on shot and long term memory network recommends method
CN107506414B (en) * 2017-08-11 2020-01-07 武汉大学 Code recommendation method based on long-term and short-term memory network
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
WO2019037700A1 (en) * 2017-08-22 2019-02-28 腾讯科技(深圳)有限公司 Speech emotion detection method and apparatus, computer device, and storage medium
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
WO2019037382A1 (en) * 2017-08-24 2019-02-28 平安科技(深圳)有限公司 Emotion recognition-based voice quality inspection method and device, equipment and storage medium
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
CN109426858B (en) * 2017-08-29 2021-04-06 京东方科技集团股份有限公司 Neural network, training method, image processing method, and image processing apparatus
CN107785011A (en) * 2017-09-15 2018-03-09 北京理工大学 Word speed estimates training, word speed method of estimation, device, equipment and the medium of model
CN107785011B (en) * 2017-09-15 2020-07-03 北京理工大学 Training method, device, equipment and medium of speech rate estimation model and speech rate estimation method, device and equipment
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN107464568A (en) * 2017-09-25 2017-12-12 四川长虹电器股份有限公司 Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system
CN107464568B (en) * 2017-09-25 2020-06-30 四川长虹电器股份有限公司 Speaker identification method and system based on three-dimensional convolution neural network text independence
CN107679199A (en) * 2017-10-11 2018-02-09 北京邮电大学 A kind of external the Chinese text readability analysis method based on depth local feature
CN107703564A (en) * 2017-10-13 2018-02-16 中国科学院深圳先进技术研究院 A kind of precipitation predicting method, system and electronic equipment
CN107703564B (en) * 2017-10-13 2020-04-14 中国科学院深圳先进技术研究院 Rainfall prediction method and system and electronic equipment
CN107818307A (en) * 2017-10-31 2018-03-20 天津大学 A kind of multi-tag Video Events detection method based on LSTM networks
CN107862331A (en) * 2017-10-31 2018-03-30 华中科技大学 It is a kind of based on time series and CNN unsafe acts recognition methods and system
CN107818307B (en) * 2017-10-31 2021-05-18 天津大学 Multi-label video event detection method based on LSTM network
CN109754790B (en) * 2017-11-01 2020-11-06 中国科学院声学研究所 Speech recognition system and method based on hybrid acoustic model
CN109754790A (en) * 2017-11-01 2019-05-14 中国科学院声学研究所 A kind of speech recognition system and method based on mixing acoustic model
CN108039181A (en) * 2017-11-02 2018-05-15 北京捷通华声科技股份有限公司 The emotion information analysis method and device of a kind of voice signal
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN107992938A (en) * 2017-11-24 2018-05-04 清华大学 Space-time big data Forecasting Methodology and system based on positive and negative convolutional neural networks
CN107992938B (en) * 2017-11-24 2019-05-14 清华大学 Space-time big data prediction technique and system based on positive and negative convolutional neural networks
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108597539A (en) * 2018-02-09 2018-09-28 桂林电子科技大学 Speech-emotion recognition method based on parameter migration and sound spectrograph
CN108597539B (en) * 2018-02-09 2021-09-03 桂林电子科技大学 Speech emotion recognition method based on parameter migration and spectrogram
CN108304823A (en) * 2018-02-24 2018-07-20 重庆邮电大学 Expression recognition method based on double-convolution CNN and long short-term memory network
CN108304823B (en) * 2018-02-24 2022-03-22 重庆邮电大学 Expression recognition method based on double-convolution CNN and long-and-short-term memory network
CN108520753A (en) * 2018-02-26 2018-09-11 南京工程学院 Voice lie detection method based on convolutional bidirectional long short-term memory network
CN108520753B (en) * 2018-02-26 2020-07-24 南京工程学院 Voice lie detection method based on convolution bidirectional long-time and short-time memory network
CN108550375A (en) * 2018-03-14 2018-09-18 鲁东大学 Emotion recognition method, device and computer device based on speech signals
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 Disguised voice recognition method based on convolutional neural networks
WO2019179036A1 (en) * 2018-03-19 2019-09-26 平安科技(深圳)有限公司 Deep neural network model, electronic device, identity authentication method, and storage medium
CN108831450A (en) * 2018-03-30 2018-11-16 杭州鸟瞰智能科技股份有限公司 Virtual robot human-computer interaction method based on user emotion recognition
CN108764268A (en) * 2018-04-02 2018-11-06 华南理工大学 Multi-modal image-text emotion recognition method based on deep learning
CN108564942A (en) * 2018-04-04 2018-09-21 南京师范大学 Speech emotion recognition method and system based on adjustable sensitivity
CN108564942B (en) * 2018-04-04 2021-01-26 南京师范大学 Voice emotion recognition method and system based on adjustable sensitivity
CN108766419A (en) * 2018-05-04 2018-11-06 华南理工大学 Abnormal speech detection method based on deep learning
CN108766419B (en) * 2018-05-04 2020-10-27 华南理工大学 Abnormal voice distinguishing method based on deep learning
CN108806667A (en) * 2018-05-29 2018-11-13 重庆大学 Method for synchronous recognition of speech and emotion based on neural network
CN110179453A (en) * 2018-06-01 2019-08-30 山东省计算中心(国家超级计算济南中心) Electrocardiogram classification method based on convolutional neural network and long short-term memory network
CN110179453B (en) * 2018-06-01 2020-01-03 山东省计算中心(国家超级计算济南中心) Electrocardiogram classification method based on convolutional neural network and long-short term memory network
WO2019232883A1 (en) * 2018-06-07 2019-12-12 平安科技(深圳)有限公司 Insurance product pushing method and device, computer device and storage medium
CN108717856B (en) * 2018-06-16 2022-03-08 台州学院 Speech emotion recognition method based on multi-scale deep convolution cyclic neural network
CN108717856A (en) * 2018-06-16 2018-10-30 台州学院 Speech emotion recognition method based on multi-scale deep convolutional recurrent neural network
CN108922617A (en) * 2018-06-26 2018-11-30 电子科技大学 Neural-network-based autism auxiliary diagnosis method
CN108630199A (en) * 2018-06-30 2018-10-09 中国人民解放军战略支援部队信息工程大学 Data processing method for acoustic models
CN109034034A (en) * 2018-07-12 2018-12-18 广州麦仑信息科技有限公司 Vein recognition method based on convolutional neural networks optimized by a reinforcement learning algorithm
CN109003625A (en) * 2018-07-27 2018-12-14 中国科学院自动化研究所 Speech emotion recognition method and system based on triplet loss
CN109190514A (en) * 2018-08-14 2019-01-11 电子科技大学 Face attribute recognition method and system based on bidirectional long short-term memory network
CN109190514B (en) * 2018-08-14 2021-10-01 电子科技大学 Face attribute recognition method and system based on bidirectional long-short term memory network
CN109147826A (en) * 2018-08-22 2019-01-04 平安科技(深圳)有限公司 Music emotion recognition method, device, computer equipment and computer storage medium
CN109036459A (en) * 2018-08-22 2018-12-18 百度在线网络技术(北京)有限公司 Voice endpoint detection method and device, computer device, and computer storage medium
CN109147826B (en) * 2018-08-22 2022-12-27 平安科技(深圳)有限公司 Music emotion recognition method and device, computer equipment and computer storage medium
CN109087635A (en) * 2018-08-30 2018-12-25 湖北工业大学 Intelligent speech classification method and system
CN109285562B (en) * 2018-09-28 2022-09-23 东南大学 Voice emotion recognition method based on attention mechanism
CN109285562A (en) * 2018-09-28 2019-01-29 东南大学 Speech emotion recognition method based on attention mechanism
CN109346107A (en) * 2018-10-10 2019-02-15 中山大学 Method for inversely solving independent-speaker pronunciation based on LSTM
CN109346107B (en) * 2018-10-10 2022-09-30 中山大学 LSTM-based method for inversely solving pronunciation of independent speaker
CN109243490A (en) * 2018-10-11 2019-01-18 平安科技(深圳)有限公司 Driver emotion recognition method and terminal device
CN109389992A (en) * 2018-10-18 2019-02-26 天津大学 Speech emotion recognition method based on amplitude and phase information
CN109282837A (en) * 2018-10-24 2019-01-29 福州大学 Demodulation method for interleaved Bragg grating spectra based on LSTM network
CN109036467A (en) * 2018-10-26 2018-12-18 南京邮电大学 CFFD extraction method, speech emotion recognition method and system based on TF-LSTM
CN109243494A (en) * 2018-10-30 2019-01-18 南京工程学院 Children's emotion recognition method based on multi-attention-mechanism long short-term memory network
CN109243494B (en) * 2018-10-30 2022-10-11 南京工程学院 Children emotion recognition method based on multi-attention mechanism long-time memory network
CN109243493B (en) * 2018-10-30 2022-09-16 南京工程学院 Infant crying emotion recognition method based on improved long-time and short-time memory network
CN109243493A (en) * 2018-10-30 2019-01-18 南京工程学院 Infant cry emotion recognition method based on an improved long short-term memory network
CN109146066A (en) * 2018-11-01 2019-01-04 重庆邮电大学 Natural interaction method for collaborative virtual learning environments based on speech emotion recognition
CN109567793A (en) * 2018-11-16 2019-04-05 西北工业大学 ECG signal processing method for arrhythmia classification
CN109567793B (en) * 2018-11-16 2021-11-23 西北工业大学 Arrhythmia classification-oriented ECG signal processing method
CN111222624A (en) * 2018-11-26 2020-06-02 深圳云天励飞技术有限公司 Parallel computing method and device
CN111222624B (en) * 2018-11-26 2022-04-29 深圳云天励飞技术股份有限公司 Parallel computing method and device
CN110096587B (en) * 2019-01-11 2020-07-07 杭州电子科技大学 Attention mechanism-based LSTM-CNN word embedded fine-grained emotion classification model
CN110096587A (en) * 2019-01-11 2019-08-06 杭州电子科技大学 Fine-grained sentiment classification model based on attention-mechanism LSTM-CNN word embedding
CN109637545B (en) * 2019-01-17 2023-05-30 哈尔滨工程大学 Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long-short-time memory network
CN109637545A (en) * 2019-01-17 2019-04-16 哈尔滨工程大学 Voiceprint recognition method based on one-dimensional convolution asymmetric bidirectional long short-term memory network
JP2020134719A (en) * 2019-02-20 2020-08-31 ソフトバンク株式会社 Translation device, translation method, and translation program
CN110322900A (en) * 2019-06-25 2019-10-11 深圳市壹鸽科技有限公司 Speech signal feature fusion method
CN110363751A (en) * 2019-07-01 2019-10-22 浙江大学 Colonoscopic polyp detection method based on a generative collaborative network
CN112446266B (en) * 2019-09-04 2024-03-29 北京君正集成电路股份有限公司 Face recognition network structure suitable for front end
CN112446266A (en) * 2019-09-04 2021-03-05 北京君正集成电路股份有限公司 Face recognition network structure suitable for front end
CN110600018A (en) * 2019-09-05 2019-12-20 腾讯科技(深圳)有限公司 Voice recognition method and device and neural network training method and device
CN110738852B (en) * 2019-10-23 2020-12-18 浙江大学 Intersection steering overflow detection method based on vehicle track and long and short memory neural network
CN110738852A (en) * 2019-10-23 2020-01-31 浙江大学 Intersection steering overflow detection method based on vehicle trajectories and long short-term memory neural network
CN110929762A (en) * 2019-10-30 2020-03-27 中国科学院自动化研究所南京人工智能芯片创新研究院 Method and system for detecting body language and analyzing behavior based on deep learning
CN110929762B (en) * 2019-10-30 2023-05-12 中科南京人工智能创新研究院 Limb language detection and behavior analysis method and system based on deep learning
CN112766292A (en) * 2019-11-04 2021-05-07 中移(上海)信息通信科技有限公司 Identity authentication method, device, equipment and storage medium
CN112766292B (en) * 2019-11-04 2024-07-26 中移(上海)信息通信科技有限公司 Identity authentication method, device, equipment and storage medium
CN112819133A (en) * 2019-11-15 2021-05-18 北方工业大学 Construction method of deep hybrid neural network emotion recognition model
CN110956953A (en) * 2019-11-29 2020-04-03 中山大学 Quarrel identification method based on audio analysis and deep learning
CN110956953B (en) * 2019-11-29 2023-03-10 中山大学 Quarrel recognition method based on audio analysis and deep learning
CN111028859A (en) * 2019-12-15 2020-04-17 中北大学 Hybrid neural network vehicle type identification method based on audio feature fusion
CN111179910A (en) * 2019-12-17 2020-05-19 深圳追一科技有限公司 Speech rate recognition method and apparatus, server, and computer-readable storage medium
WO2021127982A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech emotion recognition method, smart device, and computer-readable storage medium
WO2021147363A1 (en) * 2020-01-20 2021-07-29 中国电子科技集团公司电子科学研究院 Text-based major depressive disorder recognition method
CN111210844A (en) * 2020-02-03 2020-05-29 北京达佳互联信息技术有限公司 Method, device and equipment for determining speech emotion recognition model and storage medium
CN111310672A (en) * 2020-02-19 2020-06-19 广州数锐智能科技有限公司 Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling
WO2021164346A1 (en) * 2020-02-21 2021-08-26 乐普(北京)医疗器械股份有限公司 Method and device for predicting blood pressure
CN111326178A (en) * 2020-02-27 2020-06-23 长沙理工大学 Multi-mode speech emotion recognition system and method based on convolutional neural network
CN111524535B (en) * 2020-04-30 2022-06-21 杭州电子科技大学 Feature fusion method for speech emotion recognition based on attention mechanism
CN111524535A (en) * 2020-04-30 2020-08-11 杭州电子科技大学 Feature fusion method for speech emotion recognition based on attention mechanism
CN111709284A (en) * 2020-05-07 2020-09-25 西安理工大学 Dance emotion recognition method based on CNN-LSTM
CN112383369A (en) * 2020-07-23 2021-02-19 哈尔滨工业大学 Cognitive radio multi-channel spectrum sensing method based on CNN-LSTM network model
CN111883252A (en) * 2020-07-29 2020-11-03 济南浪潮高新科技投资发展有限公司 Auxiliary diagnosis method, device, equipment and storage medium for infantile autism
CN112037822B (en) * 2020-07-30 2022-09-27 华南师范大学 Voice emotion recognition method based on ICNN and Bi-LSTM
CN112037822A (en) * 2020-07-30 2020-12-04 华南师范大学 Voice emotion recognition method based on ICNN and Bi-LSTM
CN112101095A (en) * 2020-08-02 2020-12-18 华南理工大学 Suicide and violence tendency emotion recognition method based on language and limb characteristics
CN112101095B (en) * 2020-08-02 2023-08-29 华南理工大学 Suicide and violence tendency emotion recognition method based on language and limb characteristics
CN112001482A (en) * 2020-08-14 2020-11-27 佳都新太科技股份有限公司 Vibration prediction and model training method and device, computer equipment and storage medium
CN112001482B (en) * 2020-08-14 2024-05-24 佳都科技集团股份有限公司 Vibration prediction and model training method, device, computer equipment and storage medium
CN112187413A (en) * 2020-08-28 2021-01-05 中国人民解放军海军航空大学航空作战勤务学院 SFBC (space-frequency block code) identification method and device based on CNN-LSTM (convolutional neural network-long short-term memory)
CN112187413B (en) * 2020-08-28 2022-05-03 中国人民解放军海军航空大学航空作战勤务学院 SFBC (space-frequency block code) identification method and device based on CNN-LSTM (convolutional neural network-long short-term memory)
CN112259126A (en) * 2020-09-24 2021-01-22 广州大学 Robot and method for assisting in recognizing autism voice features
CN112735434A (en) * 2020-12-09 2021-04-30 中国人民解放军陆军工程大学 Voice communication method and system with voiceprint cloning function
CN112735479A (en) * 2021-03-31 2021-04-30 南方电网数字电网研究院有限公司 Speech emotion recognition method and device, computer equipment and storage medium
CN112735479B (en) * 2021-03-31 2021-07-06 南方电网数字电网研究院有限公司 Speech emotion recognition method and device, computer equipment and storage medium
CN113221758A (en) * 2021-05-16 2021-08-06 西北工业大学 Underwater acoustic target identification method based on GRU-NIN model
CN113221758B (en) * 2021-05-16 2023-07-14 西北工业大学 GRU-NIN model-based underwater sound target identification method
CN113255800B (en) * 2021-06-02 2021-10-15 中国科学院自动化研究所 Robust emotion modeling system based on audio and video
CN113255800A (en) * 2021-06-02 2021-08-13 中国科学院自动化研究所 Robust emotion modeling system based on audio and video
CN114305418B (en) * 2021-12-16 2023-08-04 广东工业大学 Data acquisition system and method for intelligent assessment of depression state
CN114305418A (en) * 2021-12-16 2022-04-12 广东工业大学 Data acquisition system and method for depression state intelligent evaluation

Also Published As

Publication number Publication date
CN106782602B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
CN106782602A (en) Speech emotion recognition method based on long short-term memory network and convolutional neural networks
CN109036465B (en) Speech emotion recognition method
CN106878677B (en) Student classroom mastery degree evaluation system and method based on multiple sensors
Sun et al. Speech emotion recognition based on DNN-decision tree SVM model
EP4002362A1 (en) Method and apparatus for training speech separation model, storage medium, and computer device
CN108664632B (en) Text emotion classification algorithm based on convolutional neural network and attention mechanism
CN109637522B (en) Speech emotion recognition method for extracting depth space attention features based on spectrogram
Wang et al. Research on Web text classification algorithm based on improved CNN and SVM
CN109146066A (en) Natural interaction method for collaborative virtual learning environments based on speech emotion recognition
CN107256393A (en) Feature extraction and state recognition of one-dimensional physiological signals based on deep learning
CN102890930B (en) Speech emotion recognition method based on a hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model
CN106952649A (en) Speaker recognition method based on convolutional neural networks and spectrogram
CN103544963A (en) Voice emotion recognition method based on kernel semi-supervised discriminant analysis
CN107785015A (en) Speech recognition method and device
CN110853656B (en) Audio tampering identification method based on improved neural network
CN107705806A (en) Method for speech emotion recognition using spectrograms and deep convolutional neural networks
Zhou et al. Deep learning based affective model for speech emotion recognition
CN106897254A (en) Network representation learning method
CN107039036A (en) High-quality speaker recognition method based on an autoencoder deep belief network
CN109558935A (en) Emotion recognition and interaction method and system based on deep learning
CN106339718A (en) Classification method based on neural network and classification device thereof
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
CN113033180B (en) Automatic generation service system for primary-school Tibetan reading questions
CN110046709A (en) Multi-task learning model based on bidirectional LSTM
Dong et al. Environmental sound classification based on improved compact bilinear attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 9 Yuen Road, Qixia District, Nanjing, Jiangsu Province, 210023

Applicant after: Nanjing Post & Telecommunication Univ.

Address before: No. 66 Xinmofan Road, Nanjing, Jiangsu, 210003

Applicant before: Nanjing Post & Telecommunication Univ.

GR01 Patent grant