CN109817246A - Training method for an emotion recognition model, emotion recognition method, apparatus, device, and storage medium - Google Patents
Training method for an emotion recognition model, emotion recognition method, apparatus, device, and storage medium - Download PDF
- Publication number: CN109817246A
- Application number: CN201910145605.2A
- Authority: CN (China)
- Prior art keywords: speech information, emotion, frequency, mel, emotion recognition
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—…characterised by the type of extracted parameters
- G10L25/24—…the extracted parameters being the cepstrum
- G10L25/27—…characterised by the analysis technique
- G10L25/30—…using neural networks
- G10L25/45—…characterised by the type of analysis window
- G10L25/48—…specially adapted for particular use
- G10L25/51—…for comparison or discrimination
- G10L25/63—…for estimating an emotional state
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- User Interface Of Digital Computer (AREA)
- Image Analysis (AREA)
Abstract
This application relates to the field of intelligent decision-making and trains an emotion recognition model based on deep learning. Specifically disclosed are a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium. The method comprises: obtaining speech information of a user and a corresponding data label; constructing sample data from the speech information and the corresponding data label; preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors; extracting a preset recurrent neural network, the recurrent neural network comprising an attention mechanism used to emphasize partial regions of the speech information; and, based on the recurrent neural network, performing model training on the spectral vectors corresponding to the speech information and the data labels to obtain the emotion recognition model. The method improves the generalizability of the emotion recognition model and the accuracy of its recognition.
Description
Technical field
This application relates to the technical field of model training, and in particular to a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium.
Background
In recent years, emotion recognition models that identify user emotion from speech based on machine learning have developed extensively, but emotion recognition from sound still faces many challenges. For example, to produce sustained and accurate recognition of positive and negative emotion, some approaches combine text and acoustic features; such approaches require automatic speech recognition (ASR) technology to convert sound into text, which introduces serious latency. Meanwhile, emotion recognition models also suffer from poor generalizability: when a model is applied to a new speaker, its accuracy drops.
Summary of the invention
This application provides a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium, so as to improve the generalizability of the emotion recognition model and the accuracy of recognition.
In a first aspect, this application provides a training method for an emotion recognition model, the method comprising:
obtaining speech information of a user and a data label corresponding to the speech information;
constructing sample data from the speech information and the corresponding data label;
preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
extracting a preset recurrent neural network, the recurrent neural network comprising an attention mechanism, the attention mechanism being used to emphasize partial regions of the speech information;
based on the recurrent neural network, performing model training on the spectral vectors corresponding to the speech information and the data labels to obtain the emotion recognition model.
In a second aspect, this application further provides an emotion recognition method, the method comprising:
collecting a speech signal of a user;
preprocessing the speech signal according to a preset processing rule to obtain spectral vectors corresponding to the speech signal;
inputting the spectral vectors into an emotion recognition model to identify the user's emotion and obtain the user's emotion category, the emotion recognition model being a model trained with the training method for an emotion recognition model described above.
In a third aspect, this application further provides a training apparatus for an emotion recognition model, the apparatus comprising:
an acquisition unit for obtaining speech information of a user and a data label corresponding to the speech information;
a sample construction unit for constructing sample data from the speech information and the corresponding data label;
a preprocessing unit for preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
an extraction unit for extracting a preset recurrent neural network, the recurrent neural network comprising an attention mechanism, the attention mechanism being used to emphasize partial regions of the speech information;
a model training unit for performing model training, based on the recurrent neural network, on the spectral vectors corresponding to the speech information and the data labels to obtain the emotion recognition model.
In a fourth aspect, this application further provides an emotion recognition apparatus, the apparatus comprising:
a signal collection unit for collecting a speech signal of a user;
a signal processing unit for preprocessing the speech signal according to a preset processing rule to obtain spectral vectors corresponding to the speech signal;
an emotion recognition unit for inputting the spectral vectors into an emotion recognition model to identify the user's emotion and obtain the user's emotion category, the emotion recognition model being a model trained with the training method for an emotion recognition model described above.
In a fifth aspect, this application further provides a computer device comprising a memory and a processor; the memory stores a computer program, and the processor executes the computer program and, when executing it, implements the training method for an emotion recognition model or the emotion recognition method described above.
In a sixth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the training method for an emotion recognition model or the emotion recognition method described above.
This application discloses a training method, apparatus, device, and storage medium for an emotion recognition model. After obtaining the speech information of a user and the corresponding data label, the method preprocesses the speech information according to a preset processing rule to obtain corresponding spectral vectors, and then, based on a preset recurrent neural network, performs model training on the spectral vectors and data labels to obtain the emotion recognition model, wherein the recurrent neural network comprises an attention mechanism used to emphasize partial regions of the speech information. The emotion recognition model trained by this method has strong generalizability and high recognition accuracy.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow diagram of a training method for an emotion recognition model provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of the recurrent neural network provided by an embodiment of this application;
Fig. 3 is a schematic flow diagram of sub-steps of the training method for an emotion recognition model in Fig. 1;
Fig. 4 is a schematic flow diagram of another training method for an emotion recognition model provided by an embodiment of this application;
Fig. 5 is a schematic flow diagram of an emotion recognition method provided by an embodiment of this application;
Fig. 6 is a schematic block diagram of a model training apparatus provided by an embodiment of this application;
Fig. 7 is a schematic block diagram of another model training apparatus provided by an embodiment of this application;
Fig. 8 is a schematic block diagram of an emotion recognition device provided by an embodiment of this application;
Fig. 9 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are some rather than all of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
The flow charts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps can be decomposed, combined, or partially merged, so the actual execution order may change according to the actual situation.
The embodiments of this application provide a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium. The training method for the emotion recognition model can be performed on a server; the emotion recognition method can be applied in a terminal or a server to identify the emotion type of a user, for example happiness or sadness, from the user's speech.
The server can be an independent server or a server cluster. The terminal can be an electronic device such as a mobile phone, a tablet computer, a laptop, a desktop computer, a personal digital assistant, or a wearable device.
Some embodiments of this application are described in detail below with reference to the accompanying drawings. In the absence of conflict, the following embodiments and the features in the embodiments can be combined with each other.
Please refer to Fig. 1, which is a schematic flow diagram of a training method for an emotion recognition model provided by an embodiment of this application. The emotion recognition model is obtained by performing model training based on a preset recurrent neural network.
As shown in Fig. 2, which is a schematic structural diagram of a preset recurrent neural network provided by an embodiment of this application, the structure of the recurrent neural network includes an input layer, a recurrent layer, an attention mechanism, a fully connected layer, and an output layer. The attention mechanism establishes, according to an attention equation, a mapping relationship between the output of the recurrent layer and weight vectors, so as to emphasize partial regions of the speech information and thereby improve the recognition accuracy of the model.
The recurrent layer includes long short-term memory (LSTM) units, and the output layer uses a Softmax output. In this structure, the temporal dependence in the input sequence fed to the input layer is modeled by a recurrent layer composed of LSTM units. The attention mechanism is applied to the output of the recurrent layer at each time point in the sequence: it assigns larger weights to certain regions of the sequence, namely the regions that are important when recognizing positive and negative emotion. Compared with other recurrent neural networks (RNNs), this recurrent neural network can learn long-term dependence without suffering from vanishing or exploding gradients, yielding better recognition results.
The training method for the emotion recognition model provided by the embodiments of this application is introduced below with reference to the structure of the recurrent neural network in Fig. 2.
As shown in Fig. 1, the training method for the emotion recognition model is used to train an emotion recognition model that quickly and accurately identifies the emotion type of a user. The training method includes steps S101 to S105.
S101: Obtain speech information of a user and a data label corresponding to the speech information.
The data label is the user's emotion label, such as a positive-emotion label, a neutral-emotion label, or a negative-emotion label. Of course, the speech information can also be divided into more classes with correspondingly more data labels, for example happy, sad, fearful, or neutral; different data labels represent different emotions of the user.
Specifically, the speech information of the user is obtained from a preset database that also stores the corresponding data labels. Before this, the method further includes: collecting the speech information of the user, labeling the speech information with a data label, and storing the labeled speech information in the preset database. The users may come from different populations, such as children, young people, middle-aged people, and the elderly; it is understood that they may also come from different occupations, such as teachers, students, doctors, lawyers, and IT personnel, so as to enrich the diversity of the sample data.
In one embodiment, in order to improve the recognition accuracy of the model, the speech information is collected in a controlled way. That is, obtaining the speech information of the user and the corresponding data label comprises: obtaining the speech information recorded while the user tells stories of different emotion types, and the data labels generated by the user scoring the emotion of that speech information.
Specifically, the user is first asked to tell two pessimistic stories and two optimistic stories, and the corresponding speech information is collected. Before or after telling each story, the user scores their own mood according to a scoring criterion, for example scores of 0-5 represent negative emotion and scores of 6-10 represent positive emotion, and the corresponding data label is generated from the score. For example, a score of 4 yields a negative-emotion label for that speech information.
Of course, the collected speech information of the two pessimistic stories and two optimistic stories can also be scored in segments, with the data label of each segment determined by its segment score. For example, the speech information is divided into two speech segments: the first segment scores 0, so its data label is negative emotion; the second segment scores 10, so its data label is positive emotion.
S102: Construct sample data from the speech information and the corresponding data labels.
Specifically, the sample data is composed of the collected speech information of the users and the corresponding data labels. There may be multiple users; the specific number is not limited here. Since users' emotions differ, the sample data includes positive samples and negative samples: positive samples correspond to speech information with positive emotion, such as optimism, happiness, and excitement; negative samples correspond to speech information with negative emotion, such as pessimism, sadness, and pain.
S103: Preprocess the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors.
The preset processing rule converts the speech information in the sample data into frequency-domain information, for example converting speech information collected in the time domain into frequency-domain information using a fast Fourier transform rule or a wavelet transform rule.
In one embodiment, in order to speed up model training and improve recognition precision, a preprocessing rule is used, as shown in Fig. 3; that is, step S103 includes sub-steps S103a to S103d.
S103a: Perform framing and windowing on the speech information in the sample data to obtain processed speech information.
In the framing and windowing, the frame length is specifically set to 40 ms. The speech information is split according to the 40 ms frame length, and each resulting segment is then multiplied by a Hamming window function so that it can subsequently be expanded by a Fourier transform.
It should be noted that the frame length set in the framing and windowing can take other values, for example 50 ms, 30 ms, or others.
In one embodiment, before framing and windowing the speech information in the sample data, pre-emphasis can also be applied to the speech information, specifically by multiplying it by a predetermined coefficient positively correlated with the frequency of the speech information, so as to boost the amplitude of high frequencies. The size of the predetermined coefficient is associated with the parameters of model training, i.e. it changes as the model parameters change; for example, it is associated with the weight vectors a_i, increasing as the mean of a_i increases and decreasing as that mean decreases. The purpose is to better improve the recognition accuracy of the model.
In an alternative embodiment, the predetermined coefficient can be set to an empirical value used to cancel the effect of the vocal cords and lips during vocalization, compensating the high-frequency part of the speech information suppressed by the articulatory system and highlighting the high-frequency formants.
S103b: Perform a frequency-domain transform on the processed speech information to obtain a corresponding amplitude spectrum.
Specifically, a fast Fourier transform (FFT) is applied to the processed speech information to obtain the corresponding parameters. In this embodiment, the amplitude after the fast Fourier transform is taken as the amplitude spectrum. Of course, other parameters after the FFT could also be used, for example the amplitude plus phase information.
It is understood that a wavelet transform can also be applied to the processed speech information to obtain the corresponding parameters, with the transformed amplitude selected as the amplitude spectrum.
S103c: Filter the amplitude spectrum through a Mel filter bank, and apply a discrete cosine transform to the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients.
Specifically, filtering the amplitude spectrum through the Mel filter bank comprises: obtaining the maximum frequency corresponding to the speech information and computing the corresponding Mel frequency with the Mel-frequency formula; computing the Mel spacing between the center frequencies of two adjacent triangular filters from the computed Mel frequency and the number of triangular filters in the Mel filter bank; distributing the triangular filters linearly according to that Mel spacing; and filtering the amplitude spectrum with the linearly distributed triangular filters.
The Mel filter bank specifically includes 40 triangular filters distributed linearly on the Mel scale. The amplitude spectrum is filtered through these 40 filters, and a discrete cosine transform is then applied to obtain the Mel-frequency cepstral coefficients.
That is, the maximum frequency in the speech information is determined; the maximum Mel frequency is computed from it with the Mel-frequency formula; the spacing between the center frequencies of two adjacent triangular filters is computed from the maximum Mel frequency and the number of triangular filters (40); and the filters are distributed linearly according to the computed spacing.
The Mel-frequency formula is:
f_mel = A · log10(1 + f/700)   (1)
In formula (1), f_mel is the Mel frequency, f is the maximum frequency corresponding to the speech information, and A is a coefficient, specifically 2595.
For example, if the determined maximum frequency is 4000 Hz, formula (1) gives a maximum Mel frequency of about 2146.1 mel.
Since the center frequencies of the triangular filters are distributed linearly at equal intervals on the Mel scale, the spacing between the center frequencies of two adjacent triangular filters can be computed as:
Δmel = f_mel / (K + 1)   (2)
where Δmel is the spacing between the center frequencies of two adjacent triangular filters, and K is the number of triangular filters.
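Formulas (1) and (2) can be checked numerically. The sketch below reproduces the 4000 Hz example and the center-frequency spacing for K = 40 filters; exact rounding may differ slightly from the figure quoted in the text.

```python
import math

A = 2595.0  # coefficient from formula (1)

def hz_to_mel(f):
    """Formula (1): convert a frequency in Hz to the Mel scale."""
    return A * math.log10(1 + f / 700.0)

max_mel = hz_to_mel(4000)        # maximum Mel frequency for 4000 Hz
delta_mel = max_mel / (40 + 1)   # formula (2): spacing for K = 40 filters
print(round(max_mel, 1), round(delta_mel, 1))  # 2146.0 52.3
```

The computed maximum Mel frequency matches the roughly 2146.1 mel quoted above, and the resulting center frequencies are then mapped back to Hz to place the 40 triangular filters.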
S103d, the mel-frequency cepstrum coefficient is normalized to obtain the corresponding frequency of the voice messaging
Compose vector.
Specifically, zero-mean normalization is used to normalize the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information. The conversion formula corresponding to zero-mean normalization is:

x* = (x − x̄) / σ

where x̄ is the mean of the Mel-frequency cepstral coefficients; σ is the standard deviation of the Mel-frequency cepstral coefficients; x is each Mel-frequency cepstral coefficient; and x* is the Mel-frequency cepstral coefficient after normalization.
The zero-mean normalization used here (Z-score standardization) is also known as standard-deviation standardization. The processed data have a mean of 0 and a standard deviation of 1. Z-score standardization uniformly converts data of different magnitudes to the same magnitude, measured uniformly by the calculated Z-score value, so as to guarantee comparability between the data.
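A minimal sketch of this zero-mean (Z-score) normalization applied to a vector of cepstral coefficients (the sample values are illustrative):

```python
import numpy as np

def z_score(x):
    # x* = (x - mean) / std: the result has zero mean and unit standard deviation
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

coeffs = np.array([12.3, -4.1, 7.8, 0.5, -9.6])
normed = z_score(coeffs)
```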
S104. A preset recurrent neural network is extracted, the recurrent neural network including an attention mechanism, the attention mechanism being used to reinforce a partial region in the voice information.
Wherein, the structure of the recurrent neural network includes an input layer, a recurrent layer, the attention mechanism, a fully connected layer and an output layer; the attention mechanism is used to establish, according to an attention equation, a mapping relationship between the output quantities of the recurrent layer and the weight vectors, so as to reinforce the partial region in the voice information.
The attention equation is:

g = Σ_i a_i · h_i

where g is the input vector of the fully connected layer; h_i is the output quantity of the recurrent layer corresponding to each time point i; and a_i is the weight vector corresponding to each time point i, representing the magnitude of the influence of each time point i on the fully connected layer and the output layer.
The key of the attention mechanism is learning this equation: for each time point i, the equation establishes a mapping relationship between the output h_i of the recurrent layer and a weight vector a_i, where h_i denotes the output of the recurrent layer and a_i represents the magnitude of the influence of each time point on the subsequent layers of the network.
Wherein, the parameters in f(h_i) can be optimized during training; its expression is specifically:

f(h_i) = tanh(W · h_i + b)    (4)

In formula (4), W and b are the parameters of a linear equation, and h_i is the output of the LSTM layer at each time point i, expressed as h = (h_0, ..., h_{T−1}), where T is the total number of time points in a given sequence. In this embodiment a simplified form of the expression is used, namely a linear function followed by a tanh activation function as in formula (4), which both achieves good results and improves the training speed of the model.
For a given time point i, the weight vector a_i is given by:

a_i = exp(u^T f(h_i)) / Σ_{j=0}^{T−1} exp(u^T f(h_j))    (5)

In formula (5), W is a matrix parameter of dimension S×D, S is a positive integer, b and u are vector parameters of dimension S, and D is the number of network units in the recurrent layer.
It should be noted that the vector g serves as the input of the fully connected layer, whose activation function is the ReLU function; the fully connected layer is followed by a Softmax function to obtain the final output.
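A numerical sketch of formulas (4) and (5) followed by the weighted sum g, using NumPy; the shapes T, D and S below are illustrative, not values from the original:

```python
import numpy as np

def attention_pool(h, W, b, u):
    """h: (T, D) recurrent-layer outputs; W: (S, D); b, u: (S,)."""
    scores = np.tanh(h @ W.T + b) @ u   # u^T f(h_i), with f(h_i) = tanh(W h_i + b), formula (4)
    a = np.exp(scores - scores.max())
    a /= a.sum()                        # softmax over the T time points -> weights a_i, formula (5)
    g = a @ h                           # g = sum_i a_i h_i, input vector of the fully connected layer
    return g, a

rng = np.random.default_rng(0)
T, D, S = 5, 8, 4
g, a = attention_pool(rng.normal(size=(T, D)),
                      rng.normal(size=(S, D)),
                      rng.normal(size=S),
                      rng.normal(size=S))
```

The weights a_i are non-negative and sum to 1, so g is a convex combination of the per-time-point outputs, with larger weight given to the time points the mechanism learns to emphasize.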
S105. Based on the recurrent neural network, model training is performed according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model.
Specifically, the spectral vectors are input to the preset recurrent neural network for model training; the attention mechanism in the improved model reinforces the important parts of the sound, and the corresponding model parameters are optimized to obtain the emotion recognition model. The model training parameters are shown in Table 1.
Table 1. Relevant parameters of the training network
Parameter type | Parameter value |
Optimization algorithm | Adam |
Learning rate | 0.0005 |
Number of LSTM units | 128 |
Number of neurons in the fully connected layer | 20 |
Dropout keep probability | 0.7 |
After obtaining the voice information of the user and the corresponding data labels, the model training method provided by the above embodiment pre-processes the voice information according to the preset processing rule to obtain the corresponding spectral vectors, and then, based on the preset recurrent neural network, performs model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model; the recurrent neural network includes an attention mechanism used to reinforce the partial region in the voice information. The emotion recognition model trained by this method has strong generalization ability and high recognition accuracy.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the training method of another emotion recognition model provided by an embodiment of the present application. The emotion recognition model is obtained by performing model training based on a preset recurrent neural network; of course, other networks may also be used for training.
As shown in Fig. 4, the training method of the emotion recognition model includes steps S201 to S207.
S201. Obtain the voice information of a user and the data label corresponding to the voice information.
Wherein, the data label is an affective label of the user, including a positive-emotion label, a neutral-emotion label, a negative-emotion label, and the like. Of course, the voice information may also be divided into more classes, with correspondingly more data labels, such as happy, sad, afraid or neutral; different data labels represent different emotions of the user.
S202. Construct sample data according to the voice information and the corresponding data labels, the sample data including at least positive sample data and negative sample data.
Specifically, the sample data can be constructed from the acquired voice information of the user and the corresponding data labels. Since the emotions of users differ, the sample data includes at least positive sample data and negative sample data, and may also include, for example, neutral sample data. Positive sample data correspond to voice information with positive emotion; negative sample data correspond to voice information with negative emotion.
S203. Judge whether the positive sample data and the negative sample data in the sample data are balanced.
Specifically, whether the positive sample data and the negative sample data in the sample data are balanced is judged, and a judging result is generated; the judging result is either that the positive sample data and the negative sample data are balanced, or that they are unbalanced.
Wherein, if the positive sample data and the negative sample data are unbalanced, step S204 is executed; if they are balanced, step S205 is executed.
S204. Process the sample data according to a preset data processing rule so that the positive sample data and the negative sample data reach balance.
If the positive sample data and the negative sample data are unbalanced, the sample data is processed according to the preset data processing rule so that the positive sample data and the negative sample data reach balance. Specifically, the sample data can be processed in either of two ways to balance the positive sample data and the negative sample data, as follows:
One: process the sample data by over-sampling. In the constructed sample data, one class, usually the negative sample data, is smaller than the other; specifically, the negative sample data are duplicated several times and combined with the positive sample data to compose the training sample data. Because the negative sample data in the resulting training set have been replicated several times, the new sample data thus constituted solve the problem of sample imbalance.
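The over-sampling idea can be sketched as follows; the helper and the binary label convention (1 = positive, 0 = negative) are illustrative assumptions, not part of the original:

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until both classes are equally represented."""
    rng = random.Random(seed)
    pos = [(s, y) for s, y in zip(samples, labels) if y == 1]
    neg = [(s, y) for s, y in zip(samples, labels) if y == 0]
    minority, majority = (neg, pos) if len(neg) < len(pos) else (pos, neg)
    # draw minority samples with replacement until the two classes are the same size
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = pos + neg + extra
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]

X, y = oversample(["a", "b", "c", "d", "e"], [1, 1, 1, 1, 0])  # 4 positive, 1 negative
```

After over-sampling, the single negative sample appears four times, so the training set contains four examples of each class.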
Two: process the sample data by setting a weighted loss function. The model weights θ are optimized during training by minimizing either a standard cross-entropy function or a weighted cross-entropy function. Through the idea of weighting, when a sample is known during training to belong to the rare negative class, the model parameters are adjusted according to its weight, so as to increase the influence of the negative samples.
Wherein, the expression corresponding to the standard cross-entropy loss function is:

L(θ) = −Σ_n log ŷ_n(C_n)

where ŷ_n is the Softmax output for each observed sequence n, whose input X is a matrix of dimension F×D, with F representing the number of spectral coefficients input at each time point; C_n is the class label corresponding to each observed sequence n, with value range {0, 1}, or of course {0, 1, 2}, corresponding respectively to negative, neutral and positive samples. Of course, a weighted cross-entropy function may also be used; it is similar to the standard cross-entropy loss function, and the goal of both is to solve the problem of unbalanced sample data.
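A sketch of the weighted cross-entropy idea; the per-class weight list is an illustrative assumption, with a larger weight on the rarer class so that it contributes more to the loss:

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """probs: softmax outputs per sample; labels: true class indices."""
    loss = 0.0
    for p, c in zip(probs, labels):
        loss += -class_weights[c] * math.log(p[c])  # -w_c * log(probability of the true class)
    return loss / len(labels)

# the rare negative class (index 0) is weighted 3x relative to the positive class
loss = weighted_cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1], [3.0, 1.0])
```

With uniform weights this reduces to the standard cross-entropy loss; raising the weight of the negative class increases the penalty for misclassifying negative samples.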
S205. Pre-process the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors.
Specifically, if the positive sample data and the negative sample data are balanced, the voice information in the sample data is pre-processed according to the preset processing rule to obtain the corresponding spectral vectors. Wherein, the preset processing rule is used to convert the voice information in the sample data into information in the frequency domain, specifically by using a fast Fourier transform rule or a wavelet transform rule to convert the voice information acquired in the time domain into information in the frequency domain.
S206. Extract the preset recurrent neural network, the recurrent neural network including an attention mechanism used to reinforce a partial region in the voice information.
Wherein, the structure of the recurrent neural network includes an input layer, a recurrent layer, the attention mechanism, a fully connected layer and an output layer; the attention mechanism is used to establish, according to the attention equation, a mapping relationship between the output quantities of the recurrent layer and the weight vectors, so as to reinforce the partial region in the voice information.
S207. Based on the recurrent neural network, perform model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model.
Specifically, the spectral vectors are input to the preset recurrent neural network for model training; the attention mechanism in the improved model reinforces the important parts of the sound, and the corresponding model parameters are optimized to obtain the emotion recognition model.
After obtaining the voice information of the user and the corresponding data labels, the model training method provided by the above embodiment pre-processes the voice information according to the preset processing rule to obtain the corresponding spectral vectors once the sample data reach balance, and then, based on the preset recurrent neural network, performs model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model; the recurrent neural network includes an attention mechanism used to reinforce the partial region in the voice information. The emotion recognition model trained by this method has strong generalization ability and high recognition accuracy. Moreover, since extreme emotions are often much rarer than neutral emotions, sample imbalance can lead to over-fitting; this method solves the sample-imbalance problem well and thereby improves the accuracy of the model.
Referring to Fig. 5, Fig. 5 is a schematic flowchart of an emotion recognition method provided by an embodiment of the present application. The emotion recognition method can be applied in a terminal or a server to recognize the emotion of a user according to the user's voice.
As shown in Fig. 5, the emotion recognition method includes steps S301 to S303.
S301. Acquire a voice signal of the user.
Specifically, the corresponding voice signal can be acquired by a recording device while chatting with the user; the recording device may be, for example, a voice recorder, a smartphone, a tablet computer, a notebook or an intelligent wearable device, such as a smart bracelet or a smartwatch.
S302. Pre-process the voice signal according to the preset processing rule to obtain the spectral vector corresponding to the voice signal.
Specifically, pre-processing the voice signal according to the preset processing rule to obtain the spectral vector corresponding to the voice signal comprises: performing framing and windowing on the voice information to obtain processed voice information; performing a fast Fourier transform on the processed voice information to obtain an amplitude spectrum; applying a Mel filter bank to the amplitude spectrum, and performing a discrete cosine transform on the output of the Mel filter bank to obtain the Mel-frequency cepstral coefficients; and normalizing each obtained Mel-frequency cepstral coefficient to obtain the spectral vector corresponding to the voice information.
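The steps above (framing and windowing, FFT, Mel filter bank, DCT, zero-mean normalization) can be sketched end to end as follows; the sample rate, frame length, hop size and coefficient count are illustrative assumptions, not values from the original:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=40, n_ceps=13):
    # 1. framing with a Hamming window
    frames = np.array([signal[i:i + frame_len] * np.hamming(frame_len)
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    # 2. FFT -> amplitude spectrum
    mag = np.abs(np.fft.rfft(frames, axis=1))          # (n_frames, frame_len//2 + 1)
    # 3. triangular filters linearly distributed on the Mel scale
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for k in range(1, n_filters + 1):
        l, c, r = bins[k - 1], bins[k], bins[k + 1]
        fbank[k - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[k - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    energies = np.log(mag @ fbank.T + 1e-10)
    # 4. DCT of the log filter-bank energies -> MFCCs
    n = energies.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    ceps = energies @ dct.T                            # (n_frames, n_ceps)
    # 5. zero-mean normalization per coefficient -> spectral vectors
    return (ceps - ceps.mean(axis=0)) / (ceps.std(axis=0) + 1e-10)

sr = 8000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)       # one second of a 440 Hz tone
```

Each row of the result is the normalized spectral vector for one frame of the signal.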
S303. Input the spectral vector into the emotion recognition model to recognize the emotion of the user, so as to obtain the emotional category of the user.
Wherein, the emotion recognition model is a model trained using the emotion recognition model training method provided in the above embodiments. The input spectral vector is analyzed by the emotion recognition model to accurately obtain the emotion of the user, specifically the affective type, such as happy, sad or neutral.
The emotion recognition method provided by the above embodiment acquires the voice signal of the user, pre-processes the voice signal according to the preset processing rule to obtain the spectral vector corresponding to the voice signal, and inputs the spectral vector into the emotion recognition model to recognize the emotion of the user, so as to obtain the emotional category of the user. This method can quickly recognize the affective type of the user while also achieving high recognition accuracy.
Referring to Fig. 6, Fig. 6 is a schematic block diagram of a model training apparatus provided by an embodiment of the present application; the model training apparatus can be configured in a server to execute the aforementioned training method of the emotion recognition model.
As shown in Fig. 6, the model training apparatus 400 comprises: an information acquisition unit 401, a sample construction unit 402, a data processing unit 403, a network extraction unit 404 and a model training unit 405.
The information acquisition unit 401 is used to obtain the voice information of a user and the data label corresponding to the voice information.
The sample construction unit 402 is used to construct sample data according to the voice information and the corresponding data labels.
The data processing unit 403 is used to pre-process the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors.
In one embodiment, the data processing unit 403 comprises:
an information processing subunit 4031, used to perform framing and windowing on the voice information in the sample data to obtain processed voice information; an information conversion subunit 4032, used to perform a frequency-domain transform on the processed voice information to obtain the corresponding amplitude spectrum; a filtering transformation subunit 4033, used to filter the amplitude spectrum through the Mel filter bank and perform a discrete cosine transform on the filtered amplitude spectrum to obtain the Mel-frequency cepstral coefficients; and a normalization subunit 4034, used to normalize the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information.
In one embodiment, the filtering transformation subunit 4033 is specifically used to: obtain the maximum frequency corresponding to the voice information, and calculate the Mel frequency corresponding to the maximum frequency using the Mel-frequency calculation formula; calculate the Mel spacing between the centre frequencies of two adjacent triangular filters according to the calculated Mel frequency and the number of triangular filters in the Mel filter bank; complete the linear distribution of the triangular filters according to the Mel spacing; and filter the amplitude spectrum according to the linearly distributed triangular filters.
The network extraction unit 404 is used to extract the preset recurrent neural network, the recurrent neural network including an attention mechanism used to reinforce a partial region in the voice information.
The model training unit 405 is used to perform, based on the recurrent neural network, model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model.
Referring to Fig. 7, Fig. 7 is a schematic block diagram of another model training apparatus provided by an embodiment of the present application; the model training apparatus can be configured in a server to execute the aforementioned training method of the emotion recognition model.
As shown in Fig. 7, the model training apparatus 500 comprises: an information acquisition unit 501, a sample construction unit 502, a balance judging unit 503, a balance processing unit 504, a data processing unit 505, a network extraction unit 506 and a model training unit 507.
The information acquisition unit 501 is used to obtain the voice information of a user and the data label corresponding to the voice information.
The sample construction unit 502 is used to construct sample data according to the voice information and the corresponding data labels, the sample data including positive sample data and negative sample data.
The balance judging unit 503 is used to judge whether the positive sample data and the negative sample data in the sample data are balanced.
The balance processing unit 504 is used to, if the positive sample data and the negative sample data are unbalanced, process the sample data according to the preset data processing rule so that the positive sample data and the negative sample data reach balance.
The data processing unit 505 is used to, if the positive sample data and the negative sample data are balanced, pre-process the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors.
The network extraction unit 506 is used to extract the preset recurrent neural network, the recurrent neural network including an attention mechanism used to reinforce a partial region in the voice information.
The model training unit 507 is used to perform, based on the recurrent neural network, model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model.
Referring to Fig. 8, Fig. 8 is a schematic block diagram of an emotion recognition device provided by an embodiment of the present application; the emotion recognition device can be configured in a terminal or a server to execute the aforementioned emotion recognition method.
As shown in Fig. 8, the emotion recognition device 600 comprises: a signal acquisition unit 601, a signal processing unit 602 and an emotion recognition unit 603.
The signal acquisition unit 601 is used to acquire the voice signal of a user.
The signal processing unit 602 is used to pre-process the voice signal according to the preset processing rule to obtain the spectral vector corresponding to the voice signal.
The emotion recognition unit 603 is used to input the spectral vector into the emotion recognition model to recognize the emotion of the user and obtain the emotional category of the user; the emotion recognition model is a model trained using any of the above emotion recognition model training methods.
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and conciseness of description, the specific working processes of the devices and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above devices can be implemented in the form of a computer program, which can run on the computer equipment shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic structural block diagram of computer equipment provided by an embodiment of the present application. The computer equipment can be a server or a terminal.
The computer equipment includes a processor, a memory and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions which, when executed, cause the processor to execute the training method of any emotion recognition model or the emotion recognition method.
The processor provides computing and control capability and supports the operation of the entire computer equipment.
The internal memory provides an environment for the running of the computer program in the non-volatile storage medium; when the computer program is executed by the processor, the processor is caused to execute the training method of any emotion recognition model or the emotion recognition method.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the present solution and does not constitute a limitation on the computer equipment to which the present solution is applied; a specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different component layout.
It should be understood that the processor can be a central processing unit (CPU), or other general-purpose processors, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. The general-purpose processor can be a microprocessor, or any conventional processor.
Wherein, in one embodiment, the processor is used to run the computer program stored in the memory to implement the following steps:
obtaining the voice information of a user and the data label corresponding to the voice information; constructing sample data according to the voice information and the corresponding data labels; pre-processing the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors; extracting the preset recurrent neural network, the recurrent neural network including an attention mechanism used to reinforce a partial region in the voice information; and performing, based on the recurrent neural network, model training according to the spectral vectors corresponding to the voice information and the data labels to obtain the emotion recognition model.
In one embodiment, when implementing the pre-processing of the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors, the processor is used to implement:
performing framing and windowing on the voice information in the sample data to obtain processed voice information; performing a frequency-domain transform on the processed voice information to obtain the corresponding amplitude spectrum; filtering the amplitude spectrum through the Mel filter bank and performing a discrete cosine transform on the filtered amplitude spectrum to obtain the Mel-frequency cepstral coefficients; and normalizing the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information.
In one embodiment, when implementing the filtering of the amplitude spectrum through the Mel filter bank, the processor is used to implement:
obtaining the maximum frequency corresponding to the voice information, and calculating the Mel frequency corresponding to the maximum frequency using the Mel-frequency calculation formula; calculating the Mel spacing between the centre frequencies of two adjacent triangular filters according to the calculated Mel frequency and the number of triangular filters in the Mel filter bank; completing the linear distribution of the triangular filters according to the Mel spacing; and filtering the amplitude spectrum according to the linearly distributed triangular filters.
In one embodiment, the Mel-frequency calculation formula is:

f_mel = A · log10(1 + f / 700)

where f_mel is the Mel frequency, f is the maximum frequency corresponding to the voice information, and A is a coefficient.
In one embodiment, when implementing the normalization of the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information, the processor is used to implement:
normalizing the Mel-frequency cepstral coefficients using zero-mean normalization to obtain the spectral vector corresponding to the voice information, the conversion formula corresponding to the zero-mean normalization being:

x* = (x − x̄) / σ

where x̄ is the mean of the Mel-frequency cepstral coefficients; σ is the standard deviation of the Mel-frequency cepstral coefficients; x is each Mel-frequency cepstral coefficient; and x* is the Mel-frequency cepstral coefficient after normalization.
In one embodiment, the structure of the recurrent neural network includes an input layer, a recurrent layer, an attention mechanism, a fully connected layer and an output layer; the attention mechanism is used to establish, according to the attention equation, a mapping relationship between the output quantities of the recurrent layer and the weight vectors, so as to reinforce the partial region in the voice information;
the attention equation is:

g = Σ_i a_i · h_i

where a_i = exp(u^T f(h_i)) / Σ_j exp(u^T f(h_j)) and f(h_i) = tanh(W · h_i + b); g is the input vector of the fully connected layer; h_i is the output quantity of the recurrent layer corresponding to each time point i; a_i is the weight vector corresponding to each time point i, representing the magnitude of the influence of each time point i on the fully connected layer and the output layer; T is the total number of time points i; W is a matrix parameter of dimension S×D, S is a positive integer, b and u are vector parameters of dimension S, and D is the number of network units in the recurrent layer.
Wherein, in another embodiment, the processor is used to run the computer program stored in the memory to implement the following steps:
acquiring the voice signal of a user;
pre-processing the voice signal according to the preset processing rule to obtain the spectral vector corresponding to the voice signal;
and inputting the spectral vector into the emotion recognition model to recognize the emotion of the user and obtain the emotional category of the user, the emotion recognition model being a model trained using any of the above emotion recognition model training methods.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by the processor, implement the training method of any emotion recognition model or the emotion recognition method provided by the embodiments of the present application.
Wherein, the computer-readable storage medium can be an internal storage unit of the computer equipment described in the foregoing embodiments, such as the hard disk or memory of the computer equipment. The computer-readable storage medium can also be an external storage device of the computer equipment, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the computer equipment.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any person familiar with the technical field can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A training method of an emotion recognition model, characterized by comprising:
obtaining the voice information of a user and the data label corresponding to the voice information;
constructing sample data according to the voice information and the corresponding data label;
pre-processing the voice information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
extracting a preset recurrent neural network, the recurrent neural network including an attention mechanism, the attention mechanism being used to reinforce a partial region in the voice information; and
performing, based on the recurrent neural network, model training according to the spectral vectors corresponding to the voice information and the data label to obtain the emotion recognition model.
2. The training method according to claim 1, characterized in that pre-processing the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors comprises:
performing framing and windowing on the voice information in the sample data to obtain processed voice information;
performing a frequency-domain transform on the processed voice information to obtain a corresponding amplitude spectrum;
filtering the amplitude spectrum through a Mel filter bank, and performing a discrete cosine transform on the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients; and
normalizing the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information.
3. The training method according to claim 2, characterized in that filtering the amplitude spectrum through the Mel filter bank comprises:
obtaining the maximum frequency corresponding to the voice information, and calculating the Mel frequency corresponding to the maximum frequency using a Mel-frequency calculation formula;
calculating the Mel spacing between the centre frequencies of two adjacent triangular filters according to the calculated Mel frequency and the number of triangular filters in the Mel filter bank;
completing the linear distribution of the triangular filters according to the Mel spacing; and
filtering the amplitude spectrum according to the linearly distributed triangular filters.
4. The training method according to claim 3, wherein the Mel-frequency calculation formula is:
f_mel = A · lg(1 + f / 700)
wherein f_mel is the Mel frequency, f is the maximum frequency corresponding to the voice information, and A is a coefficient;
the normalizing the Mel-frequency cepstral coefficients to obtain the spectral vectors corresponding to the voice information comprises:
normalizing the Mel-frequency cepstral coefficients using zero-mean normalization to obtain the spectral vectors corresponding to the voice information, the conversion formula of the zero-mean normalization being:
x* = (x − x̄) / σ
wherein x̄ is the mean of the Mel-frequency cepstral coefficients; σ is the standard deviation of the Mel-frequency cepstral coefficients; x is each Mel-frequency cepstral coefficient; and x* is the normalized Mel-frequency cepstral coefficient.
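The two formulas in claim 4 are short enough to check numerically. The sketch below assumes A = 2595 with a base-10 logarithm for the Mel formula, which is the conventional parameterization; the claim itself only names A as "a coefficient".

```python
import numpy as np

def mel_frequency(f, A=2595.0):
    """f_mel = A * lg(1 + f / 700); A = 2595 is an assumed (conventional) value."""
    return A * np.log10(1.0 + f / 700.0)

def zero_mean_normalize(x):
    """x* = (x - mean) / std, the zero-mean normalization of claim 4."""
    return (x - x.mean()) / x.std()
```

For example, at f = 700 Hz the Mel frequency is A · lg 2 ≈ 781.2 Mel, and any normalized coefficient vector has mean 0 and standard deviation 1.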
5. The training method according to claim 1, wherein the structure of the recurrent neural network comprises an input layer, a recurrent layer, an attention mechanism, a fully connected layer, and an output layer; the attention mechanism is configured to establish, according to an attention equation, a mapping between the outputs of the recurrent layer and a weight vector, so as to emphasize local regions in the voice information; the attention equation being:
f(h_i) = tanh(W·h_i + b); a_i = exp(u^T·f(h_i)) / Σ_{t=1..T} exp(u^T·f(h_t)); G = Σ_{i=1..T} a_i·h_i
wherein G is the input vector of the fully connected layer; h_i is the output of the recurrent layer at each time step i; a_i is the weight corresponding to each time step i, representing the influence of time step i on the fully connected layer and the output layer; T is the total number of time steps; W is a matrix parameter of dimension S×D, S being a positive integer; b and u are vector parameters of dimension S; and D is the number of network units in the recurrent layer.
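The attention pooling described in claim 5 can be sketched in a few lines of numpy. The softmax/weighted-sum form is the standard reading of the variable definitions in the claim (f(h_i) = tanh(Wh_i + b), weights a_i summing over T time steps into G); the dimensions and names below are illustrative.

```python
import numpy as np

def attention_pool(H, W, b, u):
    """Attention over recurrent-layer outputs H (shape T x D):
    f(h_i) = tanh(W h_i + b); a_i = softmax_i(u^T f(h_i)); G = sum_i a_i h_i."""
    scores = np.tanh(H @ W.T + b) @ u   # u^T f(h_i) for each time step i
    e = np.exp(scores - scores.max())   # numerically stable softmax
    a = e / e.sum()                     # weight a_i of each time step
    G = a @ H                           # input vector of the fully connected layer
    return G, a
```

Time steps with large weights a_i are the "local regions" of the utterance that the mechanism emphasizes before classification.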
6. An emotion recognition method, comprising:
acquiring a voice signal of a user;
preprocessing the voice signal according to a preset processing rule to obtain spectral vectors corresponding to the voice signal;
inputting the spectral vectors into an emotion recognition model to recognize the emotion of the user, so as to obtain an emotional category of the user, the emotion recognition model being a model trained using the training method of an emotion recognition model according to any one of claims 1 to 5.
7. A training apparatus for an emotion recognition model, comprising:
an information acquisition unit, configured to acquire voice information of a user and a data label corresponding to the voice information;
a sample construction unit, configured to construct sample data from the voice information and the corresponding data label;
a data processing unit, configured to preprocess the voice information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
a network extraction unit, configured to extract a preset recurrent neural network, the recurrent neural network including an attention mechanism, the attention mechanism being configured to emphasize local regions in the voice information;
a model training unit, configured to perform model training based on the recurrent neural network according to the spectral vectors corresponding to the voice information and the data labels, to obtain the emotion recognition model.
8. An emotion recognition apparatus, comprising:
a signal acquisition unit, configured to acquire a voice signal of a user;
a signal processing unit, configured to preprocess the voice signal according to a preset processing rule to obtain spectral vectors corresponding to the voice signal;
an emotion recognition unit, configured to input the spectral vectors into an emotion recognition model to recognize the emotion of the user, so as to obtain an emotional category of the user, the emotion recognition model being a model trained using the training method of an emotion recognition model according to any one of claims 1 to 5.
9. A computer device, comprising a memory and a processor;
the memory being configured to store a computer program;
the processor being configured to execute the computer program and, when executing the computer program, to implement the training method of an emotion recognition model according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
10. A computer-readable storage medium, the computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the training method of an emotion recognition model according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910145605.2A CN109817246B (en) | 2019-02-27 | 2019-02-27 | Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium |
PCT/CN2019/117711 WO2020173133A1 (en) | 2019-02-27 | 2019-11-12 | Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910145605.2A CN109817246B (en) | 2019-02-27 | 2019-02-27 | Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109817246A true CN109817246A (en) | 2019-05-28 |
CN109817246B CN109817246B (en) | 2023-04-18 |
Family
ID=66607622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910145605.2A Active CN109817246B (en) | 2019-02-27 | 2019-02-27 | Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109817246B (en) |
WO (1) | WO2020173133A1 (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110211563A (en) * | 2019-06-19 | 2019-09-06 | 平安科技(深圳)有限公司 | Chinese speech synthesis method, apparatus and storage medium towards scene and emotion |
CN110223714A (en) * | 2019-06-03 | 2019-09-10 | 杭州哲信信息技术有限公司 | A kind of voice-based Emotion identification method |
CN110288980A (en) * | 2019-06-17 | 2019-09-27 | 平安科技(深圳)有限公司 | Audio recognition method, the training method of model, device, equipment and storage medium |
CN110400579A (en) * | 2019-06-25 | 2019-11-01 | 华东理工大学 | Based on direction from the speech emotion recognition of attention mechanism and two-way length network in short-term |
CN110532380A (en) * | 2019-07-12 | 2019-12-03 | 杭州电子科技大学 | A kind of text sentiment classification method based on memory network |
CN110890088A (en) * | 2019-10-12 | 2020-03-17 | 中国平安财产保险股份有限公司 | Voice information feedback method and device, computer equipment and storage medium |
CN111179945A (en) * | 2019-12-31 | 2020-05-19 | 中国银行股份有限公司 | Voiceprint recognition-based safety door control method and device |
CN111276119A (en) * | 2020-01-17 | 2020-06-12 | 平安科技(深圳)有限公司 | Voice generation method and system and computer equipment |
CN111341351A (en) * | 2020-02-25 | 2020-06-26 | 厦门亿联网络技术股份有限公司 | Voice activity detection method and device based on self-attention mechanism and storage medium |
CN111357051A (en) * | 2019-12-24 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111429948A (en) * | 2020-03-27 | 2020-07-17 | 南京工业大学 | Voice emotion recognition model and method based on attention convolution neural network |
CN111582382A (en) * | 2020-05-09 | 2020-08-25 | Oppo广东移动通信有限公司 | State recognition method and device and electronic equipment |
WO2020173133A1 (en) * | 2019-02-27 | 2020-09-03 | 平安科技(深圳)有限公司 | Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium |
CN111816205A (en) * | 2020-07-09 | 2020-10-23 | 中国人民解放军战略支援部队航天工程大学 | Airplane audio-based intelligent airplane type identification method |
CN111832317A (en) * | 2020-07-09 | 2020-10-27 | 平安普惠企业管理有限公司 | Intelligent information diversion method and device, computer equipment and readable storage medium |
CN111985231A (en) * | 2020-08-07 | 2020-11-24 | 中移(杭州)信息技术有限公司 | Unsupervised role recognition method and device, electronic equipment and storage medium |
CN112163571A (en) * | 2020-10-29 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for identifying attribute of electronic equipment user |
CN112331182A (en) * | 2020-10-26 | 2021-02-05 | 平安科技(深圳)有限公司 | Voice data generation method and device, computer equipment and storage medium |
CN112466324A (en) * | 2020-11-13 | 2021-03-09 | 上海听见信息科技有限公司 | Emotion analysis method, system, equipment and readable storage medium |
CN112992177A (en) * | 2021-02-20 | 2021-06-18 | 平安科技(深圳)有限公司 | Training method, device, equipment and storage medium of voice style migration model |
CN113053361A (en) * | 2021-03-18 | 2021-06-29 | 北京金山云网络技术有限公司 | Speech recognition method, model training method, device, equipment and medium |
CN113270111A (en) * | 2021-05-17 | 2021-08-17 | 广州国音智能科技有限公司 | Height prediction method, device, equipment and medium based on audio data |
CN113327631A (en) * | 2021-07-15 | 2021-08-31 | 广州虎牙科技有限公司 | Emotion recognition model training method, emotion recognition method and emotion recognition device |
CN113421594A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Speech emotion recognition method, device, equipment and storage medium |
CN113889150A (en) * | 2021-10-15 | 2022-01-04 | 北京工业大学 | Speech emotion recognition method and device |
CN113889149A (en) * | 2021-10-15 | 2022-01-04 | 北京工业大学 | Speech emotion recognition method and device |
CN113935336A (en) * | 2021-10-09 | 2022-01-14 | 上海淇玥信息技术有限公司 | Method and device for determining conversational strategy for voice conversation and electronic equipment |
WO2022198923A1 (en) * | 2021-03-26 | 2022-09-29 | 之江实验室 | Speech emotion recognition method and system using fusion of crowd information |
CN116916497B (en) * | 2023-09-12 | 2023-12-26 | 深圳市卡能光电科技有限公司 | Nested situation identification-based illumination control method and system for floor cylindrical atmosphere lamp |
CN117648717A (en) * | 2024-01-29 | 2024-03-05 | 知学云(北京)科技股份有限公司 | Privacy protection method for artificial intelligent voice training |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112185423B (en) * | 2020-09-28 | 2023-11-21 | 南京工程学院 | Voice emotion recognition method based on multi-head attention mechanism |
CN112257658B (en) * | 2020-11-11 | 2023-10-10 | 微医云(杭州)控股有限公司 | Electroencephalogram signal processing method and device, electronic equipment and storage medium |
CN112733994B (en) * | 2020-12-10 | 2024-07-12 | 中国科学院深圳先进技术研究院 | Autonomous emotion generation method, system and application of robot |
CN112786017B (en) * | 2020-12-25 | 2024-04-09 | 北京猿力未来科技有限公司 | Training method and device of speech speed detection model, and speech speed detection method and device |
CN112948554B (en) * | 2021-02-28 | 2024-03-08 | 西北工业大学 | Real-time multi-mode dialogue emotion analysis method based on reinforcement learning and domain knowledge |
CN113178197B (en) * | 2021-04-27 | 2024-01-09 | 平安科技(深圳)有限公司 | Training method and device of voice verification model and computer equipment |
CN113343860A (en) * | 2021-06-10 | 2021-09-03 | 南京工业大学 | Bimodal fusion emotion recognition method based on video image and voice |
CN113420556B (en) * | 2021-07-23 | 2023-06-20 | 平安科技(深圳)有限公司 | Emotion recognition method, device, equipment and storage medium based on multi-mode signals |
CN113592001B (en) * | 2021-08-03 | 2024-02-02 | 西北工业大学 | Multi-mode emotion recognition method based on deep canonical correlation analysis |
CN113919387A (en) * | 2021-08-18 | 2022-01-11 | 东北林业大学 | Electroencephalogram signal emotion recognition based on GBDT-LR model |
CN113837299B (en) * | 2021-09-28 | 2023-09-01 | 平安科技(深圳)有限公司 | Network training method and device based on artificial intelligence and electronic equipment |
CN114548262B (en) * | 2022-02-21 | 2024-03-22 | 华中科技大学鄂州工业技术研究院 | Feature level fusion method for multi-mode physiological signals in emotion calculation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106340309A (en) * | 2016-08-23 | 2017-01-18 | 南京大空翼信息技术有限公司 | Dog bark emotion recognition method and device based on deep learning |
US20170018270A1 (en) * | 2015-07-16 | 2017-01-19 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method |
CN108550375A (en) * | 2018-03-14 | 2018-09-18 | 鲁东大学 | A kind of emotion identification method, device and computer equipment based on voice signal |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Based on the vagitus emotion identification method for improving long memory network in short-term |
CN109285562A (en) * | 2018-09-28 | 2019-01-29 | 东南大学 | Speech-emotion recognition method based on attention mechanism |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766894B (en) * | 2017-11-03 | 2021-01-22 | 吉林大学 | Remote sensing image natural language generation method based on attention mechanism and deep learning |
CN108922515A (en) * | 2018-05-31 | 2018-11-30 | 平安科技(深圳)有限公司 | Speech model training method, audio recognition method, device, equipment and medium |
CN109062937B (en) * | 2018-06-15 | 2019-11-26 | 北京百度网讯科技有限公司 | The method of training description text generation model, the method and device for generating description text |
CN109817246B (en) * | 2019-02-27 | 2023-04-18 | 平安科技(深圳)有限公司 | Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium |
2019
- 2019-02-27: CN application CN201910145605.2A filed (granted as CN109817246B, active)
- 2019-11-12: PCT application PCT/CN2019/117711 filed (published as WO2020173133A1)
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020173133A1 (en) * | 2019-02-27 | 2020-09-03 | 平安科技(深圳)有限公司 | Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium |
CN110223714A (en) * | 2019-06-03 | 2019-09-10 | 杭州哲信信息技术有限公司 | A kind of voice-based Emotion identification method |
CN110288980A (en) * | 2019-06-17 | 2019-09-27 | 平安科技(深圳)有限公司 | Audio recognition method, the training method of model, device, equipment and storage medium |
CN110211563A (en) * | 2019-06-19 | 2019-09-06 | 平安科技(深圳)有限公司 | Chinese speech synthesis method, apparatus and storage medium towards scene and emotion |
CN110211563B (en) * | 2019-06-19 | 2024-05-24 | 平安科技(深圳)有限公司 | Chinese speech synthesis method, device and storage medium for scenes and emotion |
CN110400579A (en) * | 2019-06-25 | 2019-11-01 | 华东理工大学 | Based on direction from the speech emotion recognition of attention mechanism and two-way length network in short-term |
CN110532380A (en) * | 2019-07-12 | 2019-12-03 | 杭州电子科技大学 | A kind of text sentiment classification method based on memory network |
CN110890088A (en) * | 2019-10-12 | 2020-03-17 | 中国平安财产保险股份有限公司 | Voice information feedback method and device, computer equipment and storage medium |
CN110890088B (en) * | 2019-10-12 | 2022-07-15 | 中国平安财产保险股份有限公司 | Voice information feedback method and device, computer equipment and storage medium |
CN111357051A (en) * | 2019-12-24 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111357051B (en) * | 2019-12-24 | 2024-02-02 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
WO2021127982A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, smart device, and computer-readable storage medium |
CN111179945A (en) * | 2019-12-31 | 2020-05-19 | 中国银行股份有限公司 | Voiceprint recognition-based safety door control method and device |
CN111276119A (en) * | 2020-01-17 | 2020-06-12 | 平安科技(深圳)有限公司 | Voice generation method and system and computer equipment |
CN111276119B (en) * | 2020-01-17 | 2023-08-22 | 平安科技(深圳)有限公司 | Speech generation method, system and computer equipment |
CN111341351A (en) * | 2020-02-25 | 2020-06-26 | 厦门亿联网络技术股份有限公司 | Voice activity detection method and device based on self-attention mechanism and storage medium |
CN111429948A (en) * | 2020-03-27 | 2020-07-17 | 南京工业大学 | Voice emotion recognition model and method based on attention convolution neural network |
CN111582382A (en) * | 2020-05-09 | 2020-08-25 | Oppo广东移动通信有限公司 | State recognition method and device and electronic equipment |
CN111582382B (en) * | 2020-05-09 | 2023-10-31 | Oppo广东移动通信有限公司 | State identification method and device and electronic equipment |
CN111832317B (en) * | 2020-07-09 | 2023-08-18 | 广州市炎华网络科技有限公司 | Intelligent information flow guiding method and device, computer equipment and readable storage medium |
CN111816205B (en) * | 2020-07-09 | 2023-06-20 | 中国人民解放军战略支援部队航天工程大学 | Airplane audio-based intelligent recognition method for airplane models |
CN111816205A (en) * | 2020-07-09 | 2020-10-23 | 中国人民解放军战略支援部队航天工程大学 | Airplane audio-based intelligent airplane type identification method |
CN111832317A (en) * | 2020-07-09 | 2020-10-27 | 平安普惠企业管理有限公司 | Intelligent information diversion method and device, computer equipment and readable storage medium |
CN111985231B (en) * | 2020-08-07 | 2023-12-26 | 中移(杭州)信息技术有限公司 | Unsupervised role recognition method and device, electronic equipment and storage medium |
CN111985231A (en) * | 2020-08-07 | 2020-11-24 | 中移(杭州)信息技术有限公司 | Unsupervised role recognition method and device, electronic equipment and storage medium |
WO2021189980A1 (en) * | 2020-10-26 | 2021-09-30 | 平安科技(深圳)有限公司 | Voice data generation method and apparatus, and computer device and storage medium |
CN112331182A (en) * | 2020-10-26 | 2021-02-05 | 平安科技(深圳)有限公司 | Voice data generation method and device, computer equipment and storage medium |
CN112163571B (en) * | 2020-10-29 | 2024-03-05 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for identifying attribute of electronic equipment user |
CN112163571A (en) * | 2020-10-29 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for identifying attribute of electronic equipment user |
CN112466324A (en) * | 2020-11-13 | 2021-03-09 | 上海听见信息科技有限公司 | Emotion analysis method, system, equipment and readable storage medium |
CN112992177B (en) * | 2021-02-20 | 2023-10-17 | 平安科技(深圳)有限公司 | Training method, device, equipment and storage medium of voice style migration model |
CN112992177A (en) * | 2021-02-20 | 2021-06-18 | 平安科技(深圳)有限公司 | Training method, device, equipment and storage medium of voice style migration model |
CN113053361A (en) * | 2021-03-18 | 2021-06-29 | 北京金山云网络技术有限公司 | Speech recognition method, model training method, device, equipment and medium |
WO2022198923A1 (en) * | 2021-03-26 | 2022-09-29 | 之江实验室 | Speech emotion recognition method and system using fusion of crowd information |
CN113270111A (en) * | 2021-05-17 | 2021-08-17 | 广州国音智能科技有限公司 | Height prediction method, device, equipment and medium based on audio data |
CN113421594A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Speech emotion recognition method, device, equipment and storage medium |
CN113421594B (en) * | 2021-06-30 | 2023-09-22 | 平安科技(深圳)有限公司 | Speech emotion recognition method, device, equipment and storage medium |
CN113327631A (en) * | 2021-07-15 | 2021-08-31 | 广州虎牙科技有限公司 | Emotion recognition model training method, emotion recognition method and emotion recognition device |
CN113327631B (en) * | 2021-07-15 | 2023-03-21 | 广州虎牙科技有限公司 | Emotion recognition model training method, emotion recognition method and emotion recognition device |
CN113935336A (en) * | 2021-10-09 | 2022-01-14 | 上海淇玥信息技术有限公司 | Method and device for determining conversational strategy for voice conversation and electronic equipment |
CN113889150A (en) * | 2021-10-15 | 2022-01-04 | 北京工业大学 | Speech emotion recognition method and device |
CN113889150B (en) * | 2021-10-15 | 2023-08-29 | 北京工业大学 | Speech emotion recognition method and device |
CN113889149B (en) * | 2021-10-15 | 2023-08-29 | 北京工业大学 | Speech emotion recognition method and device |
CN113889149A (en) * | 2021-10-15 | 2022-01-04 | 北京工业大学 | Speech emotion recognition method and device |
CN116916497B (en) * | 2023-09-12 | 2023-12-26 | 深圳市卡能光电科技有限公司 | Nested situation identification-based illumination control method and system for floor cylindrical atmosphere lamp |
CN117648717A (en) * | 2024-01-29 | 2024-03-05 | 知学云(北京)科技股份有限公司 | Privacy protection method for artificial intelligent voice training |
CN117648717B (en) * | 2024-01-29 | 2024-05-03 | 知学云(北京)科技股份有限公司 | Privacy protection method for artificial intelligent voice training |
Also Published As
Publication number | Publication date |
---|---|
WO2020173133A1 (en) | 2020-09-03 |
CN109817246B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109817246A (en) | Training method, emotion identification method, device, equipment and the storage medium of emotion recognition model | |
Yadav et al. | Survey on machine learning in speech emotion recognition and vision systems using a recurrent neural network (RNN) | |
CN108597492B (en) | Phoneme synthesizing method and device | |
CN110457432B (en) | Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium | |
CN109859772B (en) | Emotion recognition method, emotion recognition device and computer-readable storage medium | |
CN111312245B (en) | Voice response method, device and storage medium | |
CN108197115A (en) | Intelligent interactive method, device, computer equipment and computer readable storage medium | |
CN112259106A (en) | Voiceprint recognition method and device, storage medium and computer equipment | |
CN108899049A (en) | A kind of speech-emotion recognition method and system based on convolutional neural networks | |
WO2021047319A1 (en) | Voice-based personal credit assessment method and apparatus, terminal and storage medium | |
CN112216307B (en) | Speech emotion recognition method and device | |
CN109313892A (en) | Steady language identification method and system | |
WO2022178969A1 (en) | Voice conversation data processing method and apparatus, and computer device and storage medium | |
Al-Dujaili et al. | Speech emotion recognition: a comprehensive survey | |
Sethu et al. | Speech based emotion recognition | |
Caponetti et al. | Biologically inspired emotion recognition from speech | |
CN114127849A (en) | Speech emotion recognition method and device | |
Ali et al. | DWT features performance analysis for automatic speech recognition of Urdu | |
Yang et al. | Algorithm for speech emotion recognition classification based on mel-frequency cepstral coefficients and broad learning system | |
Akinpelu et al. | Lightweight deep learning framework for speech emotion recognition | |
CN114913859B (en) | Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium | |
Johar | Paralinguistic profiling using speech recognition | |
CN116959464A (en) | Training method of audio generation network, audio generation method and device | |
CN116416962A (en) | Audio synthesis method, device, equipment and storage medium | |
Fonnegra et al. | Speech emotion recognition based on a recurrent neural network classification model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||