CN109817246A - Training method, emotion recognition method, apparatus, device, and storage medium for an emotion recognition model - Google Patents

Training method, emotion recognition method, apparatus, device, and storage medium for an emotion recognition model

Info

Publication number
CN109817246A
Authority
CN
China
Prior art keywords
speech information
emotion
frequency
mel
emotion recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910145605.2A
Other languages
Chinese (zh)
Other versions
CN109817246B (en)
Inventor
刘博卿
贾雪丽
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910145605.2A priority Critical patent/CN109817246B/en
Publication of CN109817246A publication Critical patent/CN109817246A/en
Priority to PCT/CN2019/117711 priority patent/WO2020173133A1/en
Application granted granted Critical
Publication of CN109817246B publication Critical patent/CN109817246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of analysis window
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the field of intelligent decision-making and trains an emotion recognition model based on deep learning. Specifically disclosed are a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium. The training method comprises: obtaining speech information of users and corresponding data labels; constructing sample data from the speech information and the corresponding data labels; preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors; loading a preset recurrent neural network that includes an attention mechanism, the attention mechanism being used to emphasize certain regions of the speech information; and, based on the recurrent neural network, performing model training on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model. The method improves the generalizability of the emotion recognition model and the accuracy of its recognition.

Description

Training method, emotion recognition method, apparatus, device, and storage medium for an emotion recognition model
Technical field
This application relates to the technical field of model training, and in particular to a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium.
Background art
In recent years, machine-learning-based emotion recognition models that recognize user emotion from speech have developed rapidly, but emotion recognition from sound still faces many challenges. For example, to produce sustained and accurate recognition of positive and negative emotion, text and acoustic features are often combined; that approach requires automatic speech recognition (ASR) technology to convert the sound into text information, which introduces serious latency. At the same time, emotion recognition models also generalize poorly: accuracy drops when a model is applied to a new speaker.
Summary of the invention
This application provides a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium, in order to improve the generalizability of the emotion recognition model and the accuracy of its recognition.
In a first aspect, this application provides a training method for an emotion recognition model, the method comprising:
obtaining speech information of a user and a data label corresponding to the speech information;
constructing sample data from the speech information and the corresponding data label;
preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
loading a preset recurrent neural network, the recurrent neural network including an attention mechanism, the attention mechanism being used to emphasize certain regions of the speech information;
based on the recurrent neural network, performing model training on the spectral vectors and data labels corresponding to the speech information to obtain an emotion recognition model.
In a second aspect, this application further provides an emotion recognition method, the method comprising:
collecting a voice signal of a user;
preprocessing the voice signal according to a preset processing rule to obtain spectral vectors corresponding to the voice signal;
inputting the spectral vectors into an emotion recognition model to recognize the emotion of the user and obtain the user's emotion category, the emotion recognition model being a model trained with the above training method for an emotion recognition model.
In a third aspect, this application further provides a training apparatus for an emotion recognition model, the apparatus comprising:
an acquisition unit for obtaining speech information of a user and a data label corresponding to the speech information;
a sample construction unit for constructing sample data from the speech information and the corresponding data label;
a preprocessing unit for preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors;
an extraction unit for loading a preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information;
a model training unit for performing, based on the recurrent neural network, model training on the spectral vectors and data labels corresponding to the speech information to obtain an emotion recognition model.
In a fourth aspect, this application further provides an emotion recognition apparatus, the apparatus comprising:
a signal acquisition unit for collecting a voice signal of a user;
a signal processing unit for preprocessing the voice signal according to a preset processing rule to obtain spectral vectors corresponding to the voice signal;
an emotion recognition unit for inputting the spectral vectors into an emotion recognition model to recognize the emotion of the user and obtain the user's emotion category, the emotion recognition model being a model trained with the above training method for an emotion recognition model.
In a fifth aspect, this application further provides a computer device comprising a memory and a processor; the memory is used to store a computer program, and the processor is used to execute the computer program and, when executing it, to implement the above training method for an emotion recognition model or the above emotion recognition method.
In a sixth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the above training method for an emotion recognition model or the above emotion recognition method.
This application discloses a training method, apparatus, device, and storage medium for an emotion recognition model. After obtaining speech information of users and the corresponding data labels, the method preprocesses the speech information according to a preset processing rule to obtain corresponding spectral vectors, and then, based on a preset recurrent neural network, performs model training on the spectral vectors and data labels corresponding to the speech information to obtain an emotion recognition model, wherein the recurrent neural network includes an attention mechanism used to emphasize certain regions of the speech information. The emotion recognition model trained by this method generalizes well and recognizes emotion with high accuracy.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow diagram of a training method for an emotion recognition model provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of the recurrent neural network provided by an embodiment of this application;
Fig. 3 is a schematic flow diagram of sub-steps of the training method for the emotion recognition model in Fig. 1;
Fig. 4 is a schematic flow diagram of another training method for an emotion recognition model provided by an embodiment of this application;
Fig. 5 is a schematic flow diagram of an emotion recognition method provided by an embodiment of this application;
Fig. 6 is a schematic block diagram of a model training apparatus provided by an embodiment of this application;
Fig. 7 is a schematic block diagram of another model training apparatus provided by an embodiment of this application;
Fig. 8 is a schematic block diagram of an emotion recognition apparatus provided by an embodiment of this application;
Fig. 9 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
Detailed description of embodiments
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The flowcharts shown in the drawings are only illustrations; they need not include all content and operations/steps, nor need the steps be executed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may change according to the actual situation.
The embodiments of this application provide a training method for an emotion recognition model, an emotion recognition method, an apparatus, a computer device, and a storage medium. The training method may be run on a server; the emotion recognition method may be applied in a terminal or a server to recognize the user's emotion type, such as happy or sad, from the user's speech.
The server may be an independent server or a server cluster. The terminal may be an electronic device such as a mobile phone, a tablet computer, a laptop, a desktop computer, a personal digital assistant, or a wearable device.
Some embodiments of this application are described in detail below with reference to the drawings. In the absence of conflict, the following embodiments and the features in them may be combined with each other.
Referring to Fig. 1, Fig. 1 is a schematic flow diagram of a training method for an emotion recognition model provided by an embodiment of this application. The emotion recognition model is obtained by model training based on a preset recurrent neural network.
As shown in Fig. 2, Fig. 2 is a schematic structural diagram of a preset recurrent neural network provided by an embodiment of this application. The structure of the recurrent neural network includes an input layer, a recurrent layer, an attention mechanism, a fully connected layer, and an output layer. The attention mechanism establishes, according to an attention equation, a mapping between the outputs of the recurrent layer and a weight vector, thereby emphasizing certain regions of the speech information and improving the recognition accuracy of the model.
The recurrent layer consists of Long Short-Term Memory (LSTM) units, and the output layer uses a Softmax output. In this structure, the temporal dependence in the input sequence fed to the input layer is modeled by the recurrent layer of LSTM units; the attention mechanism is applied to the recurrent-layer output at each time point in the sequence, adding more weight to certain regions of the sequence, namely those that are important for recognizing positive and negative emotion. Compared with other recurrent neural networks (RNNs), the LSTM-based recurrent layer can learn long-range dependencies without suffering from vanishing or exploding gradients, which yields better recognition results.
The training method for the emotion recognition model provided by the embodiments of this application is introduced below with reference to the structure of the recurrent neural network in Fig. 2.
As shown in Fig. 1, the training method is used to train an emotion recognition model that quickly and accurately identifies the emotion type of a user. The training method includes steps S101 to S105.
S101: Obtain speech information of a user and a data label corresponding to the speech information.
The data label is the user's emotion label, such as a positive-emotion label, a neutral-emotion label, or a negative-emotion label. Of course, the speech information may also be divided into more classes with correspondingly more data labels, for example happy, sad, fearful, sorrowful, or neutral; different data labels represent different user moods.
Specifically, the speech information of the user is obtained from a preset database together with its label data, i.e., the data label corresponding to the speech information. Before this, the method further includes: collecting the speech information of users, labeling the speech information with data labels, and storing the labeled speech information in the preset database. The users may come from different groups, such as children, young people, the middle-aged, and the elderly, and of course also from different occupations, such as teachers, students, doctors, lawyers, and IT personnel, which enriches the diversity of the sample data.
In one embodiment, to improve the recognition accuracy of the model, the collection of speech information is arranged as follows: obtaining the speech information of a user and the corresponding data label comprises obtaining the speech information recorded while the user tells stories of different emotion types, together with the data labels generated when the user scores the emotion of that speech information.
Specifically, the speech information of a user telling two sad stories and two cheerful stories is collected first. Before or after telling each story, the user rates his or her mood according to a scoring criterion, for example scores of 0-5 express negative emotion and scores of 6-10 express positive emotion, and a corresponding data label is generated from the score; if the score is 4, for instance, the label data corresponding to that speech information is the negative-emotion label.
Of course, the speech information collected while the user tells the two sad stories and two cheerful stories may also be scored in segments, with the data label of each segment determined by its score: for example, the speech information is divided into two speech segments; if the first segment scores 0, its data label is negative emotion, and if the second segment scores 10, its data label is positive emotion.
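As a concrete illustration of the scoring criterion above, here is a minimal Python sketch (the function name is hypothetical) that maps a self-rated mood score to a data label:

```python
def score_to_label(score: int) -> str:
    """Map a self-rated mood score to a data label.

    Scoring criterion from the embodiment: scores of 0-5 express
    negative emotion, scores of 6-10 express positive emotion.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must lie in [0, 10]")
    return "negative" if score <= 5 else "positive"

# A score of 4 yields the negative-emotion label, as in the example above.
assert score_to_label(4) == "negative"
assert score_to_label(10) == "positive"
```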
S102: Construct sample data from the speech information and the corresponding data labels.
Specifically, the sample data may be composed of the collected speech information of users and the corresponding data labels. There are multiple users, the exact number not being limited here. Since users' emotions differ, the sample data includes positive sample data and negative sample data: positive sample data correspond to speech information of positive moods, such as optimism, happiness, and excitement; negative sample data correspond to speech information of negative moods, such as pessimism, sadness, and pain.
S103: Preprocess the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors.
The preset processing rule converts the speech information in the sample data into frequency-domain information, for example by using a fast Fourier transform rule or a wavelet transform rule to convert speech information collected in the time domain into information in the frequency domain.
In one embodiment, to speed up model training and improve recognition precision, the preprocessing rule shown in Fig. 3 is used, i.e., step S103 includes sub-steps S103a to S103d.
S103a: Apply framing and windowing to the speech information in the sample data to obtain processed speech information.
Specifically, the frame length in the framing and windowing is set to 40 ms: the speech information is segmented according to the 40 ms frame length, and a Hamming window is then applied to each segmented piece of speech information, i.e., each segment is multiplied by a window function, in preparation for the Fourier expansion.
Note that the frame length used in framing and windowing may be set to other values, for example 50 ms or 30 ms.
In one embodiment, before framing and windowing are applied to the speech information in the sample data, pre-emphasis may also be applied to the speech information, specifically by multiplying it by a preset coefficient positively correlated with the frequency of the speech information so as to boost the amplitude of the high frequencies. The size of this preset coefficient is tied to the parameters of model training, i.e., it changes as the model parameters change; for example, it may be associated with the weight vectors a_i, increasing as the mean of a_i increases and decreasing as that mean decreases. The purpose is to further improve the recognition precision of the model.
In an alternative embodiment, the preset coefficient may be set to an empirical value. The empirical value serves to cancel the effect of the vocal cords and lips during vocalization, compensating the high-frequency part of the speech information that is suppressed by the articulatory system and highlighting the high-frequency formants.
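A minimal numpy sketch of sub-step S103a under stated assumptions: non-overlapping 40 ms frames, and the common time-domain pre-emphasis filter y[n] = x[n] - a * x[n-1] with the empirical coefficient a = 0.97 standing in for the preset coefficient, which the embodiment leaves adjustable:

```python
import numpy as np

def frame_and_window(signal: np.ndarray, sample_rate: int,
                     frame_ms: float = 40.0, pre_emphasis: float = 0.97):
    """Pre-emphasize, split into 40 ms frames, and apply a Hamming window.

    The 0.97 pre-emphasis coefficient is an assumed empirical value;
    the embodiment also allows it to vary with the model parameters.
    """
    # Pre-emphasis boosts high-frequency amplitude: y[n] = x[n] - a * x[n-1].
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)       # 40 ms per frame
    n_frames = len(emphasized) // frame_len
    frames = emphasized[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Multiply each frame by a Hamming window ahead of the Fourier expansion.
    return frames * np.hamming(frame_len)
```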
S103b: Apply a frequency-domain transform to the processed speech information to obtain the corresponding amplitude spectrum.
Specifically, a fast Fourier transform (FFT) is applied to the processed speech information to obtain the corresponding parameters; in this embodiment the parameter to obtain is the amplitude after the fast Fourier transform, which serves as the amplitude spectrum. Of course, other post-FFT parameters could also be used, for example the amplitude together with phase information.
It is understood that a wavelet transform could also be applied to the processed speech information to obtain corresponding parameters, with the transformed amplitude selected as the amplitude spectrum.
S103c: Filter the amplitude spectrum through a Mel filter bank, and apply a discrete cosine transform to the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients.
Specifically, filtering the amplitude spectrum through the Mel filter bank comprises: obtaining the maximum frequency corresponding to the speech information and computing the corresponding Mel frequency with the Mel-frequency formula; computing the Mel spacing of the centre frequencies of two adjacent triangular filters from the computed Mel frequency and the number of triangular filters in the Mel filter bank; distributing the triangular filters linearly according to that Mel spacing; and filtering the amplitude spectrum with the linearly distributed triangular filters.
The Mel filter bank specifically comprises 40 triangular filters distributed linearly on the Mel scale. The amplitude spectrum is filtered through these 40 Mel-scale triangular filters, and a discrete cosine transform is then applied to obtain the Mel-frequency cepstral coefficients.
In detail, the maximum frequency in the speech information is determined; the maximum Mel frequency is computed from it with the Mel-frequency formula; the spacing of the centre frequencies of two adjacent triangular filters is computed from the maximum Mel frequency and the number of triangular filters (40); and the triangular filters are distributed linearly according to the computed spacing.
The Mel-frequency formula is:

f_mel = A * log10(1 + f / 700)    (1)

In formula (1), f_mel is the Mel frequency, f is the maximum frequency corresponding to the speech information, and A is a coefficient, specifically 2595.
For example, if the determined maximum frequency is 4000 Hz, formula (1) gives a maximum Mel frequency of 2146.1 mel.
Since the centre frequencies of the triangular filters are distributed linearly at equal intervals on the Mel scale, the spacing of the centre frequencies of two adjacent triangular filters can be computed as:

Δmel = f_mel / (K + 1)    (2)

where Δmel is the spacing of the centre frequencies of two adjacent triangular filters, f_mel is the maximum Mel frequency, and K is the number of triangular filters.
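Formulas (1) and (2) can be checked numerically against the 4000 Hz example above; a short sketch (the division by K + 1 follows the standard equal-spacing construction over [0, f_mel]):

```python
import numpy as np

def hz_to_mel(f_hz: float) -> float:
    # Formula (1): f_mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

f_max = 4000.0   # maximum frequency of the speech information
K = 40           # number of triangular filters in the Mel filter bank

mel_max = hz_to_mel(f_max)
delta_mel = mel_max / (K + 1)      # formula (2): centre-frequency spacing

print(round(mel_max, 1))           # 2146.1, matching the example above
print(round(delta_mel, 1))         # about 52.3 mel between adjacent centres

# Centre frequencies are linearly distributed on the Mel scale.
centres_mel = delta_mel * np.arange(1, K + 1)
```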
S103d: Normalize the Mel-frequency cepstral coefficients to obtain the spectral vectors corresponding to the speech information.
Specifically, zero-mean normalization is used to normalize the Mel-frequency cepstral coefficients into the spectral vectors corresponding to the speech information. The conversion formula of zero-mean normalization is:

x* = (x - x̄) / σ    (3)

where x̄ is the mean of the Mel-frequency cepstral coefficients, σ is their standard deviation, x is each Mel-frequency cepstral coefficient, and x* is the normalized Mel-frequency cepstral coefficient.
The zero-mean normalization used here (Z-Score standardization) is also called standard-deviation standardization: the processed data have mean 0 and standard deviation 1. Z-Score standardization converts data of different magnitudes to a single magnitude, measured uniformly by the computed Z-Score, which guarantees comparability between the data.
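A minimal sketch of sub-step S103d together with the preceding discrete cosine transform, assuming scipy's DCT-II and an illustrative choice of 13 retained coefficients (the embodiment does not fix that number):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_normalize(filterbank_out: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Turn Mel filter-bank outputs into z-score-normalized spectral vectors.

    filterbank_out: (n_frames, K) filtered amplitude spectrum.
    """
    # Log compression, then DCT-II along the filter axis, gives the cepstrum.
    mfcc = dct(np.log(filterbank_out + 1e-8), type=2, axis=1, norm='ortho')
    mfcc = mfcc[:, :n_coeffs]

    # Formula (3), zero-mean (Z-Score) normalization: x* = (x - mean) / std.
    mean = mfcc.mean(axis=0)
    std = mfcc.std(axis=0) + 1e-8
    return (mfcc - mean) / std
```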
S104: Load the preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information.
The structure of the recurrent neural network includes an input layer, a recurrent layer, the attention mechanism, a fully connected layer, and an output layer; the attention mechanism establishes, according to an attention equation, a mapping between the outputs of the recurrent layer and a weight vector, thereby emphasizing certain regions of the speech information.
The attention equation is:

g = Σ_{i=0}^{T-1} a_i * h_i

where g is the input vector of the fully connected layer; h_i is the output of the recurrent layer at each time point i; and a_i is the weight vector corresponding to time point i, representing the influence of time point i on the fully connected layer and the output layer.
The key of the attention mechanism is learning this equation, which establishes a mapping between the output h_i of the recurrent layer at each time point i and a weight vector a_i: h_i denotes the output of the recurrent layer, and a_i represents the influence of each time point on the subsequent layers of the network.
The parameters inside f(h_i) are optimized during training; its expression is:
f(h_i) = tanh(W * h_i + b)    (4)
In formula (4), W and b are the parameters of the linear equation, and h_i is the output of the LSTM layer at time point i, written h_i for i = 0, ..., T-1, where T is the total number of time points in a given sequence. This embodiment simplifies the form of the expression: using a linear function followed by a tanh activation, as in formula (4), both achieves good results and improves the training speed of the model.
For a given time point i, the weight a_i is:

a_i = exp(u^T f(h_i)) / Σ_{j=0}^{T-1} exp(u^T f(h_j))    (5)

In formula (5), W is a matrix parameter of dimension S×D, where S is a positive integer; b and u are vector parameters of dimension S; and D is the number of network units in the recurrent layer.
Note that the vector g serves as the input of the fully connected layer, whose activation function is the ReLU function; after the fully connected layer, a Softmax function is used to obtain the final output.
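Putting formulas (4) and (5) together with the attention equation, here is a minimal numpy sketch of the mechanism (shapes follow the definitions above; W, b, and u would be learned during training):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(h: np.ndarray, W: np.ndarray, b: np.ndarray, u: np.ndarray):
    """Attention pooling over the recurrent layer's outputs.

    h: (T, D) outputs h_i of the recurrent layer, one row per time point i.
    W: (S, D) matrix parameter; b, u: (S,) vector parameters.
    Returns g, the input vector of the fully connected layer, and the
    weights a_i that measure each time point's influence.
    """
    f_h = np.tanh(h @ W.T + b)   # formula (4): f(h_i) = tanh(W h_i + b), all i
    a = softmax(f_h @ u)         # formula (5): a_i = softmax_i(u^T f(h_i))
    g = a @ h                    # attention equation: g = sum_i a_i h_i
    return g, a

# Toy usage: T = 5 time points, D = 128 LSTM units, S = 64.
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 128))
g, a = attention_pool(h, rng.standard_normal((64, 128)),
                      np.zeros(64), rng.standard_normal(64))
```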
S105: Based on the recurrent neural network, perform model training on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model.
Specifically, the spectral vectors are input into the preset recurrent neural network for model training; the attention mechanism in the improved model emphasizes the important parts of the sound, and the corresponding model parameters are optimized to obtain the emotion recognition model. The model training parameters are shown in Table 1.
Table 1. Training parameters of the network

  Parameter type                           Parameter value
  Optimization algorithm                   Adam
  Learning rate                            0.0005
  Number of LSTM units                     128
  Number of fully connected neurons        20
  Dropout retention probability            0.7
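Table 1 can be read directly into a model definition. Below is a minimal TensorFlow/Keras sketch under those values, with the attention pooling expressed through Dense layers; the attention dimension att_dim (S in formula (5)) is an assumption, as the embodiment does not publish it, and a Dropout rate of 0.3 corresponds to the 0.7 retention probability:

```python
import tensorflow as tf

def build_emotion_model(n_frames: int, n_coeffs: int,
                        n_classes: int = 2, att_dim: int = 64) -> tf.keras.Model:
    """LSTM + attention emotion recognition model with the Table 1 parameters."""
    inputs = tf.keras.Input(shape=(n_frames, n_coeffs))
    h = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)   # 128 LSTM units

    # Attention: a_i = softmax(u^T tanh(W h_i + b)), g = sum_i a_i h_i.
    f_h = tf.keras.layers.Dense(att_dim, activation='tanh')(h)     # tanh(W h_i + b)
    scores = tf.keras.layers.Dense(1, use_bias=False)(f_h)         # u^T f(h_i)
    a = tf.keras.layers.Softmax(axis=1)(scores)
    g = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([a, h])

    x = tf.keras.layers.Dense(20, activation='relu')(g)   # 20 FC neurons, ReLU
    x = tf.keras.layers.Dropout(0.3)(x)                   # retention probability 0.7
    outputs = tf.keras.layers.Dense(n_classes, activation='softmax')(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```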
With the model training method provided by the above embodiment, after the speech information of users and the corresponding data labels are obtained, the speech information is preprocessed according to the preset processing rule to obtain the corresponding spectral vectors, and model training is then performed, based on the preset recurrent neural network, on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model, where the recurrent neural network includes an attention mechanism used to emphasize certain regions of the speech information. The emotion recognition model trained by this method generalizes well and recognizes emotion with high accuracy.
Referring to Fig. 4, Fig. 4 is a schematic flow diagram of another training method for an emotion recognition model provided by an embodiment of this application. This emotion recognition model is likewise obtained by model training based on a preset recurrent neural network, although other networks could of course be used for training.
As shown in Fig. 4, the training method of the emotion recognition model includes steps S201 to S207.
S201: Obtain speech information of a user and a data label corresponding to the speech information.
The data label is the user's emotion label, such as a positive-emotion label, a neutral-emotion label, or a negative-emotion label. Of course, the speech information may also be divided into more classes with correspondingly more data labels, for example happy, sad, fearful, sorrowful, or neutral; different data labels represent different user moods.
S202: Construct sample data from the speech information and the corresponding data labels, the sample data including at least positive sample data and negative sample data.
Specifically, the sample data may be composed of the collected speech information of users and the corresponding data labels. Since users' emotions differ, the sample data includes at least positive sample data and negative sample data, and may, for example, also include neutral sample data. Positive sample data correspond to speech information of positive moods; negative sample data correspond to speech information of negative moods.
S203: Judge whether the positive sample data and negative sample data in the sample data are balanced.
Specifically, whether the positive sample data and negative sample data in the sample data are balanced is judged, and a judgment result is generated; the judgment result is either that the positive and negative sample data are balanced or that they are unbalanced.
If the positive sample data and negative sample data are unbalanced, step S204 is executed; if they are balanced, step S205 is executed.
S204: Process the sample data according to a preset data processing rule so that the positive sample data and negative sample data reach balance.
If the positive sample data and negative sample data are unbalanced, the sample data are processed according to the preset data processing rule so that the positive sample data and negative sample data reach balance. Specifically, the sample data may be processed in either of two ways, described below (a code sketch of both follows):
First, the sample data are processed by oversampling. In the constructed sample data, the negative sample data are usually fewer than the positive sample data; the negative sample data are therefore replicated several times and combined with the positive sample data to form the training sample data. Because the scarce negative samples are replicated several times in the new sample data, the sample-imbalance problem is resolved.
Second, the sample data are handled by setting a weighted loss function: the model weights θ are trained to be optimal by minimizing either a standard cross-entropy function or a weighted cross-entropy function. Through the idea of weighting, when a training sample is known to be a (scarce) negative sample, the model parameters are adjusted through its weight so as to increase the influence of negative samples. The standard cross-entropy loss function is:

E(θ) = -Σ_n log ŷ_n(C_n)

where ŷ_n is the Softmax output for each observed sequence n, whose input X is a matrix of dimension F×D in which F is the number of spectral coefficients input at each time point; C_n is the class label of observed sequence n, with values in {0, 1} (or, of course, {0, 1, 2}, corresponding to negative, neutral, and positive samples respectively). A weighted cross-entropy function, similar to the standard cross-entropy loss, may of course also be used; in both cases the goal is to address the sample-imbalance problem.
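Both balancing strategies are easy to express in code; a minimal sketch under illustrative class indices (0 for negative samples): oversampling by replication, plus loss weighting via the class_weight argument of Keras fit():

```python
import numpy as np

def oversample_negatives(X: np.ndarray, y: np.ndarray, neg_label: int = 0):
    """Strategy one: replicate the scarce negative samples several times
    so that positive and negative counts roughly balance."""
    neg_idx = np.where(y == neg_label)[0]
    pos_idx = np.where(y != neg_label)[0]
    reps = max(1, len(pos_idx) // max(1, len(neg_idx)))
    idx = np.concatenate([pos_idx] + [neg_idx] * reps)
    rng = np.random.default_rng(0)
    rng.shuffle(idx)
    return X[idx], y[idx]

# Strategy two: weight the cross-entropy loss instead of copying data.
# Raising the weight of the scarce negative class increases its influence
# on the parameter updates. The weights below are illustrative.
# model.fit(X, y, class_weight={0: 5.0, 1: 1.0})
```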
S205: Preprocess the speech information in the sample data according to the preset processing rule to obtain corresponding spectral vectors.
Specifically, if the positive sample data and negative sample data are balanced, the speech information in the sample data is preprocessed according to the preset processing rule to obtain the corresponding spectral vectors. The preset processing rule converts the speech information in the sample data into frequency-domain information, for example by using a fast Fourier transform rule or a wavelet transform rule to convert speech information collected in the time domain into information in the frequency domain.
S206: Load the preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information.
The structure of the recurrent neural network includes an input layer, a recurrent layer, the attention mechanism, a fully connected layer, and an output layer; the attention mechanism establishes, according to an attention equation, a mapping between the outputs of the recurrent layer and a weight vector, thereby emphasizing certain regions of the speech information.
S207: Based on the recurrent neural network, perform model training on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model.
Specifically, the spectral vectors are input into the preset recurrent neural network for model training; the attention mechanism in the improved model emphasizes the important parts of the sound, and the corresponding model parameters are optimized to obtain the emotion recognition model.
With the model training method provided by the above embodiment, after the speech information of users and the corresponding data labels are obtained and the sample data reach balance, the speech information is preprocessed according to the preset processing rule to obtain corresponding spectral vectors, and model training is then performed, based on the preset recurrent neural network, on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model, where the recurrent neural network includes an attention mechanism used to emphasize certain regions of the speech information. The emotion recognition model trained by this method generalizes well and recognizes emotion with high accuracy. Moreover, since extreme moods are often much rarer than neutral moods, sample imbalance can lead to overfitting; this method resolves the sample-imbalance problem well and thereby further improves the accuracy of the model.
Referring to Fig. 5, Fig. 5 is a schematic flow diagram of an emotion recognition method provided by an embodiment of this application. The emotion recognition method may be applied in a terminal or a server to recognize a user's emotion from the user's speech.
As shown in Fig. 5, the emotion recognition method includes steps S301 to S303.
S301: Collect a voice signal of the user.
Specifically, the voice signal may be collected by a recording device, for example while chatting with the user; the recording device may be a voice recorder, a smartphone, a tablet computer, a notebook, or an intelligent wearable device such as a smart bracelet or a smartwatch.
S302: Preprocess the voice signal according to the preset processing rule to obtain spectral vectors corresponding to the voice signal.
Specifically, preprocessing the voice signal according to the preset processing rule to obtain the spectral vectors corresponding to the voice signal comprises: applying framing and windowing to the speech information to obtain processed speech information; applying a fast Fourier transform to the processed speech information to obtain the amplitude spectrum; applying the Mel filter bank to the amplitude spectrum and taking a discrete cosine transform of the filter bank's output to obtain the Mel-frequency cepstral coefficients; and normalizing each obtained Mel-frequency cepstral coefficient to obtain the spectral vectors corresponding to the speech information.
S303: Input the spectral vectors into the emotion recognition model to recognize the emotion of the user and obtain the user's emotion category.
The emotion recognition model is a model trained with the emotion recognition model training method provided in the above embodiments. The emotion recognition model analyzes the input spectral vectors to accurately determine the user's emotion, specifically the emotion type, such as happy, sad, or neutral.
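End to end, steps S301-S303 reduce to a few lines when the preprocessing and model sketches above are reused; frame_and_window and mfcc_normalize are the hypothetical helpers introduced earlier, and mel_bank is assumed to be a (K, n_bins) matrix of triangular filters built from the centre frequencies of formula (2):

```python
import numpy as np

EMOTION_NAMES = {0: "negative", 1: "positive"}   # illustrative class mapping

def recognize_emotion(signal: np.ndarray, sample_rate: int,
                      model, mel_bank: np.ndarray) -> str:
    """S301-S303: preprocess a collected voice signal and classify it."""
    frames = frame_and_window(signal, sample_rate)       # S302: framing + window
    amplitude = np.abs(np.fft.rfft(frames, axis=1))      # FFT amplitude spectrum
    energies = amplitude @ mel_bank.T                    # Mel filter bank
    spectral_vectors = mfcc_normalize(energies)          # DCT + z-score
    probs = model.predict(spectral_vectors[np.newaxis])  # S303: emotion model
    return EMOTION_NAMES[int(np.argmax(probs))]
```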
With the emotion recognition method provided by the above embodiment, a voice signal of the user is collected; the voice signal is preprocessed according to the preset processing rule to obtain the spectral vectors corresponding to the voice signal; and the spectral vectors are input into the emotion recognition model to recognize the emotion of the user and obtain the user's emotion category. The method recognizes the user's emotion type quickly and with high recognition accuracy.
Referring to Fig. 6, Fig. 6 is a schematic block diagram of a model training apparatus provided by an embodiment of this application. The model training apparatus may be deployed in a server and is used to execute the aforementioned training method for an emotion recognition model.
As shown in Fig. 6, the model training apparatus 400 comprises: an information acquisition unit 401, a sample construction unit 402, a data processing unit 403, a network extraction unit 404, and a model training unit 405.
The information acquisition unit 401 is used to obtain speech information of a user and the data label corresponding to the speech information.
The sample construction unit 402 is used to construct sample data from the speech information and the corresponding data labels.
The data processing unit 403 is used to preprocess the speech information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors.
In one embodiment, the data processing unit 403 comprises:
an information processing subunit 4031 for applying framing and windowing to the speech information in the sample data to obtain processed speech information; an information transform subunit 4032 for applying a frequency-domain transform to the processed speech information to obtain the corresponding amplitude spectrum; a filtering transform subunit 4033 for filtering the amplitude spectrum through a Mel filter bank and applying a discrete cosine transform to the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients; and a normalization subunit 4034 for normalizing the Mel-frequency cepstral coefficients to obtain the spectral vectors corresponding to the speech information.
In one embodiment, the filtering transform subunit 4033 is specifically used to: obtain the maximum frequency corresponding to the speech information and compute the corresponding Mel frequency with the Mel-frequency formula; compute the Mel spacing of the centre frequencies of two adjacent triangular filters from the computed Mel frequency and the number of triangular filters in the Mel filter bank; distribute the triangular filters linearly according to the Mel spacing; and filter the amplitude spectrum with the linearly distributed triangular filters.
The network extraction unit 404 is used to load the preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information.
The model training unit 405 is used to perform, based on the recurrent neural network, model training on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model.
Referring to Fig. 7, Fig. 7 is a schematic block diagram of another model training apparatus provided by an embodiment of this application. This model training apparatus may be deployed in a server and is used to execute the aforementioned training method for an emotion recognition model.
As shown in Fig. 7, the model training apparatus 500 comprises: an information acquisition unit 501, a sample construction unit 502, a balance judging unit 503, a balance processing unit 504, a data processing unit 505, a network extraction unit 506, and a model training unit 507.
The information acquisition unit 501 is used to obtain speech information of a user and the data label corresponding to the speech information.
The sample construction unit 502 is used to construct sample data from the speech information and the corresponding data labels, the sample data including positive sample data and negative sample data.
The balance judging unit 503 is used to judge whether the positive sample data and negative sample data in the sample data are balanced.
The balance processing unit 504 is used, if the positive sample data and negative sample data are unbalanced, to process the sample data according to the preset data processing rule so that the positive sample data and negative sample data reach balance.
The data processing unit 505 is used, if the positive sample data and negative sample data are balanced, to preprocess the speech information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors.
The network extraction unit 506 is used to load the preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information.
The model training unit 507 is used to perform, based on the recurrent neural network, model training on the spectral vectors and data labels corresponding to the speech information to obtain the emotion recognition model.
Referring to Fig. 8, Fig. 8 is a schematic block diagram of an emotion recognition apparatus provided by an embodiment of this application. The emotion recognition apparatus may be deployed in a terminal or a server and is used to execute the aforementioned emotion recognition method.
As shown in Fig. 8, the emotion recognition apparatus 600 comprises: a signal acquisition unit 601, a signal processing unit 602, and an emotion recognition unit 603.
The signal acquisition unit 601 is used to collect a voice signal of a user.
The signal processing unit 602 is used to preprocess the voice signal according to the preset processing rule to obtain the spectral vectors corresponding to the voice signal.
The emotion recognition unit 603 is used to input the spectral vectors into the emotion recognition model to recognize the emotion of the user and obtain the user's emotion category, the emotion recognition model being a model trained with any of the emotion recognition model training methods described above.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The apparatus described above may be implemented in the form of a computer program that can run on a computer device as shown in Fig. 9.
Referring to Fig. 9, Fig. 9 is a schematic structural block diagram of a computer device provided by an embodiment of this application. The computer device may be a server or a terminal.
As shown in Fig. 9, the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions which, when executed, may cause the processor to perform any of the training methods for an emotion recognition model or any of the emotion recognition methods.
The processor provides computing and control capability and supports the operation of the entire computer device.
The internal memory provides an environment for running the computer program in the non-volatile storage medium; when the computer program is executed by the processor, it may cause the processor to perform any of the training methods for an emotion recognition model or any of the emotion recognition methods.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.
In one embodiment, the processor is used to run a computer program stored in the memory to implement the following steps:
obtaining speech information of a user and a data label corresponding to the speech information; constructing sample data from the speech information and the corresponding data label; preprocessing the speech information in the sample data according to a preset processing rule to obtain corresponding spectral vectors; loading a preset recurrent neural network, the recurrent neural network including an attention mechanism used to emphasize certain regions of the speech information; and, based on the recurrent neural network, performing model training on the spectral vectors and data labels corresponding to the speech information to obtain an emotion recognition model.
In one embodiment, when implementing the preprocessing of the speech information in the sample data according to the preset processing rule to obtain the corresponding spectral vectors, the processor is used to implement:
applying framing and windowing to the speech information in the sample data to obtain processed speech information; applying a frequency-domain transform to the processed speech information to obtain the corresponding amplitude spectrum; filtering the amplitude spectrum through a Mel filter bank and applying a discrete cosine transform to the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients; and normalizing the Mel-frequency cepstral coefficients to obtain the spectral vectors corresponding to the speech information.
In one embodiment, when implementing the filtering of the amplitude spectrum through the Mel filter bank, the processor is used to implement:
obtaining the maximum frequency corresponding to the speech information and computing the corresponding Mel frequency with the Mel-frequency formula; computing the Mel spacing of the centre frequencies of two adjacent triangular filters from the computed Mel frequency and the number of triangular filters in the Mel filter bank; distributing the triangular filters linearly according to the Mel spacing; and filtering the amplitude spectrum with the linearly distributed triangular filters.
In one embodiment, the Mel-frequency formula is:

f_mel = A * log10(1 + f / 700)

where f_mel is the Mel frequency, f is the maximum frequency corresponding to the speech information, and A is a coefficient.
In one embodiment, when implementing the normalization of the Mel-frequency cepstral coefficients to obtain the spectral vectors corresponding to the speech information, the processor is used to implement:
using zero-mean normalization to normalize the Mel-frequency cepstral coefficients into the spectral vectors corresponding to the speech information, the conversion formula of zero-mean normalization being:

x* = (x - x̄) / σ

where x̄ is the mean of the Mel-frequency cepstral coefficients, σ is their standard deviation, x is each Mel-frequency cepstral coefficient, and x* is the normalized Mel-frequency cepstral coefficient.
In one embodiment, the structure of the recurrent neural network includes an input layer, a recurrent layer, an attention mechanism, a fully connected layer, and an output layer; the attention mechanism establishes, according to an attention equation, a mapping between the outputs of the recurrent layer and a weight vector, thereby emphasizing certain regions of the speech information.
The attention equation is:

g = Σ_{i=0}^{T-1} a_i * h_i

where a_i = exp(u^T f(h_i)) / Σ_j exp(u^T f(h_j)) and f(h_i) = tanh(W * h_i + b); g is the input vector of the fully connected layer; h_i is the output of the recurrent layer at each time point i; a_i is the weight vector corresponding to time point i, representing the influence of time point i on the fully connected layer and the output layer; T is the total number of time points i; W is a matrix parameter of dimension S×D, where S is a positive integer; b and u are vector parameters of dimension S; and D is the number of network units in the recurrent layer.
In another embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
Collect a voice signal of a user;
Pre-process the voice signal according to a preset processing rule to obtain a spectral vector corresponding to the voice signal;
Input the spectral vector into an emotion recognition model to recognize the emotion of the user and obtain the emotion category of the user, the emotion recognition model being a model trained using the emotion recognition model training method of any of the foregoing embodiments.
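Putting the pieces together, a hedged end-to-end sketch of this recognition flow, reusing the helper functions sketched earlier; the emotion label set and the model interface are illustrative placeholders, not specified by the embodiment:

```python
import numpy as np
from scipy.fft import dct

def recognize_emotion(signal, model,
                      labels=("neutral", "happy", "angry", "sad")):
    """Collect -> pre-process -> classify, mirroring the three steps above."""
    frames = frame_and_window(signal)          # framing and windowing
    spec = magnitude_spectrum(frames)          # frequency-domain transform
    mel_energies = spec @ mel_filterbank().T   # Mel filter bank
    mfcc = dct(np.log(mel_energies + 1e-8), axis=1)[:, :13]  # cepstral coefficients
    feats = zero_mean_normalize(mfcc)          # spectral vectors
    probs = model(feats)                       # trained emotion recognition model
    return labels[int(np.argmax(probs))]
```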
An embodiment of the present application also provides a computer-readable storage medium storing a computer program; the computer program includes program instructions, and when the processor executes the program instructions it implements the training method of the emotion recognition model or the emotion recognition method provided by any embodiment of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as the hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, or flash card (Flash Card) equipped on the computer device.
The above are only specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A training method for an emotion recognition model, comprising:
obtaining voice information of a user and a data label corresponding to the voice information;
constructing sample data according to the voice information and the corresponding data label;
pre-processing the voice information in the sample data according to a preset processing rule to obtain a corresponding spectral vector;
extracting a preset recurrent neural network, the recurrent neural network comprising an attention mechanism, wherein the attention mechanism is used to reinforce partial regions of the voice information;
performing model training based on the recurrent neural network according to the spectral vector and data label corresponding to the voice information, to obtain the emotion recognition model.
2. The training method according to claim 1, wherein pre-processing the voice information in the sample data according to the preset processing rule to obtain the corresponding spectral vector comprises:
performing framing and windowing on the voice information in the sample data to obtain processed voice information;
performing a frequency-domain transform on the processed voice information to obtain a corresponding amplitude spectrum;
filtering the amplitude spectrum through a Mel filter bank, and performing a discrete cosine transform on the filtered amplitude spectrum to obtain Mel-frequency cepstral coefficients;
normalizing the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information.
3. The training method according to claim 2, wherein filtering the amplitude spectrum through the Mel filter bank comprises:
obtaining a maximum frequency corresponding to the voice information, and calculating the Mel frequency corresponding to the maximum frequency using a Mel-frequency calculation formula;
calculating the Mel spacing between the centre frequencies of two adjacent triangular filters according to the calculated Mel frequency and the number of triangular filters in the Mel filter bank;
distributing multiple triangular filters linearly according to the Mel spacing;
filtering the amplitude spectrum with the linearly distributed triangular filters.
4. The training method according to claim 3, wherein the Mel-frequency calculation formula is:
f_mel = A · log10(1 + f / 700)
wherein f_mel is the Mel frequency, f is the maximum frequency corresponding to the voice information, and A is a coefficient;
and wherein normalizing the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information comprises:
applying zero-mean normalization to the Mel-frequency cepstral coefficients to obtain the spectral vector corresponding to the voice information, the conversion formula of the zero-mean normalization being:
x* = (x − x̄) / σ
wherein x̄ is the mean of the Mel-frequency cepstral coefficients; σ is the standard deviation of the Mel-frequency cepstral coefficients; x is each Mel-frequency cepstral coefficient; and x* is the normalized Mel-frequency cepstral coefficient.
5. The training method according to claim 1, wherein the structure of the recurrent neural network comprises an input layer, a recurrent layer, an attention mechanism, a fully connected layer, and an output layer; the attention mechanism is used to establish, according to an attention equation, the mapping relationship between the output of the recurrent layer and the weight vector, so as to reinforce partial regions of the voice information;
the attention equation being:
g = Σ_{i=1}^{T} a_i · h_i, where a_i = exp(u^T f(h_i)) / Σ_{j=1}^{T} exp(u^T f(h_j)) and f(h_i) = tanh(W · h_i + b);
wherein g is the input vector of the fully connected layer; h_i is the output of the recurrent layer at each time point i; a_i is the weight corresponding to each time point i, representing the influence of time point i on the fully connected layer and the output layer; T is the total number of time points i; W is a matrix parameter of dimension S×D, S being a positive integer; b and u are vector parameters of dimension S; and D is the number of network units in the recurrent layer.
6. An emotion recognition method, comprising:
collecting a voice signal of a user;
pre-processing the voice signal according to a preset processing rule to obtain a spectral vector corresponding to the voice signal;
inputting the spectral vector into an emotion recognition model to recognize the emotion of the user and obtain the emotion category of the user, wherein the emotion recognition model is a model trained using the emotion recognition model training method according to any one of claims 1 to 5.
7. A training device for an emotion recognition model, comprising:
an information acquisition unit, configured to obtain voice information of a user and a data label corresponding to the voice information;
a sample construction unit, configured to construct sample data according to the voice information and the corresponding data label;
a data processing unit, configured to pre-process the voice information in the sample data according to a preset processing rule to obtain a corresponding spectral vector;
a network extraction unit, configured to extract a preset recurrent neural network, the recurrent neural network comprising an attention mechanism, wherein the attention mechanism is used to reinforce partial regions of the voice information;
a model training unit, configured to perform model training based on the recurrent neural network according to the spectral vector and data label corresponding to the voice information, to obtain the emotion recognition model.
8. An emotion recognition device, comprising:
a signal acquisition unit, configured to collect a voice signal of a user;
a signal processing unit, configured to pre-process the voice signal according to a preset processing rule to obtain a spectral vector corresponding to the voice signal;
an emotion recognition unit, configured to input the spectral vector into an emotion recognition model to recognize the emotion of the user and obtain the emotion category of the user, wherein the emotion recognition model is a model trained using the emotion recognition model training method according to any one of claims 1 to 5.
9. A computer device, comprising a memory and a processor;
the memory being configured to store a computer program;
the processor being configured to execute the computer program and, when executing the computer program, implement the training method of the emotion recognition model according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the training method of the emotion recognition model according to any one of claims 1 to 5, or the emotion recognition method according to claim 6.
CN201910145605.2A 2019-02-27 2019-02-27 Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium Active CN109817246B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910145605.2A CN109817246B (en) 2019-02-27 2019-02-27 Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium
PCT/CN2019/117711 WO2020173133A1 (en) 2019-02-27 2019-11-12 Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910145605.2A CN109817246B (en) 2019-02-27 2019-02-27 Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109817246A true CN109817246A (en) 2019-05-28
CN109817246B CN109817246B (en) 2023-04-18

Family

ID=66607622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910145605.2A Active CN109817246B (en) 2019-02-27 2019-02-27 Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109817246B (en)
WO (1) WO2020173133A1 (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211563A * 2019-06-19 2019-09-06 平安科技(深圳)有限公司 Chinese speech synthesis method, apparatus and storage medium oriented to scene and emotion
CN110223714A * 2019-06-03 2019-09-10 杭州哲信信息技术有限公司 Voice-based emotion recognition method
CN110288980A * 2019-06-17 2019-09-27 平安科技(深圳)有限公司 Speech recognition method, model training method, device, equipment and storage medium
CN110400579A * 2019-06-25 2019-11-01 华东理工大学 Speech emotion recognition based on directional self-attention mechanism and bidirectional long short-term memory network
CN110532380A * 2019-07-12 2019-12-03 杭州电子科技大学 Text sentiment classification method based on memory network
CN110890088A (en) * 2019-10-12 2020-03-17 中国平安财产保险股份有限公司 Voice information feedback method and device, computer equipment and storage medium
CN111179945A (en) * 2019-12-31 2020-05-19 中国银行股份有限公司 Voiceprint recognition-based safety door control method and device
CN111276119A (en) * 2020-01-17 2020-06-12 平安科技(深圳)有限公司 Voice generation method and system and computer equipment
CN111341351A (en) * 2020-02-25 2020-06-26 厦门亿联网络技术股份有限公司 Voice activity detection method and device based on self-attention mechanism and storage medium
CN111357051A (en) * 2019-12-24 2020-06-30 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111429948A (en) * 2020-03-27 2020-07-17 南京工业大学 Voice emotion recognition model and method based on attention convolution neural network
CN111582382A (en) * 2020-05-09 2020-08-25 Oppo广东移动通信有限公司 State recognition method and device and electronic equipment
WO2020173133A1 (en) * 2019-02-27 2020-09-03 平安科技(深圳)有限公司 Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium
CN111816205A (en) * 2020-07-09 2020-10-23 中国人民解放军战略支援部队航天工程大学 Airplane audio-based intelligent airplane type identification method
CN111832317A (en) * 2020-07-09 2020-10-27 平安普惠企业管理有限公司 Intelligent information diversion method and device, computer equipment and readable storage medium
CN111985231A (en) * 2020-08-07 2020-11-24 中移(杭州)信息技术有限公司 Unsupervised role recognition method and device, electronic equipment and storage medium
CN112163571A (en) * 2020-10-29 2021-01-01 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for identifying attribute of electronic equipment user
CN112331182A (en) * 2020-10-26 2021-02-05 平安科技(深圳)有限公司 Voice data generation method and device, computer equipment and storage medium
CN112466324A (en) * 2020-11-13 2021-03-09 上海听见信息科技有限公司 Emotion analysis method, system, equipment and readable storage medium
CN112992177A (en) * 2021-02-20 2021-06-18 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of voice style migration model
CN113053361A (en) * 2021-03-18 2021-06-29 北京金山云网络技术有限公司 Speech recognition method, model training method, device, equipment and medium
CN113270111A (en) * 2021-05-17 2021-08-17 广州国音智能科技有限公司 Height prediction method, device, equipment and medium based on audio data
CN113327631A (en) * 2021-07-15 2021-08-31 广州虎牙科技有限公司 Emotion recognition model training method, emotion recognition method and emotion recognition device
CN113421594A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Speech emotion recognition method, device, equipment and storage medium
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN113889149A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN113935336A (en) * 2021-10-09 2022-01-14 上海淇玥信息技术有限公司 Method and device for determining conversational strategy for voice conversation and electronic equipment
WO2022198923A1 (en) * 2021-03-26 2022-09-29 之江实验室 Speech emotion recognition method and system using fusion of crowd information
CN116916497B (en) * 2023-09-12 2023-12-26 深圳市卡能光电科技有限公司 Nested situation identification-based illumination control method and system for floor cylindrical atmosphere lamp
CN117648717A (en) * 2024-01-29 2024-03-05 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112185423B (en) * 2020-09-28 2023-11-21 南京工程学院 Voice emotion recognition method based on multi-head attention mechanism
CN112257658B (en) * 2020-11-11 2023-10-10 微医云(杭州)控股有限公司 Electroencephalogram signal processing method and device, electronic equipment and storage medium
CN112733994B (en) * 2020-12-10 2024-07-12 中国科学院深圳先进技术研究院 Autonomous emotion generation method, system and application of robot
CN112786017B (en) * 2020-12-25 2024-04-09 北京猿力未来科技有限公司 Training method and device of speech speed detection model, and speech speed detection method and device
CN112948554B * 2021-02-28 2024-03-08 西北工业大学 Real-time multimodal dialogue emotion analysis method based on reinforcement learning and domain knowledge
CN113178197B (en) * 2021-04-27 2024-01-09 平安科技(深圳)有限公司 Training method and device of voice verification model and computer equipment
CN113343860A (en) * 2021-06-10 2021-09-03 南京工业大学 Bimodal fusion emotion recognition method based on video image and voice
CN113420556B * 2021-07-23 2023-06-20 平安科技(深圳)有限公司 Emotion recognition method, device, equipment and storage medium based on multimodal signals
CN113592001B * 2021-08-03 2024-02-02 西北工业大学 Multimodal emotion recognition method based on deep canonical correlation analysis
CN113919387A (en) * 2021-08-18 2022-01-11 东北林业大学 Electroencephalogram signal emotion recognition based on GBDT-LR model
CN113837299B (en) * 2021-09-28 2023-09-01 平安科技(深圳)有限公司 Network training method and device based on artificial intelligence and electronic equipment
CN114548262B * 2022-02-21 2024-03-22 华中科技大学鄂州工业技术研究院 Feature-level fusion method for multimodal physiological signals in affective computing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106340309A (en) * 2016-08-23 2017-01-18 南京大空翼信息技术有限公司 Dog bark emotion recognition method and device based on deep learning
US20170018270A1 (en) * 2015-07-16 2017-01-19 Samsung Electronics Co., Ltd. Speech recognition apparatus and method
CN108550375A * 2018-03-14 2018-09-18 鲁东大学 Emotion recognition method, device and computer equipment based on voice signals
CN109243493A * 2018-10-30 2019-01-18 南京工程学院 Infant cry emotion recognition method based on improved long short-term memory network
CN109285562A * 2018-09-28 2019-01-29 东南大学 Speech emotion recognition method based on attention mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766894B (en) * 2017-11-03 2021-01-22 吉林大学 Remote sensing image natural language generation method based on attention mechanism and deep learning
CN108922515A * 2018-05-31 2018-11-30 平安科技(深圳)有限公司 Speech model training method, speech recognition method, device, equipment and medium
CN109062937B * 2018-06-15 2019-11-26 北京百度网讯科技有限公司 Method for training a description text generation model, and method and device for generating description text
CN109817246B (en) * 2019-02-27 2023-04-18 平安科技(深圳)有限公司 Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170018270A1 (en) * 2015-07-16 2017-01-19 Samsung Electronics Co., Ltd. Speech recognition apparatus and method
CN106340309A (en) * 2016-08-23 2017-01-18 南京大空翼信息技术有限公司 Dog bark emotion recognition method and device based on deep learning
CN108550375A * 2018-03-14 2018-09-18 鲁东大学 Emotion recognition method, device and computer equipment based on voice signals
CN109285562A * 2018-09-28 2019-01-29 东南大学 Speech emotion recognition method based on attention mechanism
CN109243493A * 2018-10-30 2019-01-18 南京工程学院 Infant cry emotion recognition method based on improved long short-term memory network

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020173133A1 (en) * 2019-02-27 2020-09-03 平安科技(深圳)有限公司 Training method of emotion recognition model, emotion recognition method, device, apparatus, and storage medium
CN110223714A * 2019-06-03 2019-09-10 杭州哲信信息技术有限公司 Voice-based emotion recognition method
CN110288980A * 2019-06-17 2019-09-27 平安科技(深圳)有限公司 Speech recognition method, model training method, device, equipment and storage medium
CN110211563A * 2019-06-19 2019-09-06 平安科技(深圳)有限公司 Chinese speech synthesis method, apparatus and storage medium oriented to scene and emotion
CN110211563B * 2019-06-19 2024-05-24 平安科技(深圳)有限公司 Chinese speech synthesis method, device and storage medium for scenes and emotion
CN110400579A * 2019-06-25 2019-11-01 华东理工大学 Speech emotion recognition based on directional self-attention mechanism and bidirectional long short-term memory network
CN110532380A * 2019-07-12 2019-12-03 杭州电子科技大学 Text sentiment classification method based on memory network
CN110890088A (en) * 2019-10-12 2020-03-17 中国平安财产保险股份有限公司 Voice information feedback method and device, computer equipment and storage medium
CN110890088B (en) * 2019-10-12 2022-07-15 中国平安财产保险股份有限公司 Voice information feedback method and device, computer equipment and storage medium
CN111357051A (en) * 2019-12-24 2020-06-30 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111357051B (en) * 2019-12-24 2024-02-02 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
WO2021127982A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech emotion recognition method, smart device, and computer-readable storage medium
CN111179945A (en) * 2019-12-31 2020-05-19 中国银行股份有限公司 Voiceprint recognition-based safety door control method and device
CN111276119A (en) * 2020-01-17 2020-06-12 平安科技(深圳)有限公司 Voice generation method and system and computer equipment
CN111276119B (en) * 2020-01-17 2023-08-22 平安科技(深圳)有限公司 Speech generation method, system and computer equipment
CN111341351A (en) * 2020-02-25 2020-06-26 厦门亿联网络技术股份有限公司 Voice activity detection method and device based on self-attention mechanism and storage medium
CN111429948A (en) * 2020-03-27 2020-07-17 南京工业大学 Voice emotion recognition model and method based on attention convolution neural network
CN111582382A (en) * 2020-05-09 2020-08-25 Oppo广东移动通信有限公司 State recognition method and device and electronic equipment
CN111582382B (en) * 2020-05-09 2023-10-31 Oppo广东移动通信有限公司 State identification method and device and electronic equipment
CN111832317B (en) * 2020-07-09 2023-08-18 广州市炎华网络科技有限公司 Intelligent information flow guiding method and device, computer equipment and readable storage medium
CN111816205B (en) * 2020-07-09 2023-06-20 中国人民解放军战略支援部队航天工程大学 Airplane audio-based intelligent recognition method for airplane models
CN111816205A (en) * 2020-07-09 2020-10-23 中国人民解放军战略支援部队航天工程大学 Airplane audio-based intelligent airplane type identification method
CN111832317A (en) * 2020-07-09 2020-10-27 平安普惠企业管理有限公司 Intelligent information diversion method and device, computer equipment and readable storage medium
CN111985231B (en) * 2020-08-07 2023-12-26 中移(杭州)信息技术有限公司 Unsupervised role recognition method and device, electronic equipment and storage medium
CN111985231A (en) * 2020-08-07 2020-11-24 中移(杭州)信息技术有限公司 Unsupervised role recognition method and device, electronic equipment and storage medium
WO2021189980A1 (en) * 2020-10-26 2021-09-30 平安科技(深圳)有限公司 Voice data generation method and apparatus, and computer device and storage medium
CN112331182A (en) * 2020-10-26 2021-02-05 平安科技(深圳)有限公司 Voice data generation method and device, computer equipment and storage medium
CN112163571B (en) * 2020-10-29 2024-03-05 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for identifying attribute of electronic equipment user
CN112163571A (en) * 2020-10-29 2021-01-01 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for identifying attribute of electronic equipment user
CN112466324A (en) * 2020-11-13 2021-03-09 上海听见信息科技有限公司 Emotion analysis method, system, equipment and readable storage medium
CN112992177B (en) * 2021-02-20 2023-10-17 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of voice style migration model
CN112992177A (en) * 2021-02-20 2021-06-18 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of voice style migration model
CN113053361A (en) * 2021-03-18 2021-06-29 北京金山云网络技术有限公司 Speech recognition method, model training method, device, equipment and medium
WO2022198923A1 (en) * 2021-03-26 2022-09-29 之江实验室 Speech emotion recognition method and system using fusion of crowd information
CN113270111A (en) * 2021-05-17 2021-08-17 广州国音智能科技有限公司 Height prediction method, device, equipment and medium based on audio data
CN113421594A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Speech emotion recognition method, device, equipment and storage medium
CN113421594B (en) * 2021-06-30 2023-09-22 平安科技(深圳)有限公司 Speech emotion recognition method, device, equipment and storage medium
CN113327631A (en) * 2021-07-15 2021-08-31 广州虎牙科技有限公司 Emotion recognition model training method, emotion recognition method and emotion recognition device
CN113327631B (en) * 2021-07-15 2023-03-21 广州虎牙科技有限公司 Emotion recognition model training method, emotion recognition method and emotion recognition device
CN113935336A (en) * 2021-10-09 2022-01-14 上海淇玥信息技术有限公司 Method and device for determining conversational strategy for voice conversation and electronic equipment
CN113889150A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN113889150B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
CN113889149B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
CN113889149A (en) * 2021-10-15 2022-01-04 北京工业大学 Speech emotion recognition method and device
CN116916497B (en) * 2023-09-12 2023-12-26 深圳市卡能光电科技有限公司 Nested situation identification-based illumination control method and system for floor cylindrical atmosphere lamp
CN117648717A (en) * 2024-01-29 2024-03-05 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training
CN117648717B (en) * 2024-01-29 2024-05-03 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training

Also Published As

Publication number Publication date
WO2020173133A1 (en) 2020-09-03
CN109817246B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN109817246A (en) Training method of emotion recognition model, emotion recognition method, device, equipment and storage medium
Yadav et al. Survey on machine learning in speech emotion recognition and vision systems using a recurrent neural network (RNN)
CN108597492B (en) Speech synthesis method and device
CN110457432B (en) Interview scoring method, device, equipment and storage medium
CN109859772B (en) Emotion recognition method, emotion recognition device and computer-readable storage medium
CN111312245B (en) Voice response method, device and storage medium
CN108197115A (en) Intelligent interactive method, device, computer equipment and computer readable storage medium
CN112259106A (en) Voiceprint recognition method and device, storage medium and computer equipment
CN108899049A (en) Speech emotion recognition method and system based on convolutional neural networks
WO2021047319A1 (en) Voice-based personal credit assessment method and apparatus, terminal and storage medium
CN112216307B (en) Speech emotion recognition method and device
CN109313892A (en) Robust language identification method and system
WO2022178969A1 (en) Voice conversation data processing method and apparatus, and computer device and storage medium
Al-Dujaili et al. Speech emotion recognition: a comprehensive survey
Sethu et al. Speech based emotion recognition
Caponetti et al. Biologically inspired emotion recognition from speech
CN114127849A (en) Speech emotion recognition method and device
Ali et al. DWT features performance analysis for automatic speech recognition of Urdu
Yang et al. Algorithm for speech emotion recognition classification based on mel-frequency cepstral coefficients and broad learning system
Akinpelu et al. Lightweight deep learning framework for speech emotion recognition
CN114913859B (en) Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium
Johar Paralinguistic profiling using speech recognition
CN116959464A (en) Training method of audio generation network, audio generation method and device
CN116416962A (en) Audio synthesis method, device, equipment and storage medium
Fonnegra et al. Speech emotion recognition based on a recurrent neural network classification model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant