CN109785863A - A speech emotion recognition method and system based on deep belief network - Google Patents
A speech emotion recognition method and system based on deep belief network
- Publication number
- CN109785863A, CN201910173690.3A, CN201910173690A
- Authority
- CN
- China
- Prior art keywords
- speech
- belief network
- voice signal
- emotion recognition
- deep belief
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000008909 emotion recognition Effects 0.000 title claims abstract description 39
- 238000000034 method Methods 0.000 title claims abstract description 33
- 238000012706 support-vector machine Methods 0.000 claims abstract description 25
- 238000012549 training Methods 0.000 claims description 29
- 230000008451 emotion Effects 0.000 claims description 13
- 238000000605 extraction Methods 0.000 claims description 9
- 238000007781 pre-processing Methods 0.000 claims description 8
- 230000002996 emotional effect Effects 0.000 abstract description 20
- 238000013528 artificial neural network Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 7
- 239000013598 vector Substances 0.000 description 4
- 230000003993 interaction Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a speech emotion recognition method and system based on a deep belief network. The recognition method includes: acquiring a speech signal; preprocessing the speech signal to obtain a preprocessed speech signal; performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features; and classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result. By adopting a multi-classifier model based on the deep belief network and restricted Boltzmann machines, a multi-classifier speech emotion recognition system is established, and the recognition rate of speech emotion is improved.
Description
Technical field
The present invention relates to the field of speech recognition, and more particularly to a speech emotion recognition method and system based on a deep belief network.
Background art
With the development of cloud computing, the mobile Internet, and big data, machines serve humans with ever greater intelligence, and the dream of humans and machines conversing in natural language is gradually approaching reality, so people's demands on machine interaction capabilities keep rising. Simple recognition of speech content no longer satisfies these demands; processing, recognizing, and understanding the emotion carried in speech has become particularly important in practical applications. Speech emotion recognition has very broad application prospects: it can be applied not only to human-machine interaction systems but also to speech recognition, where it enhances robustness, and to speaker identification, where it improves the speaker recognition rate. Speech emotion recognition technology is widely used in intelligent human-machine interaction and in human-computer interactive teaching. Research on automatic speech emotion recognition will not only push computer technology further forward, it will also greatly increase people's efficiency at work and study and improve their quality of life.
Various external emotion signals are sampled in order to recognize the corresponding emotions. In deep-neural-network research the accuracy of emotion classification remains low; in pattern recognition, prior-art approaches that extract the emotion in speech with neural networks achieve rather low recognition rates for sad, excited, happy, and angry emotions, and adaptive neural networks likewise recognize speech emotional states relatively poorly.
When a traditional neural network is trained, all layers of the network are trained together as a whole, so when the data become large the training time increases and the network converges more slowly. Back-propagation, the most widely used method for training neural networks, trains the whole network iteratively: the network parameters are initialized randomly, and the difference between the output value computed at the top layer and the true value of the data is used to adjust the parameters of every layer by traditional gradient descent, the goal of each parameter update being to bring the network's prediction closer to the true value. With random initialization, however, the error-correction signal becomes weaker the further down it propagates during an update, and the gradient becomes sparser, so the network easily falls into a local optimum. The result is a low recognition rate for speech emotional states.
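To make this criticism concrete, the following minimal sketch (for illustration only; the toy network shapes, targets, and learning rate are assumptions, not part of the invention) shows the randomly initialized, whole-network gradient-descent training described above:

```python
# Minimal sketch of randomly initialized, whole-network gradient-descent
# training, as criticized above. All shapes, targets, and the learning
# rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 40))          # 32 frames, 40 acoustic features
y = rng.integers(0, 2, size=(32, 1))   # dummy binary targets

W1 = rng.normal(scale=0.01, size=(40, 16))   # random initialization
W2 = rng.normal(scale=0.01, size=(16, 1))
lr = 0.1

for _ in range(100):
    h = np.tanh(X @ W1)                      # forward pass
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))      # sigmoid output
    d2 = (p - y) / len(X)                    # output-layer error
    d1 = (d2 @ W2.T) * (1.0 - h ** 2)        # error propagated one layer down
    W2 -= lr * (h.T @ d2)                    # gradient-descent updates
    W1 -= lr * (X.T @ d1)
```

In deeper stacks the propagated error is multiplied by a factor such as (1 - h^2) at every layer, which is exactly the weakening correction signal and sparse gradient that the background refers to.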
Summary of the invention
The object of the present invention is to provide a speech emotion recognition method and system based on a deep belief network that can improve the speech emotion recognition rate.
To achieve the above object, the present invention provides the following solutions:
A speech emotion recognition method based on a deep belief network, characterized in that the recognition method comprises:
acquiring a speech signal;
preprocessing the speech signal to obtain a preprocessed speech signal;
performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features;
classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result.
Optionally, performing unsupervised speech-signal feature extraction on the preprocessed speech signal using the deep belief network to obtain the speech signal features specifically comprises:
stacking N layers of restricted Boltzmann machines from the lowest layer to the highest to obtain the deep belief network;
performing unsupervised training on the i-th layer of restricted Boltzmann machine according to the preprocessed speech signal to obtain the i-th optimal parameters, the i-th optimal parameters being the optimal parameters of the i-th layer of restricted Boltzmann machine, where i takes the values 1, 2, ..., N in turn;
performing unsupervised training on the (i+1)-th layer of restricted Boltzmann machine according to the i-th optimal parameters and the preprocessed speech signal to obtain the (i+1)-th optimal parameters;
fine-tuning the multiple optimal parameters by a global training method so that the deep belief network converges to the global optimum, obtaining multiple fine-tuned optimal parameters;
extracting the speech signal features of the preprocessed speech signal according to the fine-tuned optimal parameters.
Optionally, classifying the speech signal features by speech emotion using the support vector machine to obtain the speech emotion recognition result specifically comprises:
mapping the sample points of the speech signal features to a high-dimensional feature space using a kernel function to obtain samples that are linearly separable in that space;
performing, by the support vector machine, a logical decision on the speech signal features according to the linearly separable samples to obtain the speech emotion recognition result.
A speech emotion recognition system based on a deep belief network, the recognition system comprising:
a speech signal acquisition module for acquiring a speech signal;
a speech signal preprocessing module for preprocessing the speech signal to obtain a preprocessed speech signal;
a feature extraction module for performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features;
an emotion recognition module for classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result.
Optionally, the feature extraction module specifically comprises:
a deep belief network building unit for stacking N layers of restricted Boltzmann machines from the lowest layer to the highest to obtain the deep belief network;
an unsupervised training unit for performing unsupervised training on the i-th layer of restricted Boltzmann machine according to the preprocessed speech signal to obtain the i-th optimal parameters, the i-th optimal parameters being the optimal parameters of the i-th layer of restricted Boltzmann machine, where i takes the values 1, 2, ..., N in turn, and for performing unsupervised training on the (i+1)-th layer of restricted Boltzmann machine according to the i-th optimal parameters and the preprocessed speech signal to obtain the (i+1)-th optimal parameters;
a parameter fine-tuning unit for fine-tuning the multiple optimal parameters by a global training method so that the deep belief network converges to the global optimum, obtaining multiple fine-tuned optimal parameters;
a feature extraction unit for extracting the speech signal features of the preprocessed speech signal according to the fine-tuned optimal parameters.
Optionally, the emotion recognition module specifically comprises:
a kernel function unit for mapping the sample points of the speech signal features to a high-dimensional feature space using a kernel function to obtain linearly separable samples;
a logical decision unit for performing, by the support vector machine, a logical decision on the speech signal features according to the linearly separable samples to obtain the speech emotion recognition result.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects: the invention discloses a speech emotion recognition method and system based on a deep belief network. The recognition method includes: acquiring a speech signal; preprocessing the speech signal to obtain a preprocessed speech signal; performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features; and classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result. The entire deep belief network is trained by training each restricted Boltzmann machine layer by layer, and a multi-classifier model based on the deep belief network and the restricted Boltzmann machines is used to establish a multi-classifier speech emotion recognition system, improving the recognition rate of speech emotion.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without any creative effort.
Fig. 1 is a flow chart of the speech emotion recognition method based on a deep belief network provided by the present invention;
Fig. 2 is a structural diagram of the speech emotion recognition system based on a deep belief network provided by the present invention;
Fig. 3 is a block diagram of the emotion recognition system based on a support vector machine provided by the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The object of the present invention is to provide a speech emotion recognition method and system based on a deep belief network that can improve the speech emotion recognition rate.
In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a speech emotion recognition method based on a deep belief network comprises:
Step 100: acquiring a speech signal;
Step 200: preprocessing the speech signal to obtain a preprocessed speech signal;
Step 300: performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features;
Step 400: classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result.
Step 300, performing unsupervised speech-signal feature extraction on the preprocessed speech signal using the deep belief network to obtain the speech signal features, specifically comprises:
stacking N layers of restricted Boltzmann machines from the lowest layer to the highest to obtain the deep belief network;
performing unsupervised training on the i-th layer of restricted Boltzmann machine according to the preprocessed speech signal to obtain the i-th optimal parameters, the i-th optimal parameters being the optimal parameters of the i-th layer of restricted Boltzmann machine, where i takes the values 1, 2, ..., N in turn;
performing unsupervised training on the (i+1)-th layer of restricted Boltzmann machine according to the i-th optimal parameters and the preprocessed speech signal to obtain the (i+1)-th optimal parameters;
fine-tuning the multiple optimal parameters by a global training method so that the deep belief network converges to the global optimum, obtaining multiple fine-tuned optimal parameters;
extracting the speech signal features of the preprocessed speech signal according to the fine-tuned optimal parameters. A code sketch of this layer-wise procedure is given below for illustration.
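The following is a minimal sketch of the greedy layer-wise pretraining just described, assuming scikit-learn's BernoulliRBM as the restricted Boltzmann machine; the layer sizes, learning rate, and random input are illustrative assumptions, not values taken from the patent:

```python
# Greedy layer-wise pretraining sketch: each RBM is trained unsupervised
# on the hidden activations of the layer below it, from lowest to highest.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((200, 120))       # 200 preprocessed frames, 120 features in [0, 1]

# N = 2 stacked RBM layers (assumed sizes)
rbms = [BernoulliRBM(n_components=n, learning_rate=0.05, n_iter=20,
                     random_state=0) for n in (64, 32)]

h = X
for rbm in rbms:
    rbm.fit(h)                   # unsupervised training of the i-th layer
    h = rbm.transform(h)         # hidden activations feed the (i+1)-th layer

features = h                     # speech-signal features for the SVM stage
```

The subsequent global fine-tuning step, which the method uses to converge all layer parameters to a global optimum, is omitted from this sketch because BernoulliRBM exposes only the unsupervised pretraining stage.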
Step 400, classifying the speech signal features by speech emotion using the support vector machine to obtain the speech emotion recognition result, specifically comprises:
mapping the sample points of the speech signal features to a high-dimensional feature space using a kernel function to obtain samples that are linearly separable in that space;
performing, by the support vector machine, a logical decision on the speech signal features according to the linearly separable samples to obtain the speech emotion recognition result.
As shown in Fig. 2, a speech emotion recognition system based on a deep belief network comprises:
a speech signal acquisition module 1 for acquiring a speech signal;
a speech signal preprocessing module 2 for preprocessing the speech signal to obtain a preprocessed speech signal;
a feature extraction module 3 for performing unsupervised speech-signal feature extraction on the preprocessed speech signal using a deep belief network to obtain speech signal features;
an emotion recognition module 4 for classifying the speech signal features by speech emotion using a support vector machine to obtain a speech emotion recognition result.
The feature extraction module 3 specifically comprises:
a deep belief network building unit for stacking N layers of restricted Boltzmann machines from the lowest layer to the highest to obtain the deep belief network;
an unsupervised training unit for performing unsupervised training on the i-th layer of restricted Boltzmann machine according to the preprocessed speech signal to obtain the i-th optimal parameters, the i-th optimal parameters being the optimal parameters of the i-th layer of restricted Boltzmann machine, where i takes the values 1, 2, ..., N in turn, and for performing unsupervised training on the (i+1)-th layer of restricted Boltzmann machine according to the i-th optimal parameters and the preprocessed speech signal to obtain the (i+1)-th optimal parameters;
a parameter fine-tuning unit for fine-tuning the multiple optimal parameters by a global training method so that the deep belief network converges to the global optimum, obtaining multiple fine-tuned optimal parameters;
a feature extraction unit for extracting the speech signal features of the preprocessed speech signal according to the fine-tuned optimal parameters.
The emotion recognition module 4 specifically comprises:
a kernel function unit for mapping the sample points of the speech signal features to a high-dimensional feature space using a kernel function to obtain linearly separable samples;
a logical decision unit for performing, by the support vector machine, a logical decision on the speech signal features according to the linearly separable samples to obtain the speech emotion recognition result.
After the deep belief network has extracted a multidimensional feature vector of the emotional features in the speech signal, a suitable emotion classifier is needed. This method uses a support vector machine in one-versus-one mode to classify four emotions (surprise, happiness, anger, sadness). The multidimensional feature vectors extracted by the deep belief network serve as the input of the support vector machine classifier, and a kernel function maps the input sample points to a high-dimensional feature space, turning the nonlinearly separable speech emotion problem into one whose corresponding sample space is linearly separable. The block diagram of the emotion recognition system based on the support vector machine is shown in Fig. 3.
The one-versus-one mode constructs a hyperplane for every pair of emotions, which requires training k(k-1)/2 sub-classifiers; with k = 4 the whole training process needs 4 x 3 / 2 = 6 support vector machine sub-classifiers in total. Each sub-classifier is trained on two of the four emotion feature classes (surprise, happiness, anger, sadness), namely: happiness-anger, happiness-sadness, happiness-surprise, anger-sadness, anger-surprise, and sadness-surprise. One classifier is trained between every two classes; when an unknown speech emotion is classified, each classifier makes its own judgment and "casts a vote" for the corresponding class, and the class with the most votes is taken as the class of the unknown emotion. Because the decision stage uses voting, several classes may receive the same number of votes, so an unknown sample may appear to belong to several classes at once, which degrades classification precision. A code sketch of the kernel mapping is given below for illustration.
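The sketch below illustrates the kernel mapping described above, assuming an RBF kernel (the patent does not name the kernel function): the support vector machine never computes the high-dimensional mapping explicitly but only evaluates the kernel between pairs of sample points. The features, labels, and gamma value are synthetic placeholders.

```python
# Implicit kernel mapping sketch: train an SVM on a precomputed RBF
# kernel matrix over the DBN feature vectors.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 32))      # placeholder DBN feature vectors
y = rng.integers(0, 4, size=80)    # 0..3: surprise, happiness, anger, sadness

K = rbf_kernel(X, X, gamma=0.1)    # pairwise kernel values K(x_i, x_j)
svm = SVC(kernel='precomputed').fit(K, y)
print(svm.predict(rbf_kernel(X[:5], X, gamma=0.1)))   # classify 5 samples
```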
Before the support vector machine classifiers are trained and used for recognition, a label must be designed for every emotional speech signal to indicate the emotion class to which that signal belongs; the label type must be binary. During emotion recognition, the feature vector is input to all the support vector machines simultaneously, the outputs of the individual support vector machines then pass through a logical decision that selects the most probable emotion class, and the emotion with the highest weight (the most votes) is finally taken as the emotional state of the speech signal to be recognized, yielding the recognition result. A code sketch of this voting scheme is given below for illustration.
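The following sketch implements the one-versus-one voting just described: k = 4 emotions give k(k-1)/2 = 6 binary SVM sub-classifiers, and the class collecting the most votes wins. The features and labels are synthetic placeholders, and ties are broken here by the lowest class index, whereas the text above notes that ties can genuinely degrade precision.

```python
# One-versus-one voting sketch: one binary SVM per emotion pair, majority
# vote over the 6 sub-classifiers decides the emotion.
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
emotions = ["surprise", "happiness", "anger", "sadness"]
X = rng.normal(size=(80, 32))            # placeholder DBN features
y = rng.integers(0, 4, size=80)          # placeholder emotion indices

classifiers = {}
for a, b in combinations(range(4), 2):   # the 6 emotion pairs
    mask = (y == a) | (y == b)
    classifiers[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])

def classify(x):
    votes = np.zeros(4, dtype=int)
    for clf in classifiers.values():
        votes[int(clf.predict(x[None, :])[0])] += 1   # each pair casts a vote
    return emotions[int(np.argmax(votes))]            # ties: lowest index wins

print(classify(rng.normal(size=32)))
```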
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Specific examples are used herein to illustrate the principle and implementation of the present invention; the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and scope of application in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910173690.3A CN109785863A (en) | 2019-02-28 | 2019-02-28 | A speech emotion recognition method and system based on deep belief network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910173690.3A CN109785863A (en) | 2019-02-28 | 2019-02-28 | A speech emotion recognition method and system based on deep belief network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109785863A true CN109785863A (en) | 2019-05-21 |
Family
ID=66486177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910173690.3A Pending CN109785863A (en) | 2019-02-28 | 2019-02-28 | A speech emotion recognition method and system based on deep belief network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109785863A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101561651B1 (en) * | 2014-05-23 | 2015-11-02 | 서강대학교산학협력단 | Interest detecting method and apparatus based feature data of voice signal using Deep Belief Network, recording medium recording program of the method |
CN106297825A (en) * | 2016-07-25 | 2017-01-04 | 华南理工大学 | A kind of speech-emotion recognition method based on integrated degree of depth belief network |
CN107092895A (en) * | 2017-05-09 | 2017-08-25 | 重庆邮电大学 | A kind of multi-modal emotion identification method based on depth belief network |
CN108717856A (en) * | 2018-06-16 | 2018-10-30 | 台州学院 | A kind of speech-emotion recognition method based on multiple dimensioned depth convolution loop neural network |
CN109036468A (en) * | 2018-11-06 | 2018-12-18 | 渤海大学 | Speech-emotion recognition method based on deepness belief network and the non-linear PSVM of core |
Non-Patent Citations (2)
Title |
---|
黄晨晨 et al., "基于深度信念网络的语音情感识别的研究" (Research on speech emotion recognition based on deep belief networks), 《计算机研究与发展》 (Journal of Computer Research and Development) * |
黄驹斌, "基于深度信念网络的语音情感识别" (Speech emotion recognition based on deep belief networks), 《中国优秀硕士学位论文全文数据库》 (China Masters' Theses Full-text Database) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110619893A (en) * | 2019-09-02 | 2019-12-27 | 合肥工业大学 | Time-frequency feature extraction and artificial intelligence emotion monitoring method of voice signal |
CN112687294A (en) * | 2020-12-21 | 2021-04-20 | 重庆科技学院 | Vehicle-mounted noise identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kim et al. | Towards speech emotion recognition "in the wild" using aggregated corpora and deep multi-task learning | |
CN105956560B (en) | A kind of model recognizing method based on the multiple dimensioned depth convolution feature of pondization | |
CN108717856B (en) | A speech emotion recognition method based on multi-scale deep convolutional neural network | |
CN110136690A (en) | Phoneme synthesizing method, device and computer readable storage medium | |
CN108171318B (en) | Convolution neural network integration method based on simulated annealing-Gaussian function | |
CN108804453A (en) | A kind of video and audio recognition methods and device | |
Kurpukdee et al. | Speech emotion recognition using convolutional long short-term memory neural network and support vector machines | |
CN106875007A (en) | End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection | |
Atkar et al. | Speech emotion recognition using dialogue emotion decoder and CNN Classifier | |
US20220121949A1 (en) | Personalized neural network pruning | |
CN113763965B (en) | A speaker recognition method based on fusion of multiple attention features | |
CN104077598B (en) | A kind of emotion identification method based on voice fuzzy cluster | |
Reddy et al. | Handwritten Hindi character recognition using deep learning techniques | |
Shinde et al. | Real time two way communication approach for hearing impaired and dumb person based on image processing | |
CN115146057A (en) | Supply chain ecological region image-text fusion emotion recognition method based on interactive attention | |
CN113628640A (en) | Cross-library speech emotion recognition method based on sample equalization and maximum mean difference | |
Fu et al. | An adversarial training based speech emotion classifier with isolated Gaussian regularization | |
CN109785863A (en) | A speech emotion recognition method and system based on deep belief network | |
CN117711443A (en) | Lightweight speech emotion recognition method and system based on multi-scale attention | |
Shareef et al. | A review: isolated Arabic words recognition using artificial intelligent techniques | |
Sun et al. | A Novel Convolutional Neural Network Voiceprint Recognition Method Based on Improved Pooling Method and Dropout Idea. | |
Duduka et al. | A neural network approach to accent classification | |
Pham et al. | Vietnamese scene text detection and recognition using deep learning: an empirical study | |
Saranya et al. | AI based speech recognition of literacy to improve tribal English knowledge | |
Trabelsi et al. | Improved frame level features and SVM supervectors approach for the recogniton of emotional states from speech: Application to categorical and dimensional states |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190521 |