CN105609117A - Device and method for identifying voice emotion - Google Patents
- Publication number
- CN105609117A CN105609117A CN201610091015.2A CN201610091015A CN105609117A CN 105609117 A CN105609117 A CN 105609117A CN 201610091015 A CN201610091015 A CN 201610091015A CN 105609117 A CN105609117 A CN 105609117A
- Authority
- CN
- China
- Prior art keywords
- voice
- gauss
- speech
- training
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention discloses a device and a method for recognizing emotion in speech. The device comprises a training section and a recognition section. The training section performs speech feature extraction on pre-processed speech data and, through feature selection and Gaussian modeling, carries out SVM classification on the results obtained from the Gaussian modeling. The recognition section identifies the emotional state of speech: it performs speech feature extraction on the speech to be recognized, carries out feature selection, computes Gaussian likelihood scores, and feeds the results to the SVM classifier, thereby obtaining the emotion category of the speech to be recognized.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to an apparatus and method for recognizing emotion in speech.
Background art
Speech emotion recognition refers to a machine's intelligent recognition of different human emotional states from the speech signal. Because the non-stationary characteristics of the speech signal differ markedly across emotions, changes in a speaker's mood can be judged by extracting acoustic features of the speech such as voice-quality features, prosodic features and spectral features. Speech emotion recognition is an emerging field at the intersection of artificial intelligence, psychology, biology and other disciplines. Its purpose is to use computer technology to identify the emotional information embedded in speech (the same sentence can convey entirely different meanings when the speaker is in different environments and affective states). Speech signals are portable and easy to collect, so emotion recognition technology can be widely applied in intelligent human-machine interaction, interactive teaching, entertainment, medicine, criminal investigation and security.
The assessment of personnel's emotional state has high practical value, particularly in military application fields such as aerospace, where prolonged, monotonous, high-intensity tasks subject personnel to severe physiological and psychological stress and can induce negative moods. Investigating the mechanisms by which negative emotions act on human cognitive activity and the factors that influence them, studying methods to improve individual cognition and operating efficiency, and avoiding factors that impair cognition and working ability are therefore of great practical significance.
In general, the emotion-related representation of speech can be realized through a speaker model or an acoustic model. Existing research shows that the features adopted for emotion recognition are mostly prosodic features, i.e. suprasegmental features such as pitch, intensity, duration and their derived parameters. However, auditory information about voice quality is also a factor that often needs to be considered.
In the non-patent literature K. Alter, E. Rank and S. Kotz, "Accentuation and Emotions - Two Different Systems," presented at the ISCA Workshop (ITRW) on Speech and Emotion, Newcastle, Northern Ireland, 2000, Alter et al., studying the relation between prosody and voice quality, found that angry and happy pronunciation differ in aspects such as breathiness and harshness. Other research shows a certain association between the prosodic features of the speech signal and the three emotion dimensions (valence, activation and control): there is a clear correlation between the activation dimension and prosodic features, and emotional states that are close on the activation dimension have similar prosodic features and are easily confused.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by designing a high-performance apparatus and method for recognizing emotion in speech.
The technical solution of the present invention is: a device for recognizing speech emotion, comprising a training section, for performing speech feature extraction on pre-processed speech data and, through feature selection and Gaussian modeling, performing SVM classification on the results obtained from the Gaussian modeling;
a recognition section, for identifying the emotional state of speech: it performs speech feature extraction on the speech to be recognized, carries out feature selection, computes Gaussian likelihood scores, feeds the results to the SVM classifier, and thereby obtains the emotion category of the speech to be recognized.
Further, the training section comprises: a training speech database, holding the speech data used to train the emotion recognition method, including speech data of multiple emotion types;
a feature extraction module, for extracting the basic acoustic features of each speech sample in the training speech database; the basic acoustic features comprise the pitch and the statistics of its first- and second-order differences, the formants and their statistics, and the MFCC features and their statistics;
a feature selection module, which combines emotion types in pairs and selects the acoustic features for each pair, yielding the training data;
a Gaussian modeling module, which models the training data with Gaussian mixture models to obtain the data distributions;
an SVM classifier, which, for each speech sample in the training speech database and each pairwise combination of emotion types, obtains from the Gaussian models the likelihood scores that the sample belongs to the two emotion types.
Further, the recognition section comprises: a feature extraction module, for extracting the basic acoustic features of the speech to be recognized;
a selection module, which, for each pairwise combination of emotion types, selects the corresponding acoustic features of the speech to be recognized, yielding the data to be recognized;
a Gaussian likelihood computation module, which computes likelihood scores for the data to be recognized;
an emotion matching section, which feeds the likelihood scores of the data to be recognized into the SVM classifier for matching, obtaining the emotion category of the speech to be recognized.
A method for recognizing speech emotion comprises the following steps: training, in which speech feature extraction is performed on pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results obtained from the Gaussian modeling;
recognition, in which speech feature extraction is performed on the speech to be recognized, feature selection is carried out, Gaussian likelihood scores are computed, and the results are fed to the SVM classifier to obtain the emotion category of the speech to be recognized.
Compared with the prior art, the advantages of the present invention are: with the technical solution of the present invention, accuracy is high, the method is not constrained by the language spoken, and processing is fast enough for real-time operation.
Brief description of the drawings
The invention is further described below in conjunction with the drawings and embodiments:
Fig. 1 is a structural block diagram of the speech emotion recognition device;
Fig. 2 is a schematic diagram of the training process;
Fig. 3 is a schematic diagram of the recognition process.
Detailed description of the invention
As shown in Fig. 1, a device for recognizing speech emotion comprises a training section and a recognition section. The training section performs speech feature extraction on pre-processed speech data and, through feature selection and Gaussian modeling, carries out SVM classification on the results obtained from the Gaussian modeling;
the recognition section identifies the emotional state of speech: it performs speech feature extraction on the speech to be recognized, carries out feature selection, computes Gaussian likelihood scores, feeds the results to the SVM classifier, and obtains the emotion category of the speech to be recognized.
The training section comprises a training speech database, a feature extraction module, a feature selection module, a Gaussian modeling module and an SVM classifier.
As shown in Fig. 2, the training speech database holds the speech data used to train the emotion recognition method and includes speech data of multiple emotion types. Suppose there are N emotion types to be recognized; the training speech database should then contain speech data for all N emotion types. For example, to recognize the four emotion types happy, angry, sad and calm, the training speech database should contain speech data corresponding to these four types.
The feature extraction module extracts the basic acoustic features of each speech sample in the training speech database; the basic acoustic features comprise the pitch and the statistics of its first- and second-order differences, the formants and their statistics, and the MFCC features and their statistics.
For each speech sample, the extracted acoustic features form a D-dimensional feature vector, where D is the number of features.
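The assembly of the D-dimensional per-utterance vector can be sketched as below. The patent names no toolkit, so random arrays stand in for the framewise pitch and MFCC tracks that a real front end (e.g. librosa or openSMILE) would produce, and the helper name `utterance_vector` is illustrative, not from the patent. Each track and its first- and second-order differences are folded into mean/std statistics:

```python
import numpy as np

def utterance_vector(pitch, mfcc):
    """Collapse framewise feature tracks into one fixed-length vector.

    pitch : (T,) framewise fundamental-frequency track
    mfcc  : (T, C) framewise MFCC matrix
    Returns a D-dimensional vector built from mean/std statistics of each
    track and of its first- and second-order differences.
    """
    def stats(x):
        x = np.atleast_2d(x.T).T          # promote (T,) to (T, 1)
        d1 = np.diff(x, axis=0)           # first-order difference
        d2 = np.diff(x, n=2, axis=0)      # second-order difference
        return np.concatenate(
            [np.r_[p.mean(axis=0), p.std(axis=0)] for p in (x, d1, d2)])

    return np.concatenate([stats(pitch), stats(mfcc)])

# Placeholder framewise tracks standing in for a real acoustic front end:
rng = np.random.default_rng(0)
vec = utterance_vector(rng.uniform(80, 300, size=200),   # 200 pitch frames
                       rng.normal(size=(200, 13)))       # 13 MFCCs per frame
print(vec.shape)   # D = (1 + 13) tracks x 3 difference orders x 2 stats = 84
```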
A) Each speech sample in the training speech database is processed to generate a D-dimensional feature vector;
B) All features are normalized. Each dimension k (k = 1...D) is normalized one by one by the following formula: f'_k = (f_k − a_k) / (b_k − a_k)
In the formula above, k denotes a dimension of the feature vector, k = 1...D; f_k and f'_k are respectively the values of the k-th feature before and after normalization; a_k and b_k denote the minimum and maximum on dimension k, taken over the acoustic feature vectors extracted from all training utterances.
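A minimal numpy sketch of this normalization step, with random data standing in for the training feature matrix (the function names are illustrative, not from the patent):

```python
import numpy as np

def fit_minmax(train):
    """Per-dimension minimum a_k and maximum b_k over all training vectors."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(feats, a, b):
    """Normalize each dimension k: f'_k = (f_k - a_k) / (b_k - a_k)."""
    return (feats - a) / (b - a)

rng = np.random.default_rng(1)
train = rng.normal(size=(50, 8))     # 50 training vectors, D = 8
a, b = fit_minmax(train)
norm = apply_minmax(train, a, b)
print(norm.min(), norm.max())        # training data maps into [0, 1]
```

At recognition time the same a_k and b_k fitted on the training data would be applied to the unknown utterance's features.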
The feature selection module combines emotion types in pairs and selects the acoustic features for each pair, yielding the training data; the Gaussian modeling module models the training data with Gaussian mixture models to obtain the data distributions.
The steps of feature selection and Gaussian modeling are carried out for every pairwise combination of emotion types. If the classification task involves N emotion types, the number of combinations is N(N-1)/2: (type 1, type 2), (type 1, type 3), ..., (type 1, type N); (type 2, type 3), (type 2, type 4), ..., (type 2, type N); ...; (type N-1, type N). The same operations are performed for each combination; only the data differ. The combination of type i and type j is taken below as an example:
A) First, the acoustic feature vectors obtained from the training utterances corresponding to type i and type j are chosen as training data;
B) The discriminability s(d) of each dimension is calculated: s(d) = (μ_i(d) − μ_j(d))² / (σ_i²(d) + σ_j²(d))
Here d (d = 1, ..., D) denotes the d-th dimension of the feature vector; μ_i(d) and μ_j(d) denote the mean of the d-th dimension in the feature vectors corresponding to emotion types i and j respectively; σ_i²(d) and σ_j²(d) denote the corresponding variances. The larger the discriminability s(d), the better this feature distinguishes the two types.
C) All D dimensions are sorted by discriminability, and the M dimensions with the highest values (for example M = 10) are selected as the features for emotion types i and j, forming an M-dimensional feature vector.
Through this process, M elements are selected from the original D-dimensional acoustic features to form a new M-dimensional feature vector. For different emotion-type combinations, however, the M selected elements differ; for example, the M elements corresponding to the combination (type 1, type 2) differ from those for (type 2, type 3).
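Steps (B) and (C) can be sketched as follows. The Fisher-style ratio used here is an assumption consistent with the means and variances the text defines (the patent's formula image is not reproduced), and the function names are illustrative:

```python
import numpy as np

def discriminability(Xi, Xj):
    """Per-dimension score s(d) = (mu_i - mu_j)^2 / (var_i + var_j)."""
    mi, mj = Xi.mean(axis=0), Xj.mean(axis=0)
    vi, vj = Xi.var(axis=0), Xj.var(axis=0)
    return (mi - mj) ** 2 / (vi + vj)

def select_top_m(Xi, Xj, M=10):
    """Indices of the M most discriminative dimensions for the pair (i, j)."""
    s = discriminability(Xi, Xj)
    return np.argsort(s)[::-1][:M]

rng = np.random.default_rng(2)
D = 40
Xi = rng.normal(0.0, 1.0, size=(100, D))   # class-i training vectors
Xj = rng.normal(0.0, 1.0, size=(100, D))   # class-j training vectors
Xj[:, 5] += 4.0                            # make dimension 5 clearly discriminative
idx = select_top_m(Xi, Xj, M=10)
print(idx[0])                              # dimension 5 ranks first
```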
D) Gaussian modeling. Using the M-dimensional feature vectors obtained above, for all training data corresponding to emotion class i and to emotion class j, one Gaussian mixture model per class is used to model the distribution of that class's data.
The likelihood function of the Gaussian mixture model corresponding to emotion class i can be expressed in the following form: p(X) = Σ_{t=1..T} a_t · b_t(X)
Here X is an M-dimensional feature vector; b_t(X) are the member density functions; a_t are the mixture weights; and T is the number of mixture components. Each member density function is an M-variate Gaussian defined by a mean vector μ_t and covariance matrix Σ_t, of the form: b_t(X) = (2π)^(−M/2) |Σ_t|^(−1/2) · exp(−(1/2)(X − μ_t)ᵀ Σ_t⁻¹ (X − μ_t))
Steps (A)-(D) above can be summarized as follows: for emotion types i and j, a D-dimensional acoustic feature vector is first extracted for each training utterance; next, M feature dimensions are selected by computing the discriminability, converting the D-dimensional features into M-dimensional features; finally, a Gaussian mixture model is built for each emotion type. From these two Gaussian mixture models, the likelihood scores that a speech sample belongs to class i and to class j can be obtained.
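A sketch of step (D), under the assumption that scikit-learn's GaussianMixture is an acceptable stand-in for the patent's Gaussian modeling module; synthetic data replaces the selected M-dimensional features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
M = 10                                     # selected feature dimensions
Xi = rng.normal(0.0, 1.0, size=(200, M))   # class-i training vectors
Xj = rng.normal(2.0, 1.0, size=(200, M))   # class-j training vectors

# One GMM per emotion class (T mixture components each)
gmm_i = GaussianMixture(n_components=3, random_state=0).fit(Xi)
gmm_j = GaussianMixture(n_components=3, random_state=0).fit(Xj)

# Log-likelihoods that a sample belongs to class i and to class j
x = Xi[:1]
ll_i = gmm_i.score_samples(x)[0]
ll_j = gmm_j.score_samples(x)[0]
print(ll_i > ll_j)   # a class-i sample scores higher under gmm_i
```

The pair of `score_samples` values is exactly the per-pair likelihood pair that the SVM stage below consumes.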
The SVM classifier: for each speech sample in the training speech database and each pairwise combination of emotion types, the likelihood scores that the sample belongs to the two emotion types are obtained from the Gaussian models. For any speech sample this generates N*(N-1) likelihood values; taking these likelihood values as features and the sample's emotion category label as the label, the SVM classifier is trained.
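The construction of the N*(N-1)-dimensional likelihood feature vector and the SVM training can be sketched as below. One simplification relative to the patent: the same M features are used for every pair, whereas the patent selects a different M-subset per combination. scikit-learn's GaussianMixture and SVC stand in for the Gaussian modeling module and the SVM classifier:

```python
import numpy as np
from itertools import combinations
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(4)
N, M = 3, 4                                  # N emotion types, M-dim features
means = [0.0, 2.0, 4.0]
data = {c: rng.normal(means[c], 1.0, size=(60, M)) for c in range(N)}

# One GMM per class per pair: each pair (i, j) yields 2 likelihoods, so
# every utterance gets an N*(N-1)-dimensional likelihood feature vector.
pairs = list(combinations(range(N), 2))
gmms = {(i, j): (GaussianMixture(2, random_state=0).fit(data[i]),
                 GaussianMixture(2, random_state=0).fit(data[j]))
        for i, j in pairs}

def likelihood_vector(x):
    x = x.reshape(1, -1)
    v = []
    for (i, j) in pairs:
        gi, gj = gmms[(i, j)]
        v += [gi.score_samples(x)[0], gj.score_samples(x)[0]]
    return np.array(v)                       # length N*(N-1)

X = np.array([likelihood_vector(x) for c in range(N) for x in data[c]])
y = np.array([c for c in range(N) for _ in range(60)])
svm = SVC().fit(X, y)                        # likelihoods as features, emotion as label
print(X.shape[1])                            # 3 * 2 = 6 likelihood features
```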
The recognition section comprises a feature extraction module, a selection module, a Gaussian likelihood computation module and an emotion matching section.
As shown in Fig. 3, the feature extraction module extracts the basic acoustic features of the speech to be recognized. This step is identical to the training process: the same basic acoustic features are extracted from the input speech.
The selection module combines the emotion types in pairs for the speech to be recognized and selects the corresponding acoustic features, yielding the data to be recognized. Selecting the M features: for each emotion-type combination (e.g. emotion i and emotion j), the M acoustic features with the highest discriminability for that combination are selected.
The Gaussian likelihood computation module computes likelihood scores for the data to be recognized. Gaussian likelihood computation: for each emotion-type combination (e.g. emotion i and emotion j), the likelihood scores that the speech belongs to the two classes are computed from the pair's Gaussian mixture models and the M selected acoustic features.
The emotion matching section feeds the likelihood scores of the data to be recognized into the SVM classifier for matching and obtains the emotion category of the speech to be recognized. The values of all likelihood scores are combined into an N*(N-1)-dimensional vector, which is input to the SVM classifier for classification, giving the emotion category of the speech to be recognized.
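Recognition time can be sketched end-to-end for the simplest case N = 2 (so N*(N-1) = 2 likelihood features). As above, scikit-learn components and synthetic data are stand-ins for the patent's modules:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

# Two-emotion toy setup: an unknown utterance is scored by every pair's
# GMMs and the stacked scores go through the trained SVM.
rng = np.random.default_rng(5)
tr0 = rng.normal(0.0, 1.0, size=(80, 6))     # training vectors, emotion 0
tr1 = rng.normal(3.0, 1.0, size=(80, 6))     # training vectors, emotion 1
g0 = GaussianMixture(2, random_state=0).fit(tr0)
g1 = GaussianMixture(2, random_state=0).fit(tr1)

def scores(X):
    # N = 2, so each utterance yields N*(N-1) = 2 likelihood features
    return np.c_[g0.score_samples(X), g1.score_samples(X)]

X_tr = np.vstack([tr0, tr1])
y_tr = np.r_[np.zeros(80, int), np.ones(80, int)]
svm = SVC().fit(scores(X_tr), y_tr)

unknown = rng.normal(3.0, 1.0, size=(1, 6))  # unseen emotion-1-like utterance
print(svm.predict(scores(unknown))[0])
```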
A method for recognizing speech emotion comprises the following steps: training, in which speech feature extraction is performed on pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results obtained from the Gaussian modeling;
recognition, in which speech feature extraction is performed on the speech to be recognized, feature selection is carried out, Gaussian likelihood scores are computed, and the results are fed to the SVM classifier to obtain the emotion category of the speech to be recognized.
Compared with the prior art, the advantages of the present invention are: with the technical solution of the present invention, accuracy is high, the method is not constrained by the language spoken, and processing is fast enough for real-time operation.
The above is only a specific exemplary application of the present invention and does not constitute any limitation on the scope of protection of the invention. In addition to the above embodiment, the present invention may have other embodiments. All technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.
Claims (4)
1. A device for recognizing speech emotion, characterized in that it comprises:
a training section, for performing speech feature extraction on pre-processed speech data and, through feature selection and Gaussian modeling, performing SVM classification on the results obtained from the Gaussian modeling;
a recognition section, for identifying the emotional state of speech: performing speech feature extraction on the speech to be recognized, carrying out feature selection, computing Gaussian likelihood scores, and feeding the results to the SVM classifier to obtain the emotion category of the speech to be recognized.
2. The device for recognizing speech emotion according to claim 1, characterized in that the training section comprises:
a training speech database, holding the speech data used to train the emotion recognition method, including speech data of multiple emotion types;
a feature extraction module, for extracting the basic acoustic features of each speech sample in the training speech database, the basic acoustic features comprising the pitch and the statistics of its first- and second-order differences, the formants and their statistics, and the MFCC features and their statistics;
a feature selection module, which combines emotion types in pairs and selects the acoustic features for each pair, yielding the training data;
a Gaussian modeling module, which models the training data with Gaussian mixture models to obtain the data distributions;
an SVM classifier, which, for each speech sample in the training speech database and each pairwise combination of emotion types, obtains from the Gaussian models the likelihood scores that the sample belongs to the two emotion types.
3. The device for recognizing speech emotion according to claim 1, characterized in that the recognition section comprises:
a feature extraction module, for extracting the basic acoustic features of the speech to be recognized;
a selection module, for combining the emotion types in pairs for the speech to be recognized and selecting the corresponding acoustic features, yielding the data to be recognized;
a Gaussian likelihood computation module, for computing likelihood scores for the data to be recognized;
an emotion matching section, for feeding the likelihood scores of the data to be recognized into the SVM classifier for matching, obtaining the emotion category of the speech to be recognized.
4. A method for recognizing speech emotion, characterized in that it comprises the steps of:
training, in which speech feature extraction is performed on pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results obtained from the Gaussian modeling;
recognition, in which speech feature extraction is performed on the speech to be recognized, feature selection is carried out, Gaussian likelihood scores are computed, and the results are fed to the SVM classifier to obtain the emotion category of the speech to be recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610091015.2A CN105609117A (en) | 2016-02-19 | 2016-02-19 | Device and method for identifying voice emotion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105609117A true CN105609117A (en) | 2016-05-25 |
Family
ID=55989000
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610091015.2A Pending CN105609117A (en) | 2016-02-19 | 2016-02-19 | Device and method for identifying voice emotion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105609117A (en) |
- 2016-02-19 CN CN201610091015.2A patent/CN105609117A/en active Pending
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | The Text To Speech learning system of intention assessment and mood |
US11238842B2 (en) | 2016-06-13 | 2022-02-01 | Microsoft Technology Licensing, Llc | Intent recognition and emotional text-to-speech learning |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
CN107705807A (en) * | 2017-08-24 | 2018-02-16 | 平安科技(深圳)有限公司 | Voice quality detecting method, device, equipment and storage medium based on Emotion identification |
WO2019037382A1 (en) * | 2017-08-24 | 2019-02-28 | 平安科技(深圳)有限公司 | Emotion recognition-based voice quality inspection method and device, equipment and storage medium |
CN107705807B (en) * | 2017-08-24 | 2019-08-27 | 平安科技(深圳)有限公司 | Voice quality detecting method, device, equipment and storage medium based on Emotion identification |
CN109299777A (en) * | 2018-09-20 | 2019-02-01 | 于江 | A kind of data processing method and its system based on artificial intelligence |
CN109299777B (en) * | 2018-09-20 | 2021-12-03 | 于江 | Data processing method and system based on artificial intelligence |
CN109352666A (en) * | 2018-10-26 | 2019-02-19 | 广州华见智能科技有限公司 | It is a kind of based on machine talk dialogue emotion give vent to method and system |
CN110600033A (en) * | 2019-08-26 | 2019-12-20 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN110600033B (en) * | 2019-08-26 | 2022-04-05 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN113221933A (en) * | 2020-02-06 | 2021-08-06 | 本田技研工业株式会社 | Information processing apparatus, vehicle, computer-readable storage medium, and information processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
DD01 | Delivery of document by public notice |
Addressee: Zheng Hongliang Document name: Notification of Publication of the Application for Invention |
WD01 | Invention patent application deemed withdrawn after publication |

Application publication date: 20160525 |