CN105609117A - Device and method for identifying voice emotion - Google Patents

Device and method for identifying voice emotion

Info

Publication number
CN105609117A
CN105609117A (application CN201610091015.2A)
Authority
CN
China
Prior art keywords
voice
gauss
speech
training
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610091015.2A
Other languages
Chinese (zh)
Inventor
郑洪亮 (Zheng Hongliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610091015.2A
Publication of CN105609117A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The invention discloses a device and a method for recognizing speech emotion. The device comprises a training portion and a recognition portion. The training portion extracts speech features from pre-processed speech data and, through feature selection and Gaussian modeling, performs SVM classification on the results of the Gaussian modeling. The recognition portion identifies the emotional state of speech: it extracts speech features from the speech to be recognized, performs feature selection, computes Gaussian likelihood scores, matches the computed scores against the SVM classifier, and thereby obtains the emotion category of the speech to be recognized.

Description

Apparatus and method for recognizing speech emotion
Technical field
The present invention relates to the field of speech signal processing, and in particular to an apparatus and method for recognizing speech emotion.
Background technology
Speech emotion recognition refers to a machine intelligently recognizing different human emotional states from the speech signal. Since the non-stationary characteristics of the speech signal differ markedly under different emotions, changes in the speaker's mood can be judged by extracting acoustic features of the speech such as voice-quality features, prosodic features and spectral features. Speech emotion recognition is an emerging field at the intersection of artificial intelligence, psychology, biology and other disciplines; its object is to use computer technology to recognize the emotional information hidden in speech (the same sentence can express completely different meanings when the speaker is in different environments and affective states). The speech signal is highly portable and easy to collect, so emotion recognition technology can be widely applied in intelligent human-machine interaction, interactive teaching, entertainment, medicine, criminal investigation and security.
Evaluating personnel's emotional state has high application value, especially in military fields such as aerospace, where prolonged, monotonous, high-intensity tasks subject personnel to harsh physiological and psychological tests and can induce negative moods. It is therefore of great practical significance to investigate the mechanisms by which negative emotions act on human cognitive activity and the factors that influence this, and to research methods for improving individual cognition and operating efficiency while avoiding factors that impair cognition and working ability.
In general, the emotion-related representation of speech can be realized through a speaker model or an acoustic model. Existing research shows that the features adopted for emotion recognition are mostly prosodic features, namely suprasegmental features such as pitch, intensity, duration and their derived parameters. However, auditory aspects of voice quality are also a factor that often needs to be considered.
In the non-patent literature Alter, E. Tank, and S. Kotz, "Accentuation and Emotions - two different systems," presented at ISCA Workshop (ITRW) on Speech and Emotion, Newcastle, Northern Ireland, 2000, Alter et al. studied the relation between prosody and voice quality and found that angry and happy pronunciations differ in aspects such as breathiness and hoarseness. Other research shows a certain association between the prosodic features of the speech signal and the three emotion dimensions (valence, activation and dominance); in particular, activation correlates clearly with prosodic features, so affective states that are close along the activation dimension have similar prosodic features and are easily confused.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by designing and developing a high-performance apparatus and method for recognizing speech emotion.
The technical scheme of the present invention is as follows. A device for recognizing speech emotion comprises a training portion, which extracts speech features from pre-processed speech data and, through feature selection and Gaussian modeling, performs SVM classification on the results of the Gaussian modeling;
and a recognition portion, which identifies the emotional state of speech: it extracts speech features from the speech to be recognized, performs feature selection, computes Gaussian likelihood scores, matches the computed scores against the SVM classifier, and obtains the emotion category of the speech to be recognized.
Further, the training portion comprises a training speech database, which stores the speech data used to train the emotion recognition method and contains speech data of multiple emotion types;
a feature extraction module, which extracts the basic acoustic features of each speech datum in the training speech database, the basic acoustic features comprising the statistical features of the pitch and its first- and second-order differences, the formants and their statistical features, and the MFCC features and their statistical features;
a feature selection module, which combines any two emotion types and selects the corresponding acoustic features to obtain training data;
a Gaussian modeling module, which models the training data with Gaussian mixture models to obtain the data distributions;
and an SVM classifier, which, for each speech datum in the training speech database and under every pairwise combination of emotion types, obtains from the Gaussian models the likelihoods that the speech datum belongs to the two emotion types.
Further, the recognition portion comprises a feature extraction module, which extracts the basic acoustic features of the speech to be recognized;
a selection module, which combines any two emotion types for the speech to be recognized and selects the corresponding acoustic features to obtain the data to be recognized;
a Gaussian likelihood computation module, which computes likelihood scores for the data to be recognized;
and an emotion matching portion, which feeds the likelihood scores of the data to be recognized into the SVM classifier for matching and obtains the emotion category of the speech to be recognized.
A method for recognizing speech emotion comprises the following steps: training, in which speech features are extracted from pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results of the Gaussian modeling;
and recognition, in which speech features are extracted from the speech to be recognized, feature selection is performed, Gaussian likelihood scores are computed and matched against the SVM classifier, and the emotion category of the speech to be recognized is obtained.
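The overall two-phase structure can be sketched as a minimal Python skeleton; the class and method names below are illustrative only and not part of the claimed device, with the bodies filled in by the sketches in the detailed description.

```python
# Hypothetical skeleton of the two-phase method; all names are illustrative.
class SpeechEmotionRecognizer:
    def __init__(self, emotion_types):
        self.emotion_types = emotion_types  # the N categories, e.g. ["happy", "angry", "sad", "calm"]
        self.pair_models = {}               # (i, j) -> (gmm_i, gmm_j, selected feature dims)
        self.svm = None                     # SVM trained on pairwise likelihood scores

    def train(self, utterances, labels):
        """Training portion: feature extraction -> pairwise feature
        selection -> Gaussian modeling -> SVM classification."""
        raise NotImplementedError

    def recognize(self, utterance):
        """Recognition portion: feature extraction -> feature selection ->
        Gaussian likelihood scores -> SVM matching."""
        raise NotImplementedError
```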
Compared with the prior art, the advantage of the present invention is that the technical scheme achieves high accuracy, is not restricted by the language spoken, and processes quickly enough to run in real time.
Brief description of the drawings
The invention is further described below in conjunction with the drawings and embodiments:
Fig. 1 is a structural block diagram of the speech emotion recognition device;
Fig. 2 is a schematic diagram of the training process;
Fig. 3 is a schematic diagram of the recognition process.
Detailed description of the invention
As shown in Fig. 1, a device for recognizing speech emotion comprises a training portion and a recognition portion. The training portion extracts speech features from pre-processed speech data and, through feature selection and Gaussian modeling, performs SVM classification on the results of the Gaussian modeling;
the recognition portion identifies the emotional state of speech: it extracts speech features from the speech to be recognized, performs feature selection, computes Gaussian likelihood scores, matches the computed scores against the SVM classifier, and obtains the emotion category of the speech to be recognized.
The training portion comprises a training speech database, a feature extraction module, a feature selection module, a Gaussian modeling module and an SVM classifier.
As shown in Fig. 2, the training speech database stores the speech data used to train the emotion recognition method and contains speech data of multiple emotion types. Suppose there are N emotion types to be recognized in total; the training speech database should then contain speech data of all N emotion types. For example, to recognize the 4 emotion types happy, angry, sad and calm, the training speech database should contain speech data corresponding to these 4 emotion types.
The feature extraction module extracts the basic acoustic features of each speech datum in the training speech database; the basic acoustic features comprise the statistical features of the pitch and its first- and second-order differences, the formants and their statistical features, and the MFCC features and their statistical features.
For each speech datum, the extracted acoustic features form a feature vector of dimension D, where D is the number of features.
a) Each speech datum in the training speech database is processed to generate a D-dimensional feature vector;
b) All features are normalized. The feature on each dimension k (k = 1, ..., D) is normalized one by one using the following formula:
$$\tilde{f}_k = \frac{f_k - a_k}{b_k - a_k}, \qquad k = 1, \dots, D$$
In the above formula, k denotes a dimension of the feature vector, k = 1, ..., D; f_k and \tilde{f}_k are the values of the k-th feature before and after normalization, respectively; a_k and b_k are the minimum and maximum values on dimension k, taken over the acoustic feature vectors extracted from all training utterances.
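As an illustration of the extraction and normalization steps above, a minimal Python sketch follows. It assumes the librosa library for MFCC and pitch extraction, keeps only a small set of statistics, and omits formant extraction for brevity; all function names are illustrative.

```python
# A sketch of the D-dimensional feature vector and its min-max normalization.
# librosa is assumed available; formant features are omitted for brevity.
import numpy as np
import librosa

def stats(x):
    return [np.mean(x), np.std(x), np.min(x), np.max(x)]

def extract_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # (13, num_frames)
    f0, _, _ = librosa.pyin(y, fmin=50, fmax=500, sr=sr)  # pitch track
    f0 = f0[~np.isnan(f0)]                                # keep voiced frames
    d1, d2 = np.diff(f0), np.diff(f0, n=2)                # 1st/2nd differences
    return np.array(stats(f0) + stats(d1) + stats(d2)
                    + [v for row in mfcc for v in stats(row)])  # D-dim vector

def minmax_normalize(features):
    """features: (num_utterances, D) matrix over the whole training set.
    Applies (f_k - a_k) / (b_k - a_k) per dimension, as in the formula."""
    a, b = features.min(axis=0), features.max(axis=0)
    return (features - a) / (b - a), (a, b)   # keep (a, b) for recognition
```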
The feature selection module combines any two emotion types and selects the corresponding acoustic features to obtain training data; the Gaussian modeling module models the training data with Gaussian mixture models to obtain the data distributions.
Feature selection and Gaussian modeling are carried out for every pairwise combination of emotion types. If the classification task involves N emotion types, the number of combinations is N(N-1)/2: (type 1, type 2), (type 1, type 3), ..., (type 1, type N); (type 2, type 3), (type 2, type 4), ..., (type 2, type N); ...; (type N-1, type N). The same operations are carried out for each combination; only the data differ. The combination of emotion type i and emotion type j (for example, type 2 and type 4) is described below as an example:
a) First, the acoustic feature vectors obtained from the training utterances corresponding to type i and type j are chosen as training data;
b) The discriminability s(d) of each dimension is calculated:
$$s(d) = \frac{(\mu_i^d - \mu_j^d)^2}{(\sigma_i^d)^2 + (\sigma_j^d)^2}$$
where d (d = 1, ..., D) denotes the d-th dimension of the feature vector; \mu_i^d (\mu_j^d) denotes the mean of the d-th dimension of the feature vectors corresponding to emotion type i (j); and (\sigma_i^d)^2 ((\sigma_j^d)^2) denotes the corresponding variance. The larger the discriminability s(d), the better this feature distinguishes the two types.
c) All D dimensions are sorted by discriminability, and the M dimensions with the highest discriminability values (for example, M = 10) are selected as the features for emotion types i and j, i.e. an M-dimensional feature vector.
Through this process, M elements are selected from the original D-dimensional acoustic features to form a new M-dimensional feature vector. For different combinations of emotion types, however, the M selected elements differ: for example, the M elements corresponding to the combination (type 1, type 2) differ from those corresponding to (type 2, type 3). (A code sketch of steps b)-d) is given after step d) below.)
d) Gaussian modeling. Using the M-dimensional feature vectors obtained by the above process, for all training data corresponding to emotion class i and to emotion class j respectively, a Gaussian mixture model is adopted to model the data distribution of each class.
The likelihood function of the Gaussian mixture model corresponding to emotion class i can be expressed in the following form:
$$p(X \mid \lambda_i) = \sum_{t=1}^{T} a_t b_t(X)$$
Here X is an M-dimensional feature vector; b_t(X) is a component density function; a_t is a mixture weight; and T is the number of mixture components. Each component density is an M-variate Gaussian function of a mean vector \mu_t and a covariance matrix \Sigma_t, of the following form:
$$b_t(X) = \frac{1}{(2\pi)^{M/2} \lvert \Sigma_t \rvert^{1/2}} \exp\!\left( -\frac{1}{2} (X - \mu_t)' \Sigma_t^{-1} (X - \mu_t) \right)$$
Steps a)-d) above can be summarized as follows: for emotion type i and emotion type j, a D-dimensional acoustic feature vector is first extracted for each training utterance; next, M feature dimensions are selected by computing the discriminability, converting the D-dimensional features into M-dimensional features; finally, Gaussian modeling is used to build one Gaussian mixture model for each of the two emotion types. Based on these two Gaussian mixture models, the likelihoods that a speech datum belongs to classes i and j can be obtained.
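As a sketch of steps b)-d), the discriminability-based selection can be written directly from the s(d) formula, and scikit-learn's GaussianMixture can stand in for the Gaussian mixture models above (its components, weights and Gaussian densities correspond to T, a_t and b_t in the formulas); all helper names are illustrative.

```python
# Steps b)-d) for one (i, j) pair: select the M most discriminative
# dimensions, then fit one full-covariance GMM per emotion class.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_features(X_i, X_j, M=10):
    """X_i, X_j: (num_utterances, D) normalized feature matrices of emotion
    types i and j. Returns indices of the M most discriminative dims."""
    mu_i, mu_j = X_i.mean(axis=0), X_j.mean(axis=0)
    var_i, var_j = X_i.var(axis=0), X_j.var(axis=0)
    s = (mu_i - mu_j) ** 2 / (var_i + var_j)   # s(d) for every dimension d
    return np.argsort(s)[::-1][:M]             # kept for recognition time

def fit_pair_gmms(X_i, X_j, dims, T=4):
    gmm_i = GaussianMixture(n_components=T, covariance_type="full").fit(X_i[:, dims])
    gmm_j = GaussianMixture(n_components=T, covariance_type="full").fit(X_j[:, dims])
    return gmm_i, gmm_j

def pair_likelihoods(x, gmm_i, gmm_j, dims):
    """Log-likelihoods that feature vector x belongs to classes i and j;
    GaussianMixture.score() returns the per-sample log-likelihood."""
    v = x[dims].reshape(1, -1)
    return gmm_i.score(v), gmm_j.score(v)
```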
The SVM classifier: for each speech datum in the training speech database, under every pairwise combination of emotion types, the likelihoods that the speech datum belongs to the two emotion types are obtained from the Gaussian models. For any speech datum this generates N*(N-1) likelihood values; taking these likelihood values as features and the emotion category label of the speech datum as the label, the SVM classifier is trained.
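A minimal sketch of this SVM stage, assuming the pair models and helpers from the sketches above and scikit-learn's SVC; all names are illustrative.

```python
# Each utterance is mapped to an N*(N-1)-dimensional vector of pairwise
# log-likelihoods, on which a standard SVM classifier is trained.
import numpy as np
from sklearn.svm import SVC

def likelihood_vector(x, pair_models):
    """pair_models: {(i, j): (gmm_i, gmm_j, dims)} for all pairs i < j."""
    scores = []
    for (i, j), (gmm_i, gmm_j, dims) in sorted(pair_models.items()):
        li, lj = pair_likelihoods(x, gmm_i, gmm_j, dims)
        scores.extend([li, lj])            # two likelihoods per pair
    return np.array(scores)                # N*(N-1) values in total

def train_svm(feature_vectors, labels, pair_models):
    X = np.vstack([likelihood_vector(x, pair_models) for x in feature_vectors])
    return SVC().fit(X, labels)            # labels: emotion category per utterance
```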
The recognition portion comprises a feature extraction module, a selection module, a Gaussian likelihood computation module and an emotion matching portion.
As shown in Fig. 3, the feature extraction module extracts the basic acoustic features of the speech to be recognized. This step is identical to the training process: the same basic acoustic features are extracted from the input speech.
The selection module combines each pair of emotion types for the speech to be recognized and selects the corresponding acoustic features to obtain the data to be recognized. Selecting M features: for each combination of emotion types (such as emotion i and emotion j), the M acoustic features with the highest corresponding discriminability are selected.
The Gaussian likelihood computation module computes likelihood scores for the data to be recognized. Gaussian likelihood computation: for each combination of emotion types (such as emotion i and emotion j), according to its Gaussian mixture models and the M selected acoustic features, the likelihoods that the speech belongs to the two classes are computed.
The emotion matching portion feeds the likelihood scores of the data to be recognized into the SVM classifier for matching and obtains the emotion category of the speech to be recognized. The values of all likelihood scores are combined into a vector of N*(N-1) dimensions and input to the SVM classifier for classification, which yields the emotion category of the speech to be recognized.
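Putting the recognition portion together, a minimal end-to-end sketch reusing the illustrative helpers defined above:

```python
# Full recognition pass: extract features, normalize with the training-set
# minima/maxima (a, b), build the pairwise likelihood vector, classify.
def recognize(wav_path, a, b, pair_models, svm):
    x = extract_features(wav_path)
    x = (x - a) / (b - a)                    # same normalization as training
    v = likelihood_vector(x, pair_models)    # N*(N-1) likelihood scores
    return svm.predict(v.reshape(1, -1))[0]  # predicted emotion category
```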
A method for recognizing speech emotion comprises the following steps: training, in which speech features are extracted from pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results of the Gaussian modeling;
and recognition, in which speech features are extracted from the speech to be recognized, feature selection is performed, Gaussian likelihood scores are computed and matched against the SVM classifier, and the emotion category of the speech to be recognized is obtained.
Compared with the prior art, the advantage of the present invention is that the technical scheme achieves high accuracy, is not restricted by the language spoken, and processes quickly enough to run in real time.
The above is only a specific exemplary application of the present invention and does not limit the scope of protection of the present invention in any way. In addition to the above implementation, the present invention can have other embodiments. All technical schemes formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.

Claims (4)

1. A device for recognizing speech emotion, characterized in that it comprises:
a training portion, which extracts speech features from pre-processed speech data and, through feature selection and Gaussian modeling, performs SVM classification on the results of the Gaussian modeling;
a recognition portion, which identifies the emotional state of speech: it extracts speech features from the speech to be recognized, performs feature selection, computes Gaussian likelihood scores, matches the computed scores against the SVM classifier, and obtains the emotion category of the speech to be recognized.
2. The device for recognizing speech emotion according to claim 1, characterized in that the training portion comprises:
a training speech database, which stores the speech data used to train the emotion recognition method and contains speech data of multiple emotion types;
a feature extraction module, which extracts the basic acoustic features of each speech datum in the training speech database, the basic acoustic features comprising the statistical features of the pitch and its first- and second-order differences, the formants and their statistical features, and the MFCC features and their statistical features;
a feature selection module, which combines any two emotion types and selects the corresponding acoustic features to obtain training data;
a Gaussian modeling module, which models the training data with Gaussian mixture models to obtain the data distributions;
an SVM classifier, which, for each speech datum in the training speech database and under every pairwise combination of emotion types, obtains from the Gaussian models the likelihoods that the speech datum belongs to the two emotion types.
3. The device for recognizing speech emotion according to claim 1, characterized in that the recognition portion comprises:
a feature extraction module, which extracts the basic acoustic features of the speech to be recognized;
a selection module, which combines any two emotion types for the speech to be recognized and selects the corresponding acoustic features to obtain the data to be recognized;
a Gaussian likelihood computation module, which computes likelihood scores for the data to be recognized;
an emotion matching portion, which feeds the likelihood scores of the data to be recognized into the SVM classifier for matching and obtains the emotion category of the speech to be recognized.
4. A method for recognizing speech emotion, characterized in that it comprises the following steps:
training, in which speech features are extracted from pre-processed speech data and, through feature selection and Gaussian modeling, SVM classification is performed on the results of the Gaussian modeling;
recognition, in which speech features are extracted from the speech to be recognized, feature selection is performed, Gaussian likelihood scores are computed and matched against the SVM classifier, and the emotion category of the speech to be recognized is obtained.
CN201610091015.2A 2016-02-19 2016-02-19 Device and method for identifying voice emotion Pending CN105609117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610091015.2A CN105609117A (en) 2016-02-19 2016-02-19 Device and method for identifying voice emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610091015.2A CN105609117A (en) 2016-02-19 2016-02-19 Device and method for identifying voice emotion

Publications (1)

Publication Number Publication Date
CN105609117A (en) 2016-05-25

Family

ID=55989000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610091015.2A Pending CN105609117A (en) 2016-02-19 2016-02-19 Device and method for identifying voice emotion

Country Status (1)

Country Link
CN (1) CN105609117A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516511A (en) * 2016-06-13 2017-12-26 微软技术许可有限责任公司 The Text To Speech learning system of intention assessment and mood
US11238842B2 (en) 2016-06-13 2022-02-01 Microsoft Technology Licensing, Llc Intent recognition and emotional text-to-speech learning
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN107705807A (en) * 2017-08-24 2018-02-16 平安科技(深圳)有限公司 Voice quality detecting method, device, equipment and storage medium based on Emotion identification
WO2019037382A1 (en) * 2017-08-24 2019-02-28 平安科技(深圳)有限公司 Emotion recognition-based voice quality inspection method and device, equipment and storage medium
CN107705807B (en) * 2017-08-24 2019-08-27 平安科技(深圳)有限公司 Voice quality detecting method, device, equipment and storage medium based on Emotion identification
CN109299777A (en) * 2018-09-20 2019-02-01 于江 A kind of data processing method and its system based on artificial intelligence
CN109299777B (en) * 2018-09-20 2021-12-03 于江 Data processing method and system based on artificial intelligence
CN109352666A (en) * 2018-10-26 2019-02-19 广州华见智能科技有限公司 It is a kind of based on machine talk dialogue emotion give vent to method and system
CN110600033A (en) * 2019-08-26 2019-12-20 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN110600033B (en) * 2019-08-26 2022-04-05 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN113221933A (en) * 2020-02-06 2021-08-06 本田技研工业株式会社 Information processing apparatus, vehicle, computer-readable storage medium, and information processing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Zheng Hongliang

Document name: Notification of Publication of the Application for Invention

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160525
