CN101937678A - Judgment-deniable automatic speech emotion recognition method for fidget - Google Patents

Judgment-deniable automatic speech emotion recognition method for fidget Download PDF

Info

Publication number
CN101937678A
CN101937678A (application CN2010102305114A / CN201010230511A)
Authority
CN
China
Prior art keywords
emotion
sample
irritated
lambda
judgment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102305114A
Other languages
Chinese (zh)
Inventor
赵力
黄程韦
邹采荣
余华
王开
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN2010102305114A priority Critical patent/CN101937678A/en
Publication of CN101937678A publication Critical patent/CN101937678A/en
Pending legal-status Critical Current

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a judgment-deniable automatic speech emotion recognition method for irritation (fidget), which comprises the following steps: (1) establishing an irritation speech database; (2) extracting speech emotion features; (3) performing feature selection, namely evaluating the features with the Fisher criterion; and (4) applying the judgment-deniable recognition method, which specifically comprises: (4-1) modeling the three emotions of irritation, joy and calmness with GMMs, one GMM per emotion, and deciding by the maximum a posteriori criterion; and (4-2) measuring the matching degree between a sample and the corresponding emotion class with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class. The method achieves comparatively high irritation recognition performance.

Description

A judgment-deniable automatic speech emotion recognition method for irritation
Technical field
The present invention relates to a speech recognition method, and in particular to a judgment-deniable automatic speech emotion recognition method for irritation.
Background technology
In artificial intelligence, affective computing is considered a key path towards giving computers higher and more comprehensive intelligence. In human-computer interaction, endowing the computer with a human-like emotional ability, so that it can perceive the surrounding environment and atmosphere and adaptively provide the most comfortable interaction, eliminating as far as possible the barrier between human and machine, has become a goal of next-generation computer development. Speech emotion recognition applies pattern recognition methods to extract the speaker's emotional state from the speech signal, so that the computer can recognize speech emotion automatically; it is an important part of affective computing and an important foundation of natural human-computer interaction.
Before the present invention, research on speech emotion recognition concentrated mainly on the several emotions singled out in basic emotion theory, including happiness, anger, surprise, sadness and fear, while research on speech emotions with special significance, such as irritation, was lacking. Existing speech emotion recognition methods cannot recognize irritation well. Recognizing the irritated emotional state has high practical value, especially in military application fields such as aerospace, where long, monotonous, high-intensity tasks subject the personnel involved to harsh physiological and psychological tests and induce negative moods such as irritation. Once irritation appears, improper handling can greatly affect a person's working ability and may even lead to human error and accidents. Studying negative emotions such as irritation, their mechanisms of action on and factors influencing human cognitive activity, and methods of improving individual cognition and working efficiency while avoiding the factors that impair them, is therefore of great practical significance.
Current speech emotion recognition research also faces the problem of the validity of emotional speech material. Emotional speech collected by acting is called acted material, and most existing speech emotion recognition research is based on it. Its advantage is ease of collection; its disadvantage is exaggerated emotional expression that differs from natural speech in practice, so the reliability of acted data is relatively poor. An emotion recognition system built on acted emotional material suffers degraded recognition performance under real conditions, because the data used to train the recognition model differ from real data. Emotional material collected by elicitation is called elicited material. Elicited material is more natural, and experimental psychological methods can conveniently be used to control the experiment and obtain the particular emotion needed. Before the present invention, no elicited corpus of irritation existed for Chinese speech emotion recognition.
Human emotion is ambiguous and diverse. In speech emotion recognition, traditional methods rigidly assign every incoming sample to one of the known classes; when more ambiguous emotion samples occur in practice, the confidence of the classification is poor and the probability of misjudgment is high.
Summary of the invention
The object of the invention is to fill a gap in practical applications of speech emotion recognition technology by providing a judgment-deniable automatic speech emotion recognition method for irritation.
To achieve the above object, the present invention adopts the following technical scheme:
The judgment-deniable automatic speech emotion recognition method for irritation of the present invention comprises the following steps:
(1) establish an irritation speech database;
(2) extract speech emotion features;
(3) feature selection
(3-1) evaluate the features with the Fisher criterion, expressed by formula (1):
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(1)$$
where $\mu$ is the mean of the feature value, $\sigma$ is the standard deviation of the feature value, m is the total number of classes, $C_m^2$ is the number of class pairs, and d is the feature dimension.
(4) judgment-deniable recognition method
(4-1) model the three emotions irritation, happiness and calmness with Gaussian mixture models (GMM), one GMM per emotion, and decide by the maximum a posteriori criterion: let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability is expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(2)$$
The decision for a sample to be recognized is:
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(3)$$
where $j^*$ denotes the class to which the sample belongs;
(4-2) measure the matching degree between the sample and the emotion classes with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class:
from the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes; the higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to, and the sample is rejected when the fuzzy entropy exceeds a threshold Th:
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(4)$$
The advantages and effects of the present invention are:
(1) irritation is recognized automatically from the speech signal;
(2) the irritation material is collected by elicitation, making the data closer to real emotional data and thereby yielding better irritation recognition performance;
(3) a speech emotion recognition method capable of rejection is adopted: for uncertain or unknown emotion samples the classifier outputs a refusal to judge, i.e. the sample is declared not to belong to any of the practical speech emotion classes to be detected.
Further advantages and effects of the present invention are described below.
Description of drawings
Fig. 1: two-dimensional dimensional-space model of emotion.
Fig. 2: mapping function.
Fig. 3: sample distribution in the prosodic feature space.
Fig. 4: sample distribution in the voice quality feature space.
Fig. 5: sample distribution in the combined prosodic and voice quality feature space.
Fig. 6: means of the top 5 features.
Fig. 7: variances of the top 5 features.
Embodiment
The technical solution of the present invention is further elaborated below in conjunction with the drawings and an embodiment.
1. Establishment of the irritation speech database
In experimental psychology, applying visual and auditory stimulation through computer multimedia technology is an experimental means that has become more common in recent years with the development of computers. Computer games provide an interactive, highly engaging human-computer environment through the visual and auditory stimulation of graphics and music, and can effectively induce positive and negative emotions in the subjects. In particular, when the game is won repeatedly, the subject is satisfied by the success in the virtual scene and a happy emotion is induced; when the game is lost repeatedly, the subject is frustrated in the virtual scene, which readily causes negative emotions including irritation. In an experiment lasting a fairly long time, repetitive game operation and repeated failure can smoothly elicit irritation.
(1) Subject selection
Ten university students (five male, five female) were selected for the game-elicited speech collection; calm speech was recorded before the game.
(2) Design of the sentence texts
Considering that a main application field of practical speech emotion recognition such as irritation is the assessment of negative emotions caused by long aeronautic, astronautic and maritime tasks, twenty working phrases and short sentences without emotional tendency were selected from the Standard Marine Communication Phrases (SMCP) issued by the International Maritime Organization (IMO).
(3) Choice of the game
To facilitate eliciting irritation, a tedious computer game requiring patience and care was chosen. In the game the subject must use the mouse to move a small ball through a winding, narrow pipe; the ball must not touch the pipe wall while passing through, otherwise the "bomb" explodes and the game is lost. If the ball passes through the pipe within the allotted time, the "bomb" is defused and the game is won.
(4) Recording of happy emotional material
After each game victory, the subject is asked to speak the prescribed text sentences with a happy emotion.
(5) Recording of irritated emotional material
After each game failure, the subject is asked to speak the prescribed text sentences with an irritated emotion.
(6) Recording of subjective experience
The subject fills in a subjective mood report with five options: very irritated, somewhat irritated, neutral, somewhat happy, very happy.
(7) Data screening
The game elicitation yielded 1800 original emotional utterances of the three emotions irritation, happiness and calmness in total; listening tests screened out 1321 utterances of higher quality as the final elicited speech emotion data.
2. Speech emotion feature extraction
The dimensional theory of emotion holds that all human emotions are composed of a small number of dimensions; a specific emotional state is simply a position in a continuous space, for example from approach to withdrawal or from pleasure to misery. Different emotions are not independent of one another but continuous, changing gradually and smoothly, so the similarity and difference between emotions can be represented by their distance in the dimensional space. Over the last two decades the dimensional view of emotion has attracted the attention of many researchers, and the most widely accepted dimensional model is the valence-activation model of Fig. 1. The two dimensions have the following meanings. Hedonic tone, or valence, is theoretically grounded in the separation of positive and negative affect; it is mainly reflected in the subjective feeling of the emotional subject and measures the relation between the emotion and the subject. Activation, or arousal, refers to the degree to which the bodily energy associated with the emotional state is activated and measures the inherent energy of the emotion. The positions in this dimensional space of the three affective states recognized in the present invention (irritation, happiness and calmness) are shown in Fig. 1. "Irritation" refers to an unhappy, fretful psychological state of the subject; "happiness" refers to a glad, pleased, positive psychological state. Irritation and happiness lie at the two ends of the valence axis: irritation appears as a negative emotion of the subject, happiness as a positive one. On the activation axis, irritation and happiness both lie in the positive region, since the degree of bodily energy activation is high when either of these two emotions occurs.
The quality of the affective features has a decisive influence on the final recognition result, and how to extract and select speech features that reflect emotional change is one of the key problems in the speech emotion recognition field today. The most basic and most commonly used affective features are prosodic features such as fundamental frequency, short-time energy, utterance duration and speaking rate. These prosodic features do reflect part of the speaker's emotional information, can to a large extent distinguish different basic emotion classes, and their extraction algorithms are fairly mature. However, according to the three-dimensional model of emotion (arousal, valence, power), traditional prosodic features only reflect the "arousal" information in the model, so prosodic features alone cannot distinguish all emotion classes well. The voice quality features of the speech signal are not only closely related to the "valence" dimension but can also partly reflect the information of the "power" dimension. Therefore, to recognize irritated emotional speech better, the affective feature extraction of the present invention includes not only prosodic parameters but also voice quality parameters of speech. 74 global statistical features are used in the present invention, of which the first 36 are prosodic features and the remaining 38 are voice quality features; they are constructed as listed below, and a brief extraction sketch follows the list.
(1) Construction of the prosodic affective features
Features 1-10: mean, maximum, minimum, median and variance of the short-time energy and of its first-order difference;
Features 11-25: mean, maximum, minimum, median and variance of the pitch and of its first- and second-order differences; Feature 26: pitch range;
Features 27-36: number of voiced frames, number of silent frames, ratio of silent frames to voiced frames, ratio of voiced frames to total frames, number of voiced segments, number of silent segments, ratio of voiced segments to silent segments, longest voiced segment, longest silent segment, ratio of voiced segments to total segments;
(2) Construction of the voice quality affective features
Features 37-66: mean, maximum, minimum, median and variance of the first, second and third formants and of their first-order differences;
Features 67-69: percentage of spectral energy below 250 Hz, percentage of spectral energy below 650 Hz, percentage of spectral energy above 4 kHz;
Features 70-74: mean, maximum, minimum, median and variance of the harmonics-to-noise ratio (HNR).
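As an illustration only, the following is a minimal sketch of how a few of the prosodic statistics above (short-time energy and pitch, features 1-26) could be computed; the library (librosa), frame parameters and pitch tracker are assumptions and are not prescribed by the invention.

```python
# Minimal sketch of a subset of the prosodic statistics (features 1-26).
# librosa is an assumed tool; the patent does not fix an extraction library.
import numpy as np
import librosa

def stats(v):
    """mean, maximum, minimum, median, variance of a 1-D contour."""
    v = np.asarray(v, dtype=float)
    v = v[np.isfinite(v)]
    return [np.mean(v), np.max(v), np.min(v), np.median(v), np.var(v)]

def prosodic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    energy = librosa.feature.rms(y=y)[0]                       # short-time energy contour
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)  # pitch contour
    f0 = f0[np.isfinite(f0)]                                   # keep voiced frames only
    feats = stats(energy) + stats(np.diff(energy))             # features 1-10
    feats += stats(f0) + stats(np.diff(f0)) + stats(np.diff(f0, n=2))  # features 11-25
    feats.append(np.max(f0) - np.min(f0))                      # feature 26: pitch range
    return np.array(feats)
```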
3. Feature selection
The selection of affective features has always been one of the most important problems in speech emotion recognition. To evaluate the quality of a feature we consider two aspects: the class means of the feature and the class variances of the feature. Taking both into account, the Fisher criterion is adopted for feature evaluation. For dimension d and two classes, the Fisher criterion can be expressed by formula (1):
$$f(d)=\frac{(\mu_{1d}-\mu_{2d})^2}{\sigma_{1d}^2+\sigma_{2d}^2}\qquad(1)$$
where $\mu_{1d}$ and $\mu_{2d}$ are the means of the feature values of the two classes in dimension d, and $\sigma_{1d}^2$ and $\sigma_{2d}^2$ are the variances of the feature values of the two classes in dimension d. The larger the Fisher criterion, the better this feature distinguishes the two classes. For the multi-class case, formula (1) can be rewritten as:
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(2)$$
where m is the total number of classes and $C_m^2$ is the number of class pairs. According to the Fisher criterion, the ten best features selected for the three emotions irritation, happiness and calmness are listed below; a code sketch of this ranking follows the list.
(1) percentage of spectral energy below 250 Hz
(2) mean of the pitch first-order difference
(3) mean of the pitch
(4) median of the first formant
(5) minimum of the first formant
(6) variance of the short-time energy
(7) mean of the second formant
(8) ratio of voiced frames to total frames
(9) mean of the harmonics-to-noise ratio
(10) percentage of spectral energy below 650 Hz
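A minimal sketch of the multi-class Fisher ranking of formula (2) is given below, assuming the 74 features have been collected into an array X with one row per utterance and y holds the emotion label of each row; the variable names are illustrative only.

```python
# Multi-class Fisher criterion, equation (2): average pairwise class separation
# per feature dimension.
import numpy as np
from itertools import combinations

def fisher_scores(X, y):
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])   # class means per dimension
    var = np.array([X[y == c].var(axis=0) for c in classes])   # class variances per dimension
    pairs = list(combinations(range(len(classes)), 2))
    scores = np.zeros(X.shape[1])
    for i, j in pairs:                                          # sum over all class pairs
        scores += (mu[i] - mu[j]) ** 2 / (var[i] + var[j])
    return scores / len(pairs)                                  # divide by C(m, 2)

# ranking = np.argsort(fisher_scores(X, y))[::-1]  # indices of the best features first
```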
4. Judgment-deniable recognition method
The distribution of emotion samples in the feature space can be described by a superposition of several Gaussian functions. In theory, as long as enough Gaussian components are mixed, a Gaussian mixture model (GMM) can fit any probability density function. In the present invention the three emotions irritation, happiness and calmness are modeled with GMMs, one GMM per emotion, and the decision is made by the maximum a posteriori criterion. Let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability can be expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(3)$$
where $p(x_i\mid\lambda_j)$ is obtained from the GMM of each emotion. For a given utterance sample, the probability of the feature vector occurring is a constant, and each emotion is assumed to occur with equal probability, with C the number of emotion classes:
$$P(\lambda_j)=\frac{1}{C},\quad 1\le j\le C\qquad(4)$$
The sample to be recognized can therefore be assigned to
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(5)$$
where $j^*$ denotes the class to which the sample belongs.
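A minimal sketch of the per-emotion GMM training and the maximum-likelihood decision of formulas (3)-(5) follows; scikit-learn, the number of mixture components and the covariance type are assumptions, since the description does not fix an implementation.

```python
# One GMM per emotion; with equal priors the MAP decision of eq. (3)-(5)
# reduces to picking the largest likelihood p(x | lambda_j).
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["irritation", "happiness", "calmness"]

def train_gmms(features_by_emotion, n_components=8):
    """features_by_emotion: dict emotion -> (n_utterances, n_features) array."""
    gmms = {}
    for emo in EMOTIONS:
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(features_by_emotion[emo])
        gmms[emo] = gmm
    return gmms

def classify(gmms, x):
    """Return the emotion with the largest likelihood, plus all log-likelihoods."""
    x = np.asarray(x).reshape(1, -1)
    log_lik = {emo: gmm.score_samples(x)[0] for emo, gmm in gmms.items()}
    return max(log_lik, key=log_lik.get), log_lik
```

Diagonal covariances are used here only because the per-emotion training set is small (300 utterances); the patent does not specify the covariance structure.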
In view of the ambiguity and uncertainty of emotion in real environments and the diversity of practical speech emotion classes, it is necessary to study practical speech emotion recognition methods capable of rejection. Below, a rejection method based on the fuzzy entropy of the likelihood probabilities is adopted: the fuzzy entropy is used to measure the matching degree between a sample and the emotion classes, so that samples of unknown class can be rejected. When a sample to be recognized arrives, it is passed through the GMMs of the C emotions to obtain C GMM likelihood probability density values, and the likelihood value of the i-th sample is mapped to a membership degree $\mu_j(x_i)$ between 0 and 1 of belonging to the j-th emotion class:
$$\mu_j(x_i)=\frac{\arctan\!\big(p(x_i\mid\lambda_j)/10\big)}{\pi/2}\qquad(6)$$
where the mapping function used is
$$y=\frac{\arctan(x/10)}{\pi/2}\qquad(7)$$
whose graph is shown in Fig. 2.
For the fuzzy set $E_j=\{x_1,x_2,\ldots,x_n\}$ of all samples possibly belonging to the j-th emotion class, with membership degrees $\mu_j(x_1),\mu_j(x_2),\ldots,\mu_j(x_n)$, let its fuzzy entropy be $e(\mu_j(x_i))$. Then $e(\mu_j(x_i))$ should satisfy:
1) $e(\mu_j(x_i))$ decreases as $\mu_j(x_i)$ increases;
2) when $\mu_j(x_i)$ is 1, $e(\mu_j(x_i))$ is 0;
3) the entropies of two independent fuzzy sets should satisfy additivity.
Additivity is a strict condition here: only when additivity is satisfied is the entropy of the fuzzy set uniquely determined. For the fuzzy sets $E_j$ and $E_k$ of two independent emotion classes, their product is
$$E_jE_k:\ \mu_{jk}(x_i)=\mu_j(x_i)\mu_k(x_i)\qquad(8)$$
and the entropy of the set $E_jE_k$ is defined as
$$e(\mu_{jk}(x_i))=e(\mu_j(x_i))+e(\mu_k(x_i))\qquad(9)$$
Analogously to the derivation of random entropy, the expression of a fuzzy entropy satisfying the three conditions above is
$$e(\mu_j(x_i))=-K\ln\mu_j(x_i)\qquad(10)$$
where K is a number greater than 0. Substituting formula (6), the fuzzy entropy of the i-th sample belonging to the j-th emotion class is
$$e(\mu_j(x_i))=-K\Big(\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)-\ln(\pi/2)\Big)\qquad(11)$$
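As a brief check, not spelled out in the original text, the expression in formula (10) can be verified against the three conditions above:

```latex
\begin{align*}
\frac{\partial e}{\partial \mu_j(x_i)} &= -\frac{K}{\mu_j(x_i)} < 0
  && \text{(decreasing in } \mu_j(x_i)\text{)}\\
e(1) &= -K\ln 1 = 0
  && \text{(zero at full membership)}\\
e\big(\mu_j(x_i)\mu_k(x_i)\big) &= -K\ln\mu_j(x_i)-K\ln\mu_k(x_i)
  = e\big(\mu_j(x_i)\big)+e\big(\mu_k(x_i)\big)
  && \text{(additivity)}
\end{align*}
```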
The average fuzzy entropy of the decision set formed by the C likelihood probability values of the i-th sample to be recognized is evaluated as
$$S(x_i)=\frac{1}{C}\sum_{j=1}^{C}\mu_j(x_i)\,e(\mu_j(x_i))\qquad(12)$$
Substituting formula (11),
$$S(x_i)=-\frac{2K}{\pi C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)-\ln(\pi/2)\Big)\qquad(13)$$
From the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes. The higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to; the sample is rejected when the fuzzy entropy exceeds a threshold Th, with the constant K taken as $\pi/2$:
$$S(x_i)>Th\qquad(14)$$
Substituting formula (13), this becomes
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(15)$$
where Th is the fuzzy entropy threshold determined experimentally. The threshold should be chosen so that the target emotion classes to be recognized are correctly identified while unknown, uncertain emotion samples are rejected. If the fuzzy entropy threshold is set too low, the rejection of uncertain samples is not effective; if it is set too high, too many samples are rejected and the system's average recognition rate falls. Some samples far from the known emotion models need to be rejected, but rejection can also prevent some test samples from being correctly recognized. The fuzzy entropy threshold is therefore tuned under the premise that the three classes irritation, happiness and calmness attain a satisfactory recognition rate; the threshold at which the average recognition rate drops significantly is taken as the upper limit. In the experiments the fuzzy entropy threshold is set to 0.1.
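A minimal sketch of the fuzzy-entropy rejection rule of formulas (6), (13) and (15), with K = π/2 and Th = 0.1 as stated above; it reuses the GMMs of the earlier sketch, and obtaining the likelihood density as exp(score_samples) is an implementation assumption.

```python
# Likelihood fuzzy-entropy rejection, equations (6), (13) and (15).
import numpy as np

def fuzzy_entropy_score(likelihoods):
    """Average fuzzy entropy S(x) of the decision set, eq. (13) with K = pi/2."""
    lik = np.asarray(likelihoods, dtype=float)        # p(x | lambda_j), j = 1..C
    a = np.clip(np.arctan(lik / 10.0), 1e-12, None)   # (pi/2) * mu_j(x), eq. (6); clipped to avoid log(0)
    return float(np.mean(a * (np.log(np.pi / 2.0) - np.log(a))))

def classify_with_rejection(gmms, x, th=0.1):
    """Return the winning emotion, or None (rejection) when S(x) > Th, eq. (15)."""
    x = np.asarray(x).reshape(1, -1)
    # score_samples returns log densities; exponentiate to get p(x | lambda_j)
    lik = {emo: float(np.exp(gmm.score_samples(x)[0])) for emo, gmm in gmms.items()}
    if fuzzy_entropy_score(list(lik.values())) > th:
        return None                                   # unknown or uncertain emotion
    return max(lik, key=lik.get)
```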
5. System performance analysis
(1) Sample distribution in the feature space
Prosodic features correlate mainly with activation, while voice quality features correlate more with valence. Of the 74 acoustic features extracted in the present invention, the first 36 are prosodic features and the remaining 38 are voice quality features. We apply the Karhunen-Loeve (KL) transform to the prosodic features and to the voice quality features separately and use the PCA method to analyze the sample distribution with respect to valence and activation. The first two dimensions after the transform form a two-dimensional feature space; the distributions of the irritation, calmness and happiness samples in this space are shown in Fig. 3 and Fig. 4, where Fig. 3 is the two-dimensional PCA space formed by the prosodic features and Fig. 4 the one formed by the voice quality features.
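A minimal sketch of this two-dimensional PCA projection, assuming the prosodic and voice quality feature matrices and the per-sample emotion labels are already available as arrays; scikit-learn and matplotlib are assumed tools, not prescribed by the description.

```python
# Two-dimensional PCA projection used for Figs. 3-5.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_2d_projection(features, labels, title):
    labels = np.asarray(labels)
    z = PCA(n_components=2).fit_transform(features)   # keep the first two components
    for emo in np.unique(labels):
        m = labels == emo
        plt.scatter(z[m, 0], z[m, 1], label=str(emo), s=10)
    plt.title(title)
    plt.legend()
    plt.show()

# plot_2d_projection(prosodic, labels, "Prosodic features (cf. Fig. 3)")
# plot_2d_projection(quality, labels, "Voice quality features (cf. Fig. 4)")
# plot_2d_projection(np.hstack([prosodic, quality]), labels, "Combined (cf. Fig. 5)")
```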
We can see that with prosodic features alone, calmness is distinguished fairly well from the other two emotions, because both irritation and happiness have high activation and are therefore far from the calm emotion. However, irritation and happiness differ relatively little in activation, and with prosodic features alone the sample distributions of the two emotions overlap considerably in the two-dimensional feature space of Fig. 3. This is consistent with the theory that prosodic features correspond mainly to the arousal dimension. Therefore, when recognizing the three emotions irritation, happiness and calmness, traditional prosodic features alone cannot classify well, and acoustic features corresponding to valence need to be extracted as well.
On the valence axis, irritation and happiness are far apart, lying at the negative and positive ends respectively, whereas calmness lies between them and is close to each of them. As Fig. 4 shows, once the voice quality features are used, the distributions of the irritation and happiness samples in the two-dimensional feature space are far apart and can be distinguished fairly well. Although the calm samples are considerably confused with both of them, the voice quality features are effective in separating irritation from happiness, the two emotions far apart in valence, which shows that voice quality features correlate strongly with valence.
We also notice a small number of isolated samples in the two-dimensional PCA feature space. We infer that these isolated samples arise from the complexity of how real emotions are expressed, showing that practical speech emotion may take special expression patterns in different environments. In the present invention these isolated samples contribute very little to the training of the emotion models, so the generalization of the models to real emotions is subject to certain limits.
When the 74 prosodic and voice quality features are combined, the feature space formed by the first two PCA dimensions is shown in Fig. 5. The sample distributions of irritation, happiness and calmness are separated fairly well, so prosodic and voice quality features together can distinguish practical speech emotions such as irritation in the dimensional emotion space.
(2) Means and variances of the best features
The means and variances of the top 5 features for irritation, happiness and calmness after normalization are shown in Fig. 6 and Fig. 7. The statistics show that the features selected in the present invention distinguish the three affective states irritation, happiness and calmness fairly well. The percentage of spectral energy below 250 Hz reflects the energy in the low-frequency region; its value is higher for speech in the calm state and lower for irritation and happiness, with the irritated emotion having the lowest value, indicating that the spectral energy below 250 Hz in the speech signal decreases when irritation occurs. The first-order difference of the pitch reflects how fast the fundamental frequency changes: the pitch of irritated and happy speech changes strongly, that of calm speech relatively little. The pitch mean likewise distinguishes the three emotion classes well; its value is larger for irritation and happiness and smaller in the calm state. On the median and minimum of the formant features, the three emotions are also distinguished fairly well, the calm state having relatively smaller values.
(3) Recognition results
A speaker-independent, text-independent emotion recognition experiment is carried out. 400 utterances are drawn at random for each emotion and divided into two groups: one group of 300 samples per emotion is used to train the GMM emotion models (900 utterances for the three emotions in total), and the other group of 100 samples per emotion is used to test the recognition rate (300 utterances in total). Among the raw utterances of the elicited database, 479 emotional utterances were rejected by the listening test; these utterances are considered data with low emotional membership, and the 100 with the lowest membership are chosen as uncertain, unknown-class emotion samples for the rejection test. Using the first ten PCA dimensions and the ten best features of the best-feature-group selection method respectively, the judgment-deniable practical speech emotion recognition method is applied and the recognition rates of the three emotions irritation, happiness and calmness are tested; the test results are shown in Table 1 and Table 2.
Table 1: Recognition results of the PCA method (the table is reproduced only as an image in the original publication).
Table 2: Recognition results of the best feature group (the table is reproduced only as an image in the original publication).
We can see from Table 1 and Table 2 that about 60 percent of the uncertain samples with ambiguous emotional expression are rejected; rejecting the samples that the classifier cannot judge effectively reduces misjudgments. In the experimental results, the recognition rates of the PCA method and of the best-feature-group selection method are essentially comparable: the average recognition rate of the PCA method is 77.0%, and that of the best-feature-group selection method is 75.7%.
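A minimal sketch of the evaluation protocol described above (300 training and 100 test utterances per emotion, plus 100 unknown-class utterances used only for rejection testing), reusing the train_gmms and classify_with_rejection sketches from earlier; the random split and the returned metrics are illustrative assumptions.

```python
# Evaluation sketch: per-emotion train/test split, recognition rate on known
# emotions, and rejection rate on unknown-class samples.
import numpy as np

def evaluate(X_by_emotion, X_unknown, th=0.1, seed=0):
    rng = np.random.default_rng(seed)
    train, test = {}, {}
    for emo, X in X_by_emotion.items():          # X: (n_utterances, n_features)
        idx = rng.permutation(len(X))
        train[emo], test[emo] = X[idx[:300]], X[idx[300:400]]
    gmms = train_gmms(train)

    correct = total = 0
    for emo, X in test.items():
        for x in X:
            total += 1
            correct += classify_with_rejection(gmms, x, th) == emo
    rejected = sum(classify_with_rejection(gmms, x, th) is None for x in X_unknown)
    return correct / total, rejected / len(X_unknown)
```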
The scope of protection sought by the present invention is not limited to the description of this embodiment.

Claims (1)

1. A judgment-deniable automatic speech emotion recognition method for irritation, comprising the steps of:
(1) establishing an irritation speech database;
(2) extracting speech emotion features;
characterized by further comprising the steps of:
(3) feature selection
(3-1) evaluating the features with the Fisher criterion, expressed by formula (1):
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(1)$$
where $\mu$ is the mean of the feature value, $\sigma$ is the standard deviation of the feature value, m is the total number of classes, $C_m^2$ is the number of class pairs, and d is the feature dimension;
(4) judgment-deniable recognition method
(4-1) modeling the three emotions irritation, happiness and calmness with GMMs, one GMM per emotion, and deciding by the maximum a posteriori criterion: let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability is expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(2)$$
the decision for a sample to be recognized is:
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(3)$$
where $j^*$ denotes the class to which the sample belongs;
(4-2) measuring the matching degree between the sample and the emotion classes with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class:
from the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes; the higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to, and the sample is rejected when the fuzzy entropy exceeds a threshold Th:
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(4).$$
CN2010102305114A 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget Pending CN101937678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102305114A CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102305114A CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Publications (1)

Publication Number Publication Date
CN101937678A true CN101937678A (en) 2011-01-05

Family

ID=43390977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102305114A Pending CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Country Status (1)

Country Link
CN (1) CN101937678A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN102779510A (en) * 2012-07-19 2012-11-14 东南大学 Speech emotion recognition method based on feature space self-adaptive projection
CN103578480A (en) * 2012-07-24 2014-02-12 东南大学 Negative emotion detection voice emotion recognition method based on context amendment
CN104835508A (en) * 2015-04-01 2015-08-12 哈尔滨工业大学 Speech feature screening method used for mixed-speech emotion recognition
WO2015124006A1 (en) * 2014-02-19 2015-08-27 清华大学 Audio detection and classification method with customized function
CN104951455A (en) * 2014-03-26 2015-09-30 北大方正集团有限公司 Information classification method and system based on category hypotaxis degree
CN105609116A (en) * 2015-12-23 2016-05-25 东南大学 Speech emotional dimensions region automatic recognition method
CN105719664A (en) * 2016-01-14 2016-06-29 盐城工学院 Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
CN106874939A (en) * 2017-01-18 2017-06-20 中国地质大学(武汉) The atmosphere recognition methods of the view-based access control model information under domestic environment and identifying system
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN106910512A (en) * 2015-12-18 2017-06-30 株式会社理光 The analysis method of voice document, apparatus and system
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique
CN110782877A (en) * 2019-11-19 2020-02-11 合肥工业大学 Speech identification method and system based on Fisher mixed feature and neural network
CN111145785A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Emotion recognition method and device based on voice
US20210110273A1 (en) * 2019-10-10 2021-04-15 Samsung Electronics Co., Ltd. Apparatus and method with model training
CN114756734A (en) * 2022-03-08 2022-07-15 上海暖禾脑科学技术有限公司 Music piece segmentation emotion marking system and method based on machine learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007072485A1 (en) * 2005-12-22 2007-06-28 Exaudios Technologies Ltd. System for indicating emotional attitudes through intonation analysis and methods thereof
JP3973434B2 (en) * 2002-01-31 2007-09-12 三洋電機株式会社 Information processing method, information processing system, information processing apparatus, computer program, and recording medium
CN101506874A (en) * 2006-09-13 2009-08-12 日本电信电话株式会社 Feeling detection method, feeling detection device, feeling detection program containing the method, and recording medium containing the program
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
CN101620852A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved quadratic discriminant
JP2010054568A (en) * 2008-08-26 2010-03-11 Oki Electric Ind Co Ltd Emotional identification device, method and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3973434B2 (en) * 2002-01-31 2007-09-12 三洋電機株式会社 Information processing method, information processing system, information processing apparatus, computer program, and recording medium
WO2007072485A1 (en) * 2005-12-22 2007-06-28 Exaudios Technologies Ltd. System for indicating emotional attitudes through intonation analysis and methods thereof
CN101506874A (en) * 2006-09-13 2009-08-12 日本电信电话株式会社 Feeling detection method, feeling detection device, feeling detection program containing the method, and recording medium containing the program
CN101620852A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved quadratic discriminant
JP2010054568A (en) * 2008-08-26 2010-03-11 Oki Electric Ind Co Ltd Emotional identification device, method and program
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Jie, Jiang Dongmei, Xie Lei, Fu Zhonghua, Ren Cuihong, "Research on speech emotion recognition based on NAQ", Application Research of Computers (《计算机应用研究》), Vol. 25, No. 11, 30 November 2008, pp. 3243-3245, 3258 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN102737629B (en) * 2011-11-11 2014-12-03 东南大学 Embedded type speech emotion recognition method and device
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102623009B (en) * 2012-03-02 2013-11-20 安徽科大讯飞信息科技股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102779510A (en) * 2012-07-19 2012-11-14 东南大学 Speech emotion recognition method based on feature space self-adaptive projection
CN103578480A (en) * 2012-07-24 2014-02-12 东南大学 Negative emotion detection voice emotion recognition method based on context amendment
CN103578480B (en) * 2012-07-24 2016-04-27 东南大学 The speech-emotion recognition method based on context correction during negative emotions detects
WO2015124006A1 (en) * 2014-02-19 2015-08-27 清华大学 Audio detection and classification method with customized function
CN104951455A (en) * 2014-03-26 2015-09-30 北大方正集团有限公司 Information classification method and system based on category hypotaxis degree
CN104951455B (en) * 2014-03-26 2018-05-25 北大方正集团有限公司 A kind of information classification approach and system based on classification hypotaxis degree
CN104835508A (en) * 2015-04-01 2015-08-12 哈尔滨工业大学 Speech feature screening method used for mixed-speech emotion recognition
CN104835508B (en) * 2015-04-01 2018-10-02 哈尔滨工业大学 A kind of phonetic feature screening technique for mixing voice emotion recognition
CN106910512A (en) * 2015-12-18 2017-06-30 株式会社理光 The analysis method of voice document, apparatus and system
CN105609116A (en) * 2015-12-23 2016-05-25 东南大学 Speech emotional dimensions region automatic recognition method
CN105609116B (en) * 2015-12-23 2019-03-05 东南大学 A kind of automatic identifying method in speech emotional dimension region
CN105719664A (en) * 2016-01-14 2016-06-29 盐城工学院 Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
CN106874939B (en) * 2017-01-18 2020-05-19 中国地质大学(武汉) Atmosphere field recognition method and system based on visual information in home environment
CN106874939A (en) * 2017-01-18 2017-06-20 中国地质大学(武汉) The atmosphere recognition methods of the view-based access control model information under domestic environment and identifying system
CN106878677B (en) * 2017-01-23 2020-01-07 西安电子科技大学 Student classroom mastery degree evaluation system and method based on multiple sensors
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108648767B (en) * 2018-04-08 2021-11-05 中国传媒大学 Popular song emotion synthesis and classification method
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique
CN111145785A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Emotion recognition method and device based on voice
US20210110273A1 (en) * 2019-10-10 2021-04-15 Samsung Electronics Co., Ltd. Apparatus and method with model training
CN110782877A (en) * 2019-11-19 2020-02-11 合肥工业大学 Speech identification method and system based on Fisher mixed feature and neural network
CN114756734A (en) * 2022-03-08 2022-07-15 上海暖禾脑科学技术有限公司 Music piece segmentation emotion marking system and method based on machine learning
CN114756734B (en) * 2022-03-08 2023-08-22 上海暖禾脑科学技术有限公司 Music piece subsection emotion marking system and method based on machine learning

Similar Documents

Publication Publication Date Title
CN101937678A (en) Judgment-deniable automatic speech emotion recognition method for fidget
CN105719664A (en) Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
Luo et al. Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network.
Rodd et al. Modelling the effects of semantic ambiguity in word recognition
CN101261832B (en) Extraction and modeling method for Chinese speech sensibility information
CN109767785A (en) Ambient noise method for identifying and classifying based on convolutional neural networks
Shaw et al. Emotion recognition and classification in speech using artificial neural networks
Ghai et al. Emotion recognition on speech signals using machine learning
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
Zhang et al. Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.
CN108831450A (en) A kind of virtual robot man-machine interaction method based on user emotion identification
Alashban et al. Speaker gender classification in mono-language and cross-language using BLSTM network
Xiao et al. Recognition of emotions in speech by a hierarchical approach
Mavaddati Voice-based age, gender, and language recognition based on ResNet deep model and transfer learning in spectro-temporal domain
CN110866087A (en) Entity-oriented text emotion analysis method based on topic model
Bhattacharya et al. Deep analysis for speech emotion recognization
Trabelsi et al. Evaluation of influence of arousal-valence primitives on speech emotion recognition.
CN102915315A (en) Method and system for classifying webpages
Trabelsi et al. Improved frame level features and SVM supervectors approach for the recogniton of emotional states from speech: Application to categorical and dimensional states
Virkar et al. Proposed model of speech recognition using MFCC and DNN
Guan TSIA team at FakeDeS 2021: Fake News Detection in Spanish Using Multi-Model Ensemble Learning.
Aragón et al. INAOE-CIMAT at eRisk 2020: Detecting Signs of Self-Harm using Sub-Emotions and Words.
Ataollahi et al. Laughter classification using 3D convolutional neural networks
Maity et al. Attention Based BERT-FastText Model for Hate Speech and Offensive Content Identification in English and Hindi Languages.
Lyu et al. Automatic selection of lexical features for detecting Alzheimer's disease using bag-of-words model and genetic algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20110105

C20 Patent right or utility model deemed to be abandoned or is abandoned