CN101937678A - Judgment-deniable automatic speech emotion recognition method for fidget - Google Patents

Judgment-deniable automatic speech emotion recognition method for fidget Download PDF

Info

Publication number
CN101937678A
CN101937678A (application CN2010102305114A / CN201010230511A)
Authority
CN
China
Prior art keywords
emotion
sample
irritated
lambda
judgment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102305114A
Other languages
Chinese (zh)
Inventor
赵力
黄程韦
邹采荣
余华
王开
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN2010102305114A priority Critical patent/CN101937678A/en
Publication of CN101937678A publication Critical patent/CN101937678A/en
Pending legal-status Critical Current

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a judgment-deniable automatic speech emotion recognition method for irritation (fidget), which comprises the following steps: (1) establishing an irritation speech database; (2) extracting speech emotion features; (3) performing feature selection, namely evaluating the features with the Fisher criterion; and (4) applying the judgment-deniable recognition method, which specifically comprises: (4-1) modeling the three emotions of irritation, joy and calmness with GMMs, one GMM per emotion, and deciding by the maximum a posteriori criterion; and (4-2) measuring the matching degree between a sample and the corresponding emotion class with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class. The method achieves comparatively high irritation recognition performance.

Description

A judgment-deniable automatic speech emotion recognition method for irritation
Technical field
The present invention relates to a speech recognition method, and in particular to a judgment-deniable automatic speech emotion recognition method for irritation.
Background technology
In artificial intelligence, affective computing is considered a key path towards giving computers higher and more comprehensive intelligence. In human-computer interaction, endowing the computer with a human-like emotional ability, so that it can perceive the surrounding environment and atmosphere and adaptively provide the most comfortable interaction, eliminating as far as possible the barrier between human and machine, has become a goal of next-generation computer development. Speech emotion recognition applies pattern recognition methods to extract the speaker's emotional state from the speech signal, so that the computer can recognize speech emotion automatically; it is an important part of affective computing and an important foundation of natural human-computer interaction.
Before the present invention, research on speech emotion recognition concentrated mainly on the several emotions singled out in basic emotion theory, including happiness, anger, surprise, sadness and fear, while research on speech emotions with special significance, such as irritation, was lacking. Existing speech emotion recognition methods cannot recognize irritation well. Recognizing the irritated emotional state has high practical value, especially in military application fields such as aerospace, where long, monotonous, high-intensity tasks subject the personnel involved to harsh physiological and psychological tests and induce negative moods such as irritation. Once irritation appears, improper handling can greatly affect a person's working ability and may even lead to human error and accidents. Studying negative emotions such as irritation, their mechanisms of action on and factors influencing human cognitive activity, and methods of improving individual cognition and working efficiency while avoiding the factors that impair them, is therefore of great practical significance.
Current speech emotion recognition research also faces the problem of the validity of emotional speech material. Emotional speech collected by acting is called acted material, and most existing speech emotion recognition research is based on it. Its advantage is ease of collection; its disadvantage is exaggerated emotional expression that differs from natural speech in practice, so the reliability of acted data is relatively poor. An emotion recognition system built on acted emotional material suffers degraded recognition performance under real conditions, because the data used to train the recognition model differ from real data. Emotional material collected by elicitation is called elicited material. Elicited material is more natural, and experimental psychological methods can conveniently be used to control the experiment and obtain the particular emotion needed. Before the present invention, no elicited corpus of irritation existed for Chinese speech emotion recognition.
Human emotion is ambiguous and diverse. In speech emotion recognition, traditional methods rigidly assign every incoming sample to one of the known classes; when more ambiguous emotion samples occur in practice, the confidence of the classification is poor and the probability of misjudgment is high.
Summary of the invention
The object of the invention is to fill a gap in practical applications of speech emotion recognition technology by providing a judgment-deniable automatic speech emotion recognition method for irritation.
To achieve the above object, the present invention adopts the following technical scheme:
The judgment-deniable automatic speech emotion recognition method for irritation of the present invention comprises the following steps:
(1) establish an irritation speech database;
(2) extract speech emotion features;
(3) feature selection
(3-1) evaluate the features with the Fisher criterion, expressed by formula (1):
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(1)$$
where $\mu$ is the mean of the feature value, $\sigma$ is the standard deviation of the feature value, m is the total number of classes, $C_m^2$ is the number of class pairs, and d is the feature dimension.
(4) judgment-deniable recognition method
(4-1) model the three emotions irritation, happiness and calmness with Gaussian mixture models (GMM), one GMM per emotion, and decide by the maximum a posteriori criterion: let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability is expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(2)$$
The decision for a sample to be recognized is:
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(3)$$
where $j^*$ denotes the class to which the sample belongs;
(4-2) measure the matching degree between the sample and the emotion classes with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class:
from the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes; the higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to, and the sample is rejected when the fuzzy entropy exceeds a threshold Th:
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(4)$$
The advantages and effects of the present invention are:
(1) irritation is recognized automatically from the speech signal;
(2) the irritation material is collected by elicitation, making the data closer to real emotional data and thereby yielding better irritation recognition performance;
(3) a speech emotion recognition method capable of rejection is adopted: for uncertain or unknown emotion samples the classifier outputs a refusal to judge, i.e. the sample is declared not to belong to any of the practical speech emotion classes to be detected.
Further advantages and effects of the present invention are described below.
Description of drawings
Fig. 1: two-dimensional dimensional-space model of emotion.
Fig. 2: mapping function.
Fig. 3: sample distribution in the prosodic feature space.
Fig. 4: sample distribution in the voice quality feature space.
Fig. 5: sample distribution in the combined prosodic and voice quality feature space.
Fig. 6: means of the top 5 features.
Fig. 7: variances of the top 5 features.
Embodiment
The technical solution of the present invention is further elaborated below in conjunction with the drawings and an embodiment.
1. Establishment of the irritation speech database
In experimental psychology, applying visual and auditory stimulation through computer multimedia technology is an experimental means that has become more common in recent years with the development of computers. Computer games provide an interactive, highly engaging human-computer environment through the visual and auditory stimulation of graphics and music, and can effectively induce positive and negative emotions in the subjects. In particular, when the game is won repeatedly, the subject is satisfied by the success in the virtual scene and a happy emotion is induced; when the game is lost repeatedly, the subject is frustrated in the virtual scene, which readily causes negative emotions including irritation. In an experiment lasting a fairly long time, repetitive game operation and repeated failure can smoothly elicit irritation.
(1) Subject selection
Ten university students (five male, five female) were selected for the game-elicited speech collection; calm speech was recorded before the game.
(2) Design of the sentence texts
Considering that a main application field of practical speech emotion recognition such as irritation is the assessment of negative emotions caused by long aeronautic, astronautic and maritime tasks, twenty working phrases and short sentences without emotional tendency were selected from the Standard Marine Communication Phrases (SMCP) issued by the International Maritime Organization (IMO).
(3) Choice of the game
To facilitate eliciting irritation, a tedious computer game requiring patience and care was chosen. In the game the subject must use the mouse to move a small ball through a winding, narrow pipe; the ball must not touch the pipe wall while passing through, otherwise the "bomb" explodes and the game is lost. If the ball passes through the pipe within the allotted time, the "bomb" is defused and the game is won.
(4) Recording of happy emotional material
After each game victory, the subject is asked to speak the prescribed text sentences with a happy emotion.
(5) Recording of irritated emotional material
After each game failure, the subject is asked to speak the prescribed text sentences with an irritated emotion.
(6) Recording of subjective experience
The subject fills in a subjective mood report with five options: very irritated, somewhat irritated, neutral, somewhat happy, very happy.
(7) Data screening
The game elicitation yielded 1800 original emotional utterances of the three emotions irritation, happiness and calmness in total; listening tests screened out 1321 utterances of higher quality as the final elicited speech emotion data.
2. Speech emotion feature extraction
The dimensional theory of emotion holds that all human emotions are composed of a small number of dimensions; a specific emotional state is simply a position in a continuous space, for example from approach to withdrawal or from pleasure to misery. Different emotions are not independent of one another but continuous, changing gradually and smoothly, so the similarity and difference between emotions can be represented by their distance in the dimensional space. Over the last two decades the dimensional view of emotion has attracted the attention of many researchers, and the most widely accepted dimensional model is the valence-activation model of Fig. 1. The two dimensions have the following meanings. Hedonic tone, or valence, is theoretically grounded in the separation of positive and negative affect; it is mainly reflected in the subjective feeling of the emotional subject and measures the relation between the emotion and the subject. Activation, or arousal, refers to the degree to which the bodily energy associated with the emotional state is activated and measures the inherent energy of the emotion. The positions in this dimensional space of the three affective states recognized in the present invention (irritation, happiness and calmness) are shown in Fig. 1. "Irritation" refers to an unhappy, fretful psychological state of the subject; "happiness" refers to a glad, pleased, positive psychological state. Irritation and happiness lie at the two ends of the valence axis: irritation appears as a negative emotion of the subject, happiness as a positive one. On the activation axis, irritation and happiness both lie in the positive region, since the degree of bodily energy activation is high when either of these two emotions occurs.
The quality of the affective features has a decisive influence on the final recognition result, and how to extract and select speech features that reflect emotional change is one of the key problems in the speech emotion recognition field today. The most basic and most commonly used affective features are prosodic features such as fundamental frequency, short-time energy, utterance duration and speaking rate. These prosodic features do reflect part of the speaker's emotional information, can to a large extent distinguish different basic emotion classes, and their extraction algorithms are fairly mature. However, according to the three-dimensional model of emotion (arousal, valence, power), traditional prosodic features only reflect the "arousal" information in the model, so prosodic features alone cannot distinguish all emotion classes well. The voice quality features of the speech signal are not only closely related to the "valence" dimension but can also partly reflect the information of the "power" dimension. Therefore, to recognize irritated emotional speech better, the affective feature extraction of the present invention includes not only prosodic parameters but also voice quality parameters of speech. 74 global statistical features are used in the present invention, of which the first 36 are prosodic features and the remaining 38 are voice quality features; they are constructed as listed below, and a brief extraction sketch follows the list.
(1) Construction of the prosodic affective features
Features 1-10: mean, maximum, minimum, median and variance of the short-time energy and of its first-order difference;
Features 11-25: mean, maximum, minimum, median and variance of the pitch and of its first- and second-order differences; Feature 26: pitch range;
Features 27-36: number of voiced frames, number of silent frames, ratio of silent frames to voiced frames, ratio of voiced frames to total frames, number of voiced segments, number of silent segments, ratio of voiced segments to silent segments, longest voiced segment, longest silent segment, ratio of voiced segments to total segments;
(2) Construction of the voice quality affective features
Features 37-66: mean, maximum, minimum, median and variance of the first, second and third formants and of their first-order differences;
Features 67-69: percentage of spectral energy below 250 Hz, percentage of spectral energy below 650 Hz, percentage of spectral energy above 4 kHz;
Features 70-74: mean, maximum, minimum, median and variance of the harmonics-to-noise ratio (HNR).
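As an illustration only, the following is a minimal sketch of how a few of the prosodic statistics above (short-time energy and pitch, features 1-26) could be computed; the library (librosa), frame parameters and pitch tracker are assumptions and are not prescribed by the invention.

```python
# Minimal sketch of a subset of the prosodic statistics (features 1-26).
# librosa is an assumed tool; the patent does not fix an extraction library.
import numpy as np
import librosa

def stats(v):
    """mean, maximum, minimum, median, variance of a 1-D contour."""
    v = np.asarray(v, dtype=float)
    v = v[np.isfinite(v)]
    return [np.mean(v), np.max(v), np.min(v), np.median(v), np.var(v)]

def prosodic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    energy = librosa.feature.rms(y=y)[0]                       # short-time energy contour
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)  # pitch contour
    f0 = f0[np.isfinite(f0)]                                   # keep voiced frames only
    feats = stats(energy) + stats(np.diff(energy))             # features 1-10
    feats += stats(f0) + stats(np.diff(f0)) + stats(np.diff(f0, n=2))  # features 11-25
    feats.append(np.max(f0) - np.min(f0))                      # feature 26: pitch range
    return np.array(feats)
```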
3. Feature selection
The selection of affective features has always been one of the most important problems in speech emotion recognition. To evaluate the quality of a feature we consider two aspects: the class means of the feature and the class variances of the feature. Taking both into account, the Fisher criterion is adopted for feature evaluation. For dimension d and two classes, the Fisher criterion can be expressed by formula (1):
$$f(d)=\frac{(\mu_{1d}-\mu_{2d})^2}{\sigma_{1d}^2+\sigma_{2d}^2}\qquad(1)$$
where $\mu_{1d}$ and $\mu_{2d}$ are the means of the feature values of the two classes in dimension d, and $\sigma_{1d}^2$ and $\sigma_{2d}^2$ are the variances of the feature values of the two classes in dimension d. The larger the Fisher criterion, the better this feature distinguishes the two classes. For the multi-class case, formula (1) can be rewritten as:
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(2)$$
where m is the total number of classes and $C_m^2$ is the number of class pairs. According to the Fisher criterion, the ten best features selected for the three emotions irritation, happiness and calmness are listed below; a code sketch of this ranking follows the list.
(1) percentage of spectral energy below 250 Hz
(2) mean of the pitch first-order difference
(3) mean of the pitch
(4) median of the first formant
(5) minimum of the first formant
(6) variance of the short-time energy
(7) mean of the second formant
(8) ratio of voiced frames to total frames
(9) mean of the harmonics-to-noise ratio
(10) percentage of spectral energy below 650 Hz
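A minimal sketch of the multi-class Fisher ranking of formula (2) is given below, assuming the 74 features have been collected into an array X with one row per utterance and y holds the emotion label of each row; the variable names are illustrative only.

```python
# Multi-class Fisher criterion, equation (2): average pairwise class separation
# per feature dimension.
import numpy as np
from itertools import combinations

def fisher_scores(X, y):
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])   # class means per dimension
    var = np.array([X[y == c].var(axis=0) for c in classes])   # class variances per dimension
    pairs = list(combinations(range(len(classes)), 2))
    scores = np.zeros(X.shape[1])
    for i, j in pairs:                                          # sum over all class pairs
        scores += (mu[i] - mu[j]) ** 2 / (var[i] + var[j])
    return scores / len(pairs)                                  # divide by C(m, 2)

# ranking = np.argsort(fisher_scores(X, y))[::-1]  # indices of the best features first
```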
4. Judgment-deniable recognition method
The distribution of emotion samples in the feature space can be described by a superposition of several Gaussian functions. In theory, as long as enough Gaussian components are mixed, a Gaussian mixture model (GMM) can fit any probability density function. In the present invention the three emotions irritation, happiness and calmness are modeled with GMMs, one GMM per emotion, and the decision is made by the maximum a posteriori criterion. Let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability can be expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(3)$$
where $p(x_i\mid\lambda_j)$ is obtained from the GMM of each emotion. For a given utterance sample, the probability of the feature vector occurring is a constant, and each emotion is assumed to occur with equal probability, with C the number of emotion classes:
$$P(\lambda_j)=\frac{1}{C},\quad 1\le j\le C\qquad(4)$$
The sample to be recognized can therefore be assigned to
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(5)$$
where $j^*$ denotes the class to which the sample belongs.
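A minimal sketch of the per-emotion GMM training and the maximum-likelihood decision of formulas (3)-(5) follows; scikit-learn, the number of mixture components and the covariance type are assumptions, since the description does not fix an implementation.

```python
# One GMM per emotion; with equal priors the MAP decision of eq. (3)-(5)
# reduces to picking the largest likelihood p(x | lambda_j).
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["irritation", "happiness", "calmness"]

def train_gmms(features_by_emotion, n_components=8):
    """features_by_emotion: dict emotion -> (n_utterances, n_features) array."""
    gmms = {}
    for emo in EMOTIONS:
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(features_by_emotion[emo])
        gmms[emo] = gmm
    return gmms

def classify(gmms, x):
    """Return the emotion with the largest likelihood, plus all log-likelihoods."""
    x = np.asarray(x).reshape(1, -1)
    log_lik = {emo: gmm.score_samples(x)[0] for emo, gmm in gmms.items()}
    return max(log_lik, key=log_lik.get), log_lik
```

Diagonal covariances are used here only because the per-emotion training set is small (300 utterances); the patent does not specify the covariance structure.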
In view of the ambiguity and uncertainty of emotion in real environments and the diversity of practical speech emotion classes, it is necessary to study practical speech emotion recognition methods capable of rejection. Below, a rejection method based on the fuzzy entropy of the likelihood probabilities is adopted: the fuzzy entropy is used to measure the matching degree between a sample and the emotion classes, so that samples of unknown class can be rejected. When a sample to be recognized arrives, it is passed through the GMMs of the C emotions to obtain C GMM likelihood probability density values, and the likelihood value of the i-th sample is mapped to a membership degree $\mu_j(x_i)$ between 0 and 1 of belonging to the j-th emotion class:
$$\mu_j(x_i)=\frac{\arctan\!\big(p(x_i\mid\lambda_j)/10\big)}{\pi/2}\qquad(6)$$
where the mapping function used is
$$y=\frac{\arctan(x/10)}{\pi/2}\qquad(7)$$
whose graph is shown in Fig. 2.
For the fuzzy set $E_j=\{x_1,x_2,\ldots,x_n\}$ of all samples possibly belonging to the j-th emotion class, with membership degrees $\mu_j(x_1),\mu_j(x_2),\ldots,\mu_j(x_n)$, let its fuzzy entropy be $e(\mu_j(x_i))$. Then $e(\mu_j(x_i))$ should satisfy:
1) $e(\mu_j(x_i))$ decreases as $\mu_j(x_i)$ increases;
2) when $\mu_j(x_i)$ is 1, $e(\mu_j(x_i))$ is 0;
3) the entropies of two independent fuzzy sets should satisfy additivity.
Additivity is a strict condition here: only when additivity is satisfied is the entropy of the fuzzy set uniquely determined. For the fuzzy sets $E_j$ and $E_k$ of two independent emotion classes, their product is
$$E_jE_k:\ \mu_{jk}(x_i)=\mu_j(x_i)\mu_k(x_i)\qquad(8)$$
and the entropy of the set $E_jE_k$ is defined as
$$e(\mu_{jk}(x_i))=e(\mu_j(x_i))+e(\mu_k(x_i))\qquad(9)$$
Analogously to the derivation of random entropy, the expression of a fuzzy entropy satisfying the three conditions above is
$$e(\mu_j(x_i))=-K\ln\mu_j(x_i)\qquad(10)$$
where K is a number greater than 0. Substituting formula (6), the fuzzy entropy of the i-th sample belonging to the j-th emotion class is
$$e(\mu_j(x_i))=-K\Big(\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)-\ln(\pi/2)\Big)\qquad(11)$$
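As a brief check, not spelled out in the original text, the expression in formula (10) can be verified against the three conditions above:

```latex
\begin{align*}
\frac{\partial e}{\partial \mu_j(x_i)} &= -\frac{K}{\mu_j(x_i)} < 0
  && \text{(decreasing in } \mu_j(x_i)\text{)}\\
e(1) &= -K\ln 1 = 0
  && \text{(zero at full membership)}\\
e\big(\mu_j(x_i)\mu_k(x_i)\big) &= -K\ln\mu_j(x_i)-K\ln\mu_k(x_i)
  = e\big(\mu_j(x_i)\big)+e\big(\mu_k(x_i)\big)
  && \text{(additivity)}
\end{align*}
```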
The average fuzzy entropy of the decision set formed by the C likelihood probability values of the i-th sample to be recognized is evaluated as
$$S(x_i)=\frac{1}{C}\sum_{j=1}^{C}\mu_j(x_i)\,e(\mu_j(x_i))\qquad(12)$$
Substituting formula (11),
$$S(x_i)=-\frac{2K}{\pi C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)-\ln(\pi/2)\Big)\qquad(13)$$
From the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes. The higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to; the sample is rejected when the fuzzy entropy exceeds a threshold Th, with the constant K taken as $\pi/2$:
$$S(x_i)>Th\qquad(14)$$
Substituting formula (13), this becomes
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(15)$$
where Th is the fuzzy entropy threshold determined experimentally. The threshold should be chosen so that the target emotion classes to be recognized are correctly identified while unknown, uncertain emotion samples are rejected. If the fuzzy entropy threshold is set too low, the rejection of uncertain samples is not effective; if it is set too high, too many samples are rejected and the system's average recognition rate falls. Some samples far from the known emotion models need to be rejected, but rejection can also prevent some test samples from being correctly recognized. The fuzzy entropy threshold is therefore tuned under the premise that the three classes irritation, happiness and calmness attain a satisfactory recognition rate; the threshold at which the average recognition rate drops significantly is taken as the upper limit. In the experiments the fuzzy entropy threshold is set to 0.1.
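A minimal sketch of the fuzzy-entropy rejection rule of formulas (6), (13) and (15), with K = π/2 and Th = 0.1 as stated above; it reuses the GMMs of the earlier sketch, and obtaining the likelihood density as exp(score_samples) is an implementation assumption.

```python
# Likelihood fuzzy-entropy rejection, equations (6), (13) and (15).
import numpy as np

def fuzzy_entropy_score(likelihoods):
    """Average fuzzy entropy S(x) of the decision set, eq. (13) with K = pi/2."""
    lik = np.asarray(likelihoods, dtype=float)        # p(x | lambda_j), j = 1..C
    a = np.clip(np.arctan(lik / 10.0), 1e-12, None)   # (pi/2) * mu_j(x), eq. (6); clipped to avoid log(0)
    return float(np.mean(a * (np.log(np.pi / 2.0) - np.log(a))))

def classify_with_rejection(gmms, x, th=0.1):
    """Return the winning emotion, or None (rejection) when S(x) > Th, eq. (15)."""
    x = np.asarray(x).reshape(1, -1)
    # score_samples returns log densities; exponentiate to get p(x | lambda_j)
    lik = {emo: float(np.exp(gmm.score_samples(x)[0])) for emo, gmm in gmms.items()}
    if fuzzy_entropy_score(list(lik.values())) > th:
        return None                                   # unknown or uncertain emotion
    return max(lik, key=lik.get)
```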
5. System performance analysis
(1) Sample distribution in the feature space
Prosodic features correlate mainly with activation, while voice quality features correlate more with valence. Of the 74 acoustic features extracted in the present invention, the first 36 are prosodic features and the remaining 38 are voice quality features. We apply the Karhunen-Loeve (KL) transform to the prosodic features and to the voice quality features separately and use the PCA method to analyze the sample distribution with respect to valence and activation. The first two dimensions after the transform form a two-dimensional feature space; the distributions of the irritation, calmness and happiness samples in this space are shown in Fig. 3 and Fig. 4, where Fig. 3 is the two-dimensional PCA space formed by the prosodic features and Fig. 4 the one formed by the voice quality features.
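A minimal sketch of this two-dimensional PCA projection, assuming the prosodic and voice quality feature matrices and the per-sample emotion labels are already available as arrays; scikit-learn and matplotlib are assumed tools, not prescribed by the description.

```python
# Two-dimensional PCA projection used for Figs. 3-5.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_2d_projection(features, labels, title):
    labels = np.asarray(labels)
    z = PCA(n_components=2).fit_transform(features)   # keep the first two components
    for emo in np.unique(labels):
        m = labels == emo
        plt.scatter(z[m, 0], z[m, 1], label=str(emo), s=10)
    plt.title(title)
    plt.legend()
    plt.show()

# plot_2d_projection(prosodic, labels, "Prosodic features (cf. Fig. 3)")
# plot_2d_projection(quality, labels, "Voice quality features (cf. Fig. 4)")
# plot_2d_projection(np.hstack([prosodic, quality]), labels, "Combined (cf. Fig. 5)")
```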
We can see that with prosodic features alone, calmness is distinguished fairly well from the other two emotions, because both irritation and happiness have high activation and are therefore far from the calm emotion. However, irritation and happiness differ relatively little in activation, and with prosodic features alone the sample distributions of the two emotions overlap considerably in the two-dimensional feature space of Fig. 3. This is consistent with the theory that prosodic features correspond mainly to the arousal dimension. Therefore, when recognizing the three emotions irritation, happiness and calmness, traditional prosodic features alone cannot classify well, and acoustic features corresponding to valence need to be extracted as well.
On the valence axis, irritation and happiness are far apart, lying at the negative and positive ends respectively, whereas calmness lies between them and is close to each of them. As Fig. 4 shows, once the voice quality features are used, the distributions of the irritation and happiness samples in the two-dimensional feature space are far apart and can be distinguished fairly well. Although the calm samples are considerably confused with both of them, the voice quality features are effective in separating irritation from happiness, the two emotions far apart in valence, which shows that voice quality features correlate strongly with valence.
We also notice a small number of isolated samples in the two-dimensional PCA feature space. We infer that these isolated samples arise from the complexity of how real emotions are expressed, showing that practical speech emotion may take special expression patterns in different environments. In the present invention these isolated samples contribute very little to the training of the emotion models, so the generalization of the models to real emotions is subject to certain limits.
When the 74 prosodic and voice quality features are combined, the feature space formed by the first two PCA dimensions is shown in Fig. 5. The sample distributions of irritation, happiness and calmness are separated fairly well, so prosodic and voice quality features together can distinguish practical speech emotions such as irritation in the dimensional emotion space.
(2) Means and variances of the best features
The means and variances of the top 5 features for irritation, happiness and calmness after normalization are shown in Fig. 6 and Fig. 7. The statistics show that the features selected in the present invention distinguish the three affective states irritation, happiness and calmness fairly well. The percentage of spectral energy below 250 Hz reflects the energy in the low-frequency region; its value is higher for speech in the calm state and lower for irritation and happiness, with the irritated emotion having the lowest value, indicating that the spectral energy below 250 Hz in the speech signal decreases when irritation occurs. The first-order difference of the pitch reflects how fast the fundamental frequency changes: the pitch of irritated and happy speech changes strongly, that of calm speech relatively little. The pitch mean likewise distinguishes the three emotion classes well; its value is larger for irritation and happiness and smaller in the calm state. On the median and minimum of the formant features, the three emotions are also distinguished fairly well, the calm state having relatively smaller values.
(3) Recognition results
A speaker-independent, text-independent emotion recognition experiment is carried out. 400 utterances are drawn at random for each emotion and divided into two groups: one group of 300 samples per emotion is used to train the GMM emotion models (900 utterances for the three emotions in total), and the other group of 100 samples per emotion is used to test the recognition rate (300 utterances in total). Among the raw utterances of the elicited database, 479 emotional utterances were rejected by the listening test; these utterances are considered data with low emotional membership, and the 100 with the lowest membership are chosen as uncertain, unknown-class emotion samples for the rejection test. Using the first ten PCA dimensions and the ten best features of the best-feature-group selection method respectively, the judgment-deniable practical speech emotion recognition method is applied and the recognition rates of the three emotions irritation, happiness and calmness are tested; the test results are shown in Table 1 and Table 2.
Table 1: Recognition results of the PCA method (the table is reproduced only as an image in the original publication).
Table 2: Recognition results of the best feature group (the table is reproduced only as an image in the original publication).
We can see from Table 1 and Table 2 that about 60 percent of the uncertain samples with ambiguous emotional expression are rejected; rejecting the samples that the classifier cannot judge effectively reduces misjudgments. In the experimental results, the recognition rates of the PCA method and of the best-feature-group selection method are essentially comparable: the average recognition rate of the PCA method is 77.0%, and that of the best-feature-group selection method is 75.7%.
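A minimal sketch of the evaluation protocol described above (300 training and 100 test utterances per emotion, plus 100 unknown-class utterances used only for rejection testing), reusing the train_gmms and classify_with_rejection sketches from earlier; the random split and the returned metrics are illustrative assumptions.

```python
# Evaluation sketch: per-emotion train/test split, recognition rate on known
# emotions, and rejection rate on unknown-class samples.
import numpy as np

def evaluate(X_by_emotion, X_unknown, th=0.1, seed=0):
    rng = np.random.default_rng(seed)
    train, test = {}, {}
    for emo, X in X_by_emotion.items():          # X: (n_utterances, n_features)
        idx = rng.permutation(len(X))
        train[emo], test[emo] = X[idx[:300]], X[idx[300:400]]
    gmms = train_gmms(train)

    correct = total = 0
    for emo, X in test.items():
        for x in X:
            total += 1
            correct += classify_with_rejection(gmms, x, th) == emo
    rejected = sum(classify_with_rejection(gmms, x, th) is None for x in X_unknown)
    return correct / total, rejected / len(X_unknown)
```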
The scope of protection sought by the present invention is not limited to the description of this embodiment.

Claims (1)

1. A judgment-deniable automatic speech emotion recognition method for irritation, comprising the steps of:
(1) establishing an irritation speech database;
(2) extracting speech emotion features;
characterized by further comprising the steps of:
(3) feature selection
(3-1) evaluating the features with the Fisher criterion, expressed by formula (1):
$$f(d)=\frac{1}{C_m^2}\sum_{1\le i<j\le m}\frac{(\mu_{id}-\mu_{jd})^2}{\sigma_{id}^2+\sigma_{jd}^2}\qquad(1)$$
where $\mu$ is the mean of the feature value, $\sigma$ is the standard deviation of the feature value, m is the total number of classes, $C_m^2$ is the number of class pairs, and d is the feature dimension;
(4) judgment-deniable recognition method
(4-1) modeling the three emotions irritation, happiness and calmness with GMMs, one GMM per emotion, and deciding by the maximum a posteriori criterion: let $x_i$ denote the i-th utterance sample and $\lambda_j$ the j-th emotion class; the maximum a posteriori probability is expressed as:
$$p(\lambda_j\mid x_i)=\frac{p(x_i\mid\lambda_j)P(\lambda_j)}{P(x_i)}\qquad(2)$$
the decision for a sample to be recognized is:
$$j^*=\arg\max_j\,p(x_i\mid\lambda_j)\qquad(3)$$
where $j^*$ denotes the class to which the sample belongs;
(4-2) measuring the matching degree between the sample and the emotion classes with a rejection method based on the fuzzy entropy of the likelihood probabilities, thereby rejecting samples of unknown class:
from the GMM models of the three emotion classes irritation, happiness and calmness, three GMM likelihood probability density values are obtained, representing the matching degree of the sample with the three emotion classes; the higher the fuzzy entropy of the decision set formed by the likelihood probability density values, the more uncertain it is which of the three emotions the sample belongs to, and the sample is rejected when the fuzzy entropy exceeds a threshold Th:
$$\frac{1}{C}\sum_{j=1}^{C}\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big(\ln(\pi/2)-\ln\arctan\!\big(p(x_i\mid\lambda_j)/10\big)\Big)>Th\qquad(4).$$
CN2010102305114A 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget Pending CN101937678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102305114A CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102305114A CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Publications (1)

Publication Number Publication Date
CN101937678A true CN101937678A (en) 2011-01-05

Family

ID=43390977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102305114A Pending CN101937678A (en) 2010-07-19 2010-07-19 Judgment-deniable automatic speech emotion recognition method for fidget

Country Status (1)

Country Link
CN (1) CN101937678A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN102779510A (en) * 2012-07-19 2012-11-14 东南大学 Speech emotion recognition method based on feature space self-adaptive projection
CN103578480A (en) * 2012-07-24 2014-02-12 东南大学 Negative emotion detection voice emotion recognition method based on context amendment
CN104835508A (en) * 2015-04-01 2015-08-12 哈尔滨工业大学 Speech feature screening method used for mixed-speech emotion recognition
WO2015124006A1 (en) * 2014-02-19 2015-08-27 清华大学 Audio detection and classification method with customized function
CN104951455A (en) * 2014-03-26 2015-09-30 北大方正集团有限公司 Information classification method and system based on category hypotaxis degree
CN105609116A (en) * 2015-12-23 2016-05-25 东南大学 Speech emotional dimensions region automatic recognition method
CN105719664A (en) * 2016-01-14 2016-06-29 盐城工学院 Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
CN106874939A (en) * 2017-01-18 2017-06-20 中国地质大学(武汉) The atmosphere recognition methods of the view-based access control model information under domestic environment and identifying system
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN106910512A (en) * 2015-12-18 2017-06-30 株式会社理光 The analysis method of voice document, apparatus and system
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique
CN110782877A (en) * 2019-11-19 2020-02-11 合肥工业大学 Speech identification method and system based on Fisher mixed feature and neural network
CN111145785A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Emotion recognition method and device based on voice
US20210110273A1 (en) * 2019-10-10 2021-04-15 Samsung Electronics Co., Ltd. Apparatus and method with model training
CN114756734A (en) * 2022-03-08 2022-07-15 上海暖禾脑科学技术有限公司 Music piece segmentation emotion marking system and method based on machine learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007072485A1 (en) * 2005-12-22 2007-06-28 Exaudios Technologies Ltd. System for indicating emotional attitudes through intonation analysis and methods thereof
JP3973434B2 (en) * 2002-01-31 2007-09-12 三洋電機株式会社 Information processing method, information processing system, information processing apparatus, computer program, and recording medium
CN101506874A (en) * 2006-09-13 2009-08-12 日本电信电话株式会社 Feeling detection method, feeling detection device, feeling detection program containing the method, and recording medium containing the program
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
CN101620852A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved quadratic discriminant
JP2010054568A (en) * 2008-08-26 2010-03-11 Oki Electric Ind Co Ltd Emotional identification device, method and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3973434B2 (en) * 2002-01-31 2007-09-12 三洋電機株式会社 Information processing method, information processing system, information processing apparatus, computer program, and recording medium
WO2007072485A1 (en) * 2005-12-22 2007-06-28 Exaudios Technologies Ltd. System for indicating emotional attitudes through intonation analysis and methods thereof
CN101506874A (en) * 2006-09-13 2009-08-12 日本电信电话株式会社 Feeling detection method, feeling detection device, feeling detection program containing the method, and recording medium containing the program
CN101620852A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved quadratic discriminant
JP2010054568A (en) * 2008-08-26 2010-03-11 Oki Electric Ind Co Ltd Emotional identification device, method and program
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Jie, Jiang Dongmei, Xie Lei, Fu Zhonghua, Ren Cuihong, "Research on speech emotion recognition based on NAQ", Application Research of Computers (《计算机应用研究》), Vol. 25, No. 11, 30 November 2008, pp. 3243-3245, 3258 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN102737629B (en) * 2011-11-11 2014-12-03 东南大学 Embedded type speech emotion recognition method and device
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102623009B (en) * 2012-03-02 2013-11-20 安徽科大讯飞信息科技股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102779510A (en) * 2012-07-19 2012-11-14 东南大学 Speech emotion recognition method based on feature space self-adaptive projection
CN103578480A (en) * 2012-07-24 2014-02-12 东南大学 Negative emotion detection voice emotion recognition method based on context amendment
CN103578480B (en) * 2012-07-24 2016-04-27 东南大学 The speech-emotion recognition method based on context correction during negative emotions detects
WO2015124006A1 (en) * 2014-02-19 2015-08-27 清华大学 Audio detection and classification method with customized function
CN104951455A (en) * 2014-03-26 2015-09-30 北大方正集团有限公司 Information classification method and system based on category hypotaxis degree
CN104951455B (en) * 2014-03-26 2018-05-25 北大方正集团有限公司 A kind of information classification approach and system based on classification hypotaxis degree
CN104835508A (en) * 2015-04-01 2015-08-12 哈尔滨工业大学 Speech feature screening method used for mixed-speech emotion recognition
CN104835508B (en) * 2015-04-01 2018-10-02 哈尔滨工业大学 A kind of phonetic feature screening technique for mixing voice emotion recognition
CN106910512A (en) * 2015-12-18 2017-06-30 株式会社理光 The analysis method of voice document, apparatus and system
CN105609116A (en) * 2015-12-23 2016-05-25 东南大学 Speech emotional dimensions region automatic recognition method
CN105609116B (en) * 2015-12-23 2019-03-05 东南大学 A kind of automatic identifying method in speech emotional dimension region
CN105719664A (en) * 2016-01-14 2016-06-29 盐城工学院 Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
CN106874939B (en) * 2017-01-18 2020-05-19 中国地质大学(武汉) Atmosphere field recognition method and system based on visual information in home environment
CN106874939A (en) * 2017-01-18 2017-06-20 中国地质大学(武汉) The atmosphere recognition methods of the view-based access control model information under domestic environment and identifying system
CN106878677B (en) * 2017-01-23 2020-01-07 西安电子科技大学 Student classroom mastery degree evaluation system and method based on multiple sensors
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN108346436A (en) * 2017-08-22 2018-07-31 腾讯科技(深圳)有限公司 Speech emotional detection method, device, computer equipment and storage medium
US11922969B2 (en) 2017-08-22 2024-03-05 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
US11189302B2 (en) 2017-08-22 2021-11-30 Tencent Technology (Shenzhen) Company Limited Speech emotion detection method and apparatus, computer device, and storage medium
CN108648767B (en) * 2018-04-08 2021-11-05 中国传媒大学 Popular song emotion synthesis and classification method
CN108648767A (en) * 2018-04-08 2018-10-12 中国传媒大学 A kind of popular song emotion is comprehensive and sorting technique
CN111145785A (en) * 2018-11-02 2020-05-12 广州灵派科技有限公司 Emotion recognition method and device based on voice
US20210110273A1 (en) * 2019-10-10 2021-04-15 Samsung Electronics Co., Ltd. Apparatus and method with model training
CN110782877A (en) * 2019-11-19 2020-02-11 合肥工业大学 Speech identification method and system based on Fisher mixed feature and neural network
CN114756734A (en) * 2022-03-08 2022-07-15 上海暖禾脑科学技术有限公司 Music piece segmentation emotion marking system and method based on machine learning
CN114756734B (en) * 2022-03-08 2023-08-22 上海暖禾脑科学技术有限公司 Music piece subsection emotion marking system and method based on machine learning

Similar Documents

Publication Publication Date Title
CN101937678A (en) Judgment-deniable automatic speech emotion recognition method for fidget
CN105719664A (en) Likelihood probability fuzzy entropy based voice emotion automatic identification method at tension state
Luo et al. Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network.
Rodd et al. Modelling the effects of semantic ambiguity in word recognition
CN101261832B (en) Extraction and modeling method for Chinese speech sensibility information
CN109767785A (en) Ambient noise method for identifying and classifying based on convolutional neural networks
Shaw et al. Emotion recognition and classification in speech using artificial neural networks
Ghai et al. Emotion recognition on speech signals using machine learning
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
Zhang et al. Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.
CN108831450A (en) A kind of virtual robot man-machine interaction method based on user emotion identification
Alashban et al. Speaker gender classification in mono-language and cross-language using BLSTM network
Xiao et al. Recognition of emotions in speech by a hierarchical approach
Mavaddati Voice-based age, gender, and language recognition based on ResNet deep model and transfer learning in spectro-temporal domain
CN110866087A (en) Entity-oriented text emotion analysis method based on topic model
Bhattacharya et al. Deep analysis for speech emotion recognization
Trabelsi et al. Evaluation of influence of arousal-valence primitives on speech emotion recognition.
CN102915315A (en) Method and system for classifying webpages
Trabelsi et al. Improved frame level features and SVM supervectors approach for the recogniton of emotional states from speech: Application to categorical and dimensional states
Virkar et al. Proposed model of speech recognition using MFCC and DNN
Guan TSIA team at FakeDeS 2021: Fake News Detection in Spanish Using Multi-Model Ensemble Learning.
Aragón et al. INAOE-CIMAT at eRisk 2020: Detecting Signs of Self-Harm using Sub-Emotions and Words.
Ataollahi et al. Laughter classification using 3D convolutional neural networks
Maity et al. Attention Based BERT-FastText Model for Hate Speech and Offensive Content Identification in English and Hindi Languages.
Lyu et al. Automatic selection of lexical features for detecting Alzheimer's disease using bag-of-words model and genetic algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20110105

C20 Patent right or utility model deemed to be abandoned or is abandoned