CN104077598B - Emotion recognition method based on fuzzy clustering of speech - Google Patents

Emotion recognition method based on fuzzy clustering of speech

Info

Publication number
CN104077598B
CN104077598B (application CN201410299493.3A)
Authority
CN
China
Prior art keywords
group
characteristic information
emotion
formant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410299493.3A
Other languages
Chinese (zh)
Other versions
CN104077598A (en)
Inventor
周代英
谭发曾
贾继超
田兵兵
谭敏洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201410299493.3A
Publication of CN104077598A
Application granted
Publication of CN104077598B
Expired - Fee Related
Anticipated expiration


Abstract

The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech. The method of the present invention comprises: pre-processing an input speech signal; extracting feature information from the processed speech signal; grouping multiple emotion classes and selecting the corresponding feature information according to the types in each group; classifying each emotion-pair group with the feature information selected for it; and performing speech emotion recognition from the classification outputs of all the groups. The beneficial effect of the present invention is that different features are selected for different emotions: the improved adaptive fuzzy K-means clustering method achieves a much better recognition result and a higher recognition rate than the conventional FCM method, which uses the same features for all emotions. The present invention is particularly suitable for intelligent speech emotion recognition.

Description

Emotion recognition method based on fuzzy clustering of speech
Technical field
The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech.
Background art
With the development of artificial intelligence, affective computing has emerged as a brand-new research problem combining emotional intelligence with computer technology. Language is an important instrument of human communication; speech carries not only textual symbol information but also emotion information. Processing the emotional information in speech is therefore of great significance in the fields of signal processing and artificial intelligence. In the field of speech emotion recognition, many experts and scholars have done a great deal of research work, including establishing standard speech emotion databases, speech feature extraction, and classification and recognition methods. Much work has also been done on feature selection for speech emotion, but it has not been established which specific features best identify a specific emotion. Because speech emotion is inherently ambiguous, researchers have tried to apply fuzzy clustering methods to speech emotion recognition, but their work recognizes all emotion classes with the same features, and the recognition results are unsatisfactory. Moreover, many clustering algorithms determine clusters with Euclidean or Mahalanobis distance measures, and algorithms based on such distance metrics tend to find spherical clusters of similar size and density. An emotion cluster, however, may have an arbitrary shape, so the clustering algorithms currently in use cannot recognize speech emotion classes well.
Summary of the invention
To solve the above problems of the conventional art, the present invention proposes an emotion recognition method based on fuzzy clustering of speech.
The technical scheme adopted by the present invention to solve the above technical problem is an emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
a. Pre-process the input speech signal; the pre-processing includes pre-emphasis filtering and windowed framing, dividing the speech signal into N frames, where N is a positive integer greater than 1;
b. Extract the feature information of the processed speech signal; the feature information includes Mel-frequency cepstral coefficients, pitch, formants and short-time energy;
c. Combine the speech signal with feature information and feed the combinations into multiple classifiers for classification; each classifier covers at least 2 emotion categories, and no two classifiers cover exactly the same set of categories. The speech signal is combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different feature signals are selected from the speech signal to form a feature information matrix X, where each row of X is the feature information selected from one frame of the speech signal and the number of rows is the frame count N;
d. Run each classifier separately to obtain the degree of membership of the speech signal to each emotion category within that classifier; the specific classification method uses the adaptive fuzzy K-means algorithm;
e. Perform speech emotion recognition from the membership results output by all the classifiers; the specific recognition method is to assemble all the output results into a super-vector and decode the super-vector to output the final decision.
Specifically, in the feature information extracted in step b, the pitch features include the pitch variance and the pitch minimum; the formant features include the first-formant maximum, first-formant minimum and first-formant mean, the second-formant maximum and second-formant mean, and the third-formant maximum, third-formant mean and third-formant variance; the short-time-energy feature is the short-time-energy minimum.
Specifically, the multiple emotion classes in step c are 4 classes, namely happy, angry, sad and calm. The specific grouping method is pairwise grouping into six groups: group 1 is happy/angry, group 2 is happy/sad, group 3 is happy/calm, group 4 is angry/sad, group 5 is angry/calm, and group 6 is sad/calm. For each group, the feature set that best separates the two emotion classes of the group is extracted, and each group's feature information is assembled into a feature information sequence set X, where each row of the feature information matrix X is obtained from one frame of the speech signal and the number of rows equals the frame count of the utterance. The feature information of each group is as follows: group 1 extracts the Mel-frequency cepstral coefficients, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; group 2 extracts the Mel-frequency cepstral coefficients, first-formant minimum, third-formant mean, pitch minimum and pitch variance; group 3 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; group 4 extracts the Mel-frequency cepstral coefficients, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; group 5 extracts the Mel-frequency cepstral coefficients, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; group 6 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum.
Specifically, the classification with the adaptive fuzzy K-means algorithm in step d proceeds as follows.
The objective function of the adaptive fuzzy K-means algorithm is defined as
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2, \qquad d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i),$$
where X is the feature information sequence set, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the c classes, N is the number of feature vectors (i.e. the number of samples), c is the number of clusters, m is the fuzzy weighting exponent, u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a feature information vector, and A_i is the local norm-inducing matrix of a class. To achieve classification, the objective function J must be minimized; it is computed by loop iteration, and the objective function is minimal when the membership matrix becomes stable. The tolerance threshold of the membership matrix is set to ε, and the initial membership matrix can be chosen at random. The loop iteration comprises the following steps:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1},$$
where $\|A_i\| = \rho_i$, ρ > 0, and ρ_i is the local clustering control parameter;
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number; the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$. Each group's feature information X is processed by the above loop iteration to obtain the group's stable membership matrix U.
Specifically, the method of step e is:
e1. From the membership matrix U obtained in step d, compute the confidence w_ij of each sample in each group;
e2. Define the decision output C_ij of the two-class classification in each group, C_ij = w_ij · I, I ∈ {+1, −1}, where I = +1 indicates that the sample is judged to belong to the first of the two classes and I = −1 indicates that it is judged to belong to the other class; the 6 groups of emotion classifications are fed into the 6 classifiers respectively, which output their decisions;
e3. Decode by correlation; the correlation is computed as $R^T = C^T I_{6\times 4}$, where C is the column vector formed by the 6 classifier output results, $I_{6\times 4}$ is the code-word matrix of the six pairwise combinations of the four emotion classes, and R = {r_1, r_2, … r_n};
e4. Decide the recognition result: i* denotes the label of the emotion class assigned to the sample, where i* = argmax{r_i}.
The beneficial effect of the present invention is that different features are selected for different emotions; the recognition result of the improved adaptive fuzzy K-means clustering method is much better than that of the conventional FCM method, which uses the same features for all emotions, and the recognition rate is higher.
Brief description of the drawings
Fig. 1 is the speech emotion recognition flow chart of the present invention;
Fig. 2 is the decision and recognition flow chart of the present invention.
Specific embodiment
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings and an embodiment:
Embodiment:
Based on the Berlin emotional speech database (Emo-DB), this example selects the four emotion classes happy, angry, sad and calm for speech emotion recognition;
As shown in Fig. 1, this example comprises the following steps:
S1: Speech pre-processing
Pre-processing includes pre-emphasis and windowed framing.
Pre-emphasis: the purpose of pre-emphasis is to flatten the spectrum of the signal, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies, which facilitates spectral analysis and vocal-tract parameter analysis. Pre-emphasis is usually implemented with a first-order digital filter H(z) = 1 − αz⁻¹, where α is the pre-emphasis coefficient; α is taken as 0.9 in this example. The original speech signal s(l) becomes x(l) after pre-emphasis filtering.
Framing: the speech is framed with a Hamming window of length 23 ms; one segment of speech yields an N-frame signal after framing, and each frame is regarded as one sample.
After windowing, the signal x(l) becomes x_n(m) according to the following formula (here N denotes the window length in samples):
$$x_n(m) = w(m)\,x(n+m), \quad 0 \le m \le N-1 \qquad (1)$$
Hamming window:
$$w(m) = 0.54 - 0.46\cos\!\left(\frac{2\pi m}{N-1}\right), \quad 0 \le m \le N-1 \qquad (2)$$
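As an illustration of this pre-processing step, the following is a minimal Python/NumPy sketch (the embodiment itself performs these operations in MATLAB with voicebox); the 8 kHz sampling rate, the non-overlapping frame shift and the random stand-in signal are assumptions, not values fixed by the patent:

import numpy as np

def preprocess(signal, fs, alpha=0.9, frame_ms=23.0):
    # Pre-emphasis H(z) = 1 - alpha*z^-1: x(l) = s(l) - alpha*s(l-1)
    x = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    L = int(fs * frame_ms / 1000)        # 23 ms window length in samples
    N = len(x) // L                      # number of frames = number of samples
    frames = x[:N * L].reshape(N, L)     # non-overlapping frames (assumption)
    # Hamming window w(m) = 0.54 - 0.46*cos(2*pi*m/(L-1)), applied per frame
    return frames * np.hamming(L)

fs = 8000                                # assumed sampling rate
speech = np.random.randn(fs)             # stand-in for one second of speech
frames = preprocess(speech, fs)
print(frames.shape)                      # (N, L): one windowed frame per row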
S2: Feature extraction
This example extracts the short-time speech features and derived features with voicebox, a speech processing toolbox based on the MATLAB language.
The extracted features include Mel-frequency cepstral coefficients (MFCC), pitch, formants and short-time energy.
Mel-frequency cepstral coefficients (MFCC): Mel cepstral coefficients were proposed on the basis of the auditory characteristics of the human ear; they use a nonlinear frequency scale (the Mel scale) to model the human auditory system. Experiments show that below 1000 Hz, pitch perception is approximately linear in frequency, while above 1000 Hz it is approximately logarithmic; the ear thus perceives different frequencies differently and is especially sensitive to low frequencies. The conversion between frequency f and Mel frequency is
$$f_{\mathrm{Mel}} = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right),$$
where f is the frequency in Hz. In the present invention, 12 Mel cepstral coefficients are taken per frame of the speech signal.
Pitch: speech is a superposition of a series of vibrations of different frequencies and amplitudes emitted by the sound source. Among these vibrations there is one of lowest frequency, and the sound it produces is the fundamental tone (pitch). Multiple pitch values can be obtained within a frame of speech, from which the pitch minimum and the pitch variance of the frame can further be obtained.
Formants: when sound passes through the resonant cavities, the filtering action of the cavities redistributes the energy of the different frequencies in the frequency domain; the part strengthened by the resonance of the cavities appears as dense dark bands on the time-frequency spectrogram, while the rest is attenuated. Because the energy distribution is uneven, the strong parts stand out like mountain peaks, hence the name formants. Multiple formants can be obtained within a frame of speech; in the present invention the first, second and third formants are selected, from which the minimum, maximum and variance of the formants within a frame can further be obtained.
Short-time energy: the short-time energy is simply the energy sum of one frame of the signal. The formula is
$$E_n = \sum_{m=0}^{N-1} x_n^2(m).$$
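The two closed-form quantities above can be checked with a short Python/NumPy sketch (the per-frame pitch and formant tracking themselves are left to a toolbox such as voicebox and are not reproduced here):

import numpy as np

def hz_to_mel(f):
    # f_Mel = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def short_time_energy(frame):
    # E_n = sum over m of x_n(m)^2 for one frame
    return float(np.sum(frame ** 2))

print(hz_to_mel([500.0, 1000.0, 4000.0]))   # roughly linear below 1 kHz
frames = np.random.randn(10, 184)           # stand-in: 10 frames of 23 ms at 8 kHz
energies = np.array([short_time_energy(fr) for fr in frames])
print(energies.min())                       # the short-time-energy minimum feature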
S3: Feature selection
This example pairs the four emotion classes two by two, obtaining six emotion pairs (happy/angry, happy/sad, happy/calm, angry/sad, angry/calm, sad/calm), and then selects a different feature combination as the input of each classifier; the specific feature selection is shown in Table 1:
Table 1. Best feature-group selection
As shown in Fig. 1, different emotion pairs are recognized from different feature information: feature selection 1 recognizes the happy/angry pair, feature selection 2 the happy/sad pair, feature selection 3 the happy/calm pair, feature selection 4 the angry/sad pair, feature selection 5 the angry/calm pair, and feature selection 6 the sad/calm pair.
In the feature selection, the features chosen by selection 1 are MFCC, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; selection 2 chooses MFCC, first-formant minimum, third-formant mean, pitch minimum and pitch variance; selection 3 chooses MFCC, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; selection 4 chooses MFCC, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; selection 5 chooses MFCC, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; selection 6 chooses MFCC, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum.
Taking the happy/angry feature group of the present invention as an illustration: after the processing above, one segment of speech yields an N-frame signal, i.e. N samples; from each sample the MFCC, first-formant maximum, second-formant maximum, third-formant maximum, third-formant mean and related features are extracted. These features form a 5-dimensional row vector (one dimension per feature kind), so one segment of speech yields an N×5 matrix X; the other groups follow by analogy.
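A sketch of assembling the happy/angry feature matrix X in Python/NumPy follows; the per-frame feature values are random stand-ins (the embodiment obtains them with voicebox), and summarising the 12 MFCCs by their per-frame mean so that each row is literally 5-dimensional is an assumption made for illustration:

import numpy as np

N = 100                                      # frames in one utterance
mfcc = np.random.randn(N, 12)                # 12 MFCCs per frame (stand-in)
f1_max = np.random.rand(N)                   # first-formant maximum per frame
f2_max = np.random.rand(N)                   # second-formant maximum per frame
f3_max = np.random.rand(N)                   # third-formant maximum per frame
f3_mean = np.random.rand(N)                  # third-formant mean per frame

X = np.column_stack([mfcc.mean(axis=1),      # one column per feature kind
                     f1_max, f2_max, f3_max, f3_mean])
print(X.shape)                               # (N, 5): one row per frame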
S4: Classifier selection
In the present invention the speech emotion classification is performed with the improved fuzzy K-means algorithm.
In the FCM (Fuzzy C-Means) algorithm, suppose there is a sample sequence set X = {x_1, x_2, …, x_N}; its objective function is as follows,
$$J(X;U,V) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA}^2, \qquad (3)$$
where N is the number of samples; c is the number of clusters; m is the fuzzy weighting exponent, which controls the degree of fuzziness and is a very important parameter; x_k is a sample vector; v_i is the centre of a class, a cluster-centre vector; and u_ik is the membership value of the k-th sample in the i-th class. The membership values of a sample over all cluster centres must sum to 1, i.e.
$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, N. \qquad (4)$$
U is the membership matrix; V = {v_1, v_2, …, v_c}, v_i ∈ R^n, are the cluster-centre vectors to be determined; and A is the norm-inducing matrix of the c classes.
Formula (5) is the square of an inner-product distance norm:
$$d_{ikA}^2 = \|x_k - v_i\|_A^2 = (x_k - v_i)^T A (x_k - v_i). \qquad (5)$$
To minimize formula (3) under the constraint of formula (4), set the partial derivatives of J with respect to v_i and u_ik to zero; this yields
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^m x_k}{\sum_{k=1}^{N} u_{ik}^m}, \qquad (6)$$
$$u_{ik} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA}/d_{jkA}\bigr)^{2/(m-1)}}. \qquad (7)$$
Initialize the cluster centres v_i, set the number of cluster centres c and the fuzzy weighting exponent m, then iterate formulas (6) and (7) repeatedly until the membership values become stable. The FCM algorithm uses the standard Euclidean distance criterion, which favours hyperspherical clusters, so it can only detect spherical clusters of similar size and density, because the same standard norm-inducing matrix is chosen for every class: either A = I, or A is defined as the inverse of the n×n covariance matrix, A = F⁻¹, where F is as follows:
$$F = \frac{1}{N}\sum_{k=1}^{N}(x_k - \bar{x})(x_k - \bar{x})^T, \qquad (8)$$
where x̄ denotes the sample mean of the data; in this case A implements the Mahalanobis distance.
In order to detect clusters of different geometric shapes within one data set, the present invention improves the standard fuzzy K-means algorithm with an adaptive distance norm to obtain the adaptive fuzzy K-means algorithm, in which each cluster uses its own norm-inducing matrix A_i; this produces the following inner-product norm:
$$d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i). \qquad (9)$$
The matrices A_i act as optimization variables in the K-means functional, so that the adaptive distance function of each class adapts optimally to the local topological structure of the data set. The objective function of the adaptive FCM algorithm is defined as follows:
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2. \qquad (10)$$
The objective function cannot be minimized directly with respect to A_i, because J is linear in A_i; that is, J can be made ever smaller by making A_i less positive definite. To obtain a feasible solution, A_i must be constrained; typically its determinant is restricted. Fixing the determinant of A_i to a certain value optimizes the cluster shape while keeping the cluster volume constant: ‖A_i‖ = ρ_i, ρ > 0, with ρ_i a fixed value for each class. The Lagrange multiplier method then yields A_i as follows:
$$A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1}, \qquad (11)$$
where F_i is the fuzzy covariance matrix of the i-th class, defined as follows:
$$F_i = \frac{\sum_{k=1}^{N} u_{ik}^m (x_k - v_i)(x_k - v_i)^T}{\sum_{k=1}^{N} u_{ik}^m}. \qquad (12)$$
Substituting formulas (11) and (12) into formula (10) gives a generalized squared Mahalanobis distance norm from x_k to the cluster mean v_i, in which the covariance is weighted by the membership matrix U.
In the present invention, to achieve classification, the objective function $J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m d_{ikA_i}^2$ must be minimized, where X is the feature information matrix obtained in step S3, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the 4 emotion classes, N is the number of feature vectors (i.e. the number of samples), c = 4 is the number of cluster species, m is the fuzzy weighting exponent (m > 1), u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), and x_k is the feature information vector of a sample. The computation proceeds by loop iteration; the objective function is minimal when the membership matrix becomes stable. The tolerance threshold of the membership matrix is set to ε, the initial membership matrix can be chosen at random, and ‖A_i‖ = ρ_i, ρ > 0. The specific loop steps are as follows:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1};$$
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number; the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$. Each group's feature information X is processed by the above steps to obtain the group's stable membership matrix U. In this way the 6 groups of feature information pass through the fuzzy clustering of the 6 two-class classifiers, which output 6 stable membership matrices.
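As a concrete illustration of the above loop, here is a minimal Python/NumPy sketch of the adaptive fuzzy K-means iteration (a Gustafson-Kessel-style scheme); the choices m = 2, ρ_i = 1, the small regularisation added to F_i to keep it invertible, and the synthetic two-cluster test data are assumptions for the sketch, not values fixed by the patent:

import numpy as np

def adaptive_fuzzy_kmeans(X, c=2, m=2.0, eps=1e-5, max_iter=100, seed=0):
    # X: (N, n) feature matrix; returns the (c, N) membership matrix U.
    rng = np.random.default_rng(seed)
    N, n = X.shape
    rho = np.ones(c)                         # ||A_i|| = rho_i, fixed to 1 here
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # Step 1: cluster centres v_i
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        D2 = np.empty((c, N))
        for i in range(c):
            diff = X - V[i]
            # Step 2: fuzzy covariance F_i
            F = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
            F += 1e-8 * np.eye(n)            # regularisation (assumption)
            # Step 3: A_i = [rho_i det(F_i)]^(1/n) F_i^-1, then Mahalanobis d^2
            A = (rho[i] * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
            D2[i] = np.einsum('kj,jl,kl->k', diff, A, diff)
        # Step 4: u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1))
        D2 = np.fmax(D2, 1e-12)
        U_new = 1.0 / (D2 ** (1.0 / (m - 1)) *
                       np.sum(D2 ** (-1.0 / (m - 1)), axis=0))
        if np.linalg.norm(U_new - U) <= eps: # ||U(l) - U(l-1)|| <= eps
            return U_new
        U = U_new
    return U

X = np.vstack([np.random.randn(50, 5) - 2, np.random.randn(50, 5) + 2])
U = adaptive_fuzzy_kmeans(X, c=2)            # c = 2 for one pairwise classifier
print(U.shape)                               # (2, N); the text's U is its transpose

Each pairwise classifier corresponds to one such clustering run with c = 2; transposing U gives the N×2 membership matrix referred to in step S5.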
S5: Decision and recognition
In the decision of the multi-classifier system herein, the output confidence of each sub-classifier is evaluated first, the decision is then made by correlation, and the final recognition result is computed. The analysis in the present invention considers a single sample: in the membership matrix U output by the two-class adaptive FCM, u_ik and u_jk represent the degrees to which sample k belongs to class i and class j. The larger the difference between the two membership values, the more reliable the recognition, so w_ij can be used as the confidence of each binary classifier; it is obtained from formula (13),
$$w_{ij} = \lvert u_{ik} - u_{jk}\rvert. \qquad (13)$$
The more reliable the classifier decision, the larger the difference and the larger w_ij; conversely, the smaller w_ij, the closer the sample lies to the overlap region and the poorer the classification reliability. With the classifier confidence w_ij obtained, it is used as a fusion weight and the output of the classifier is defined as
$$C_{ij} = w_{ij}\,I, \quad I \in \{+1, -1\}, \qquad (14)$$
where I is the two-class decision; we let I = +1 indicate that the sample is judged to belong to the first of the two classes and I = −1 indicate that it is judged to belong to the other class.
To make the final decision, the outputs of these 6 classifiers are assembled into a super-vector and decoded by correlation. Under ideal conditions the decision confidence w_ij is 1, and the output value obtained is C_ij = I; when the sample to be recognized belongs to neither of the two classes a binary classifier can distinguish, the output value carries no information biased towards either class and is set to zero. The output values of this ideal situation serve as the code words of the classes, as shown in Table 2. In practical situations the output value C_ij = w_ij·I is scattered around the ideal value (the code word), and decoding can proceed according to the distance between the actual output and the code words. The correlation decoder measures, by correlation, the closeness between the actual values and the ideal values, and the emotion class with the maximum correlation is the recognition result,
$$i^{*} = \arg\max\{r_i\}, \qquad (15)$$
where i* denotes the label of the recognized emotion class and r_i is the correlation, obtained from formula (16),
$$R^T = C^T I_{6\times 4}, \qquad (16)$$
where R = {r_1, r_2, … r_n}, C is the six-dimensional column vector formed by the classifier output values, and I_{6×4} is the code-word matrix of the six pairwise combinations of the four emotion classes, shown in Table 2.
Table 2. Code words of the emotion categories

Classifier     Happy  Angry  Sad  Calm
Happy/angry      1     -1     0     0
Happy/sad        1      0    -1     0
Happy/calm       1      0     0    -1
Angry/sad        0      1    -1     0
Angry/calm       0      1     0    -1
Sad/calm         0      0     1    -1
In this example, one segment of speech yields an N-frame signal after processing, i.e. N samples; the membership matrix U output by a two-class classifier's clustering over these N samples is an N×2 matrix. In the decision and recognition stage, the class of each sample is judged, and finally the correct recognition rate over these N samples is computed.
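A sketch of this decision stage for a single sample follows, in Python/NumPy; the six membership pairs are made-up values, and the confidence is computed as w = |u_ik − u_jk| in line with formula (13) as reconstructed above:

import numpy as np

CODEWORDS = np.array([                  # I_{6x4} from Table 2: rows = classifiers
    [1, -1,  0,  0],                    # happy/angry
    [1,  0, -1,  0],                    # happy/sad
    [1,  0,  0, -1],                    # happy/calm
    [0,  1, -1,  0],                    # angry/sad
    [0,  1,  0, -1],                    # angry/calm
    [0,  0,  1, -1],                    # sad/calm
])
EMOTIONS = ['happy', 'angry', 'sad', 'calm']

# Membership pair (u_i, u_j) of one sample from each binary classifier (made up)
memberships = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                        [0.4, 0.6], [0.5, 0.5], [0.45, 0.55]])

w = np.abs(memberships[:, 0] - memberships[:, 1])    # confidence, formula (13)
I = np.where(memberships[:, 0] >= memberships[:, 1], 1, -1)
C = w * I                                            # C_ij = w_ij * I, formula (14)
R = C @ CODEWORDS                                    # R^T = C^T I_{6x4}, formula (16)
print(EMOTIONS[int(np.argmax(R))])                   # i* = argmax r_i, formula (15)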
This example can recognize the four classes of speech emotion: happy, angry, sad and calm.
To verify the performance gain of the speech emotion recognition method of the present invention relative to the FCM method, 2 groups of comparison experiments were carried out. In the first group, features were selected first and FCM two-class classifiers were used to recognize the selected four emotion classes. The second group used the method of the present invention to recognize the same four emotion classes.
For the training sample set, 30 sentences were selected from the Emo-DB corpus for each of the four emotion classes happy, angry, sad and calm.
The two groups of recognition results are shown in Table 3:
Table 3. Comparison of classification experiment results
From the analysis of the results in Table 3 it can be seen that the method proposed herein improves the recognition rate considerably over FCM overall; the recognition performance for sad is improved by 0.03 and that for calm by 0.13.
In summary, the present invention improves the speech-based emotion recognition rate through adaptive fuzzy-clustering recognition of speech emotion in which different features are selected for different emotions.

Claims (2)

1. An emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
a. Pre-process the input speech signal; the pre-processing includes pre-emphasis filtering and windowed framing, dividing the speech signal into N frames, where N is a positive integer greater than 1;
b. Extract the feature information of the processed speech signal; the feature information includes Mel-frequency cepstral coefficients, pitch, formants and short-time energy; in the extracted feature information, the pitch features include the pitch variance and the pitch minimum; the formant features include the first-formant maximum, first-formant minimum and first-formant mean, the second-formant maximum and second-formant mean, and the third-formant maximum, third-formant mean and third-formant variance; the short-time-energy feature is the short-time-energy minimum;
c. Combine the speech signal with feature information and feed the combinations into multiple classifiers for classification; each classifier covers at least 2 emotion categories, and no two classifiers cover exactly the same set of categories; the speech signal is combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different feature signals are selected from the speech signal to form a feature information matrix X, where each row of X is the feature information selected from one frame of the speech signal and the number of rows is the frame count N; specifically there are 6 classifiers, and each classifier covers 2 of the 4 emotion classes, which are happy, angry, sad and calm; the classes are grouped pairwise into six groups: group 1 is happy/angry, group 2 is happy/sad, group 3 is happy/calm, group 4 is angry/sad, group 5 is angry/calm, and group 6 is sad/calm; each group of emotion categories corresponds to one classifier; for each group, the feature set that best separates the two emotion classes of the group is extracted, and each group's feature information is assembled into a feature information sequence set X, where each row of the feature information matrix X is obtained from one frame of the speech signal and the number of rows equals the frame count of the utterance; the feature information of each group is as follows: group 1 extracts the Mel-frequency cepstral coefficients, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; group 2 extracts the Mel-frequency cepstral coefficients, first-formant minimum, third-formant mean, pitch minimum and pitch variance; group 3 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; group 4 extracts the Mel-frequency cepstral coefficients, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; group 5 extracts the Mel-frequency cepstral coefficients, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; group 6 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum;
d. Run each classifier separately to obtain the degree of membership of the speech signal to each emotion category within that classifier; the specific classification method uses the adaptive fuzzy K-means algorithm; the specific method of classification with the adaptive fuzzy K-means algorithm is:
the objective function of the adaptive fuzzy K-means algorithm is defined as
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2, \qquad d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i),$$
where X is the feature information sequence set, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the c classes, N is the number of feature vectors (i.e. the number of samples), c is the number of cluster species, m is the fuzzy weighting exponent, u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a feature information vector, and A_i is the local norm-inducing matrix of a class; to achieve classification, the objective function J must be minimized; it is computed by loop iteration, and the objective function is minimal when the membership matrix becomes stable; the tolerance threshold of the membership matrix is set to ε, and the initial membership matrix can be chosen at random; the loop iteration comprises the following steps:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1},$$
where $\|A_i\| = \rho_i$, ρ > 0, and ρ_i is the local clustering control parameter;
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number;
the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$; each group's feature information X is processed by the above loop iteration to obtain each group's stable membership matrix U;
e. Perform speech emotion recognition from the membership results output by all the classifiers; the specific recognition method is to assemble all the output results into a super-vector and decode the super-vector to output the final decision.
2. The emotion recognition method based on fuzzy clustering of speech according to claim 1, characterized in that the specific method of step e is:
e1. From the membership matrix U obtained in step d, compute the confidence w_ij of each sample in each group;
e2. Define the decision output C_ij of the two-class classification in each group, C_ij = w_ij · I, I ∈ {+1, −1}, where I = +1 indicates that the sample is judged to belong to one class of the two-class classification and I = −1 indicates that it is judged to belong to the other class; the 6 groups of emotion classifications are fed into the 6 classifiers respectively, which output their decisions;
e3. Decode by correlation; the correlation is computed as $R^T = C^T I_{6\times 4}$, where C is the column vector formed by the 6 classifier output results, $I_{6\times 4}$ is the code-word matrix of the six pairwise combinations of the four emotion classes, and R = {r_1, r_2, … r_n};
e4. Decide the recognition result: i* denotes the label of the emotion class assigned to the sample, where i* = arg max{r_i}.
CN201410299493.3A 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech Expired - Fee Related CN104077598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410299493.3A CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410299493.3A CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Publications (2)

Publication Number Publication Date
CN104077598A CN104077598A (en) 2014-10-01
CN104077598B true CN104077598B (en) 2017-05-31

Family

ID=51598844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410299493.3A Expired - Fee Related CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Country Status (1)

Country Link
CN (1) CN104077598B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650576A (en) * 2016-09-22 2017-05-10 中国矿业大学 Mining equipment health state judgment method based on noise characteristic statistic
CN107886056A (en) * 2017-10-27 2018-04-06 江苏大学 A kind of electronic nose of fuzzy covariance learning network differentiates vinegar kind method
CN108122552B (en) * 2017-12-15 2021-10-15 上海智臻智能网络科技股份有限公司 Voice emotion recognition method and device
EP3729419A1 (en) * 2017-12-19 2020-10-28 Wonder Group Technologies Ltd. Method and apparatus for emotion recognition from speech
CN109065071B (en) * 2018-08-31 2021-05-14 电子科技大学 Song clustering method based on iterative k-means algorithm
CN111898690B (en) * 2020-08-05 2022-11-18 山东大学 Power transformer fault classification method and system
CN113611326B (en) * 2021-08-26 2023-05-12 中国地质大学(武汉) Real-time voice emotion recognition method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100631786B1 (en) * 2005-02-18 2006-10-12 삼성전자주식회사 Method and apparatus for speech recognition by measuring frame's confidence

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620853A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved fuzzy vector quantization
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion

Also Published As

Publication number Publication date
CN104077598A (en) 2014-10-01


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170531

Termination date: 20180627

CF01 Termination of patent right due to non-payment of annual fee