CN104077598B - Emotion recognition method based on fuzzy clustering of speech - Google Patents

Emotion recognition method based on fuzzy clustering of speech

Info

Publication number
CN104077598B
CN104077598B (application CN201410299493.3A)
Authority
CN
China
Prior art keywords
group
characteristic information
emotion
formant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410299493.3A
Other languages
Chinese (zh)
Other versions
CN104077598A (en)
Inventor
周代英
谭发曾
贾继超
田兵兵
谭敏洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201410299493.3A
Publication of CN104077598A
Application granted
Publication of CN104077598B
Expired - Fee Related
Anticipated expiration


Abstract

The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech. The method of the present invention comprises: pre-processing an input speech signal; extracting feature information from the processed speech signal; grouping multiple emotion classes and selecting the corresponding feature information according to the types in each group; classifying each emotion-pair group with the feature information selected for it; and performing speech emotion recognition from the classification outputs of all the groups. The beneficial effect of the present invention is that different features are selected for different emotions: the improved adaptive fuzzy K-means clustering method achieves a much better recognition result and a higher recognition rate than the conventional FCM method, which uses the same features for all emotions. The present invention is particularly suitable for intelligent speech emotion recognition.

Description

Emotion recognition method based on fuzzy clustering of speech
Technical field
The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech.
Background art
With the development of artificial intelligence, affective computing has emerged as a brand-new research problem combining emotional intelligence with computer technology. Language is an important instrument of human communication; speech carries not only textual symbol information but also emotion information. Processing the emotional information in speech is therefore of great significance in the fields of signal processing and artificial intelligence. In the field of speech emotion recognition, many experts and scholars have done a great deal of research work, including establishing standard speech emotion databases, speech feature extraction, and classification and recognition methods. Much work has also been done on feature selection for speech emotion, but it has not been established which specific features best identify a specific emotion. Because speech emotion is inherently ambiguous, researchers have tried to apply fuzzy clustering methods to speech emotion recognition, but their work recognizes all emotion classes with the same features, and the recognition results are unsatisfactory. Moreover, many clustering algorithms determine clusters with Euclidean or Mahalanobis distance measures, and algorithms based on such distance metrics tend to find spherical clusters of similar size and density. An emotion cluster, however, may have an arbitrary shape, so the clustering algorithms currently in use cannot recognize speech emotion classes well.
Summary of the invention
To solve the above problems of the conventional art, the present invention proposes an emotion recognition method based on fuzzy clustering of speech.
The technical scheme adopted by the present invention to solve the above technical problem is an emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
a. Pre-process the input speech signal; the pre-processing includes pre-emphasis filtering and windowed framing, dividing the speech signal into N frames, where N is a positive integer greater than 1;
b. Extract the feature information of the processed speech signal; the feature information includes Mel-frequency cepstral coefficients, pitch, formants and short-time energy;
c. Combine the speech signal with feature information and feed the combinations into multiple classifiers for classification; each classifier covers at least 2 emotion categories, and no two classifiers cover exactly the same set of categories. The speech signal is combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different feature signals are selected from the speech signal to form a feature information matrix X, where each row of X is the feature information selected from one frame of the speech signal and the number of rows is the frame count N;
d. Run each classifier separately to obtain the degree of membership of the speech signal to each emotion category within that classifier; the specific classification method uses the adaptive fuzzy K-means algorithm;
e. Perform speech emotion recognition from the membership results output by all the classifiers; the specific recognition method is to assemble all the output results into a super-vector and decode the super-vector to output the final decision.
Specifically, in the feature information extracted in step b, the pitch features include the pitch variance and the pitch minimum; the formant features include the first-formant maximum, first-formant minimum and first-formant mean, the second-formant maximum and second-formant mean, and the third-formant maximum, third-formant mean and third-formant variance; the short-time-energy feature is the short-time-energy minimum.
Specifically, the multiple emotion classes in step c are 4 classes, namely happy, angry, sad and calm. The specific grouping method is pairwise grouping into six groups: group 1 is happy/angry, group 2 is happy/sad, group 3 is happy/calm, group 4 is angry/sad, group 5 is angry/calm, and group 6 is sad/calm. For each group, the feature set that best separates the two emotion classes of the group is extracted, and each group's feature information is assembled into a feature information sequence set X, where each row of the feature information matrix X is obtained from one frame of the speech signal and the number of rows equals the frame count of the utterance. The feature information of each group is as follows: group 1 extracts the Mel-frequency cepstral coefficients, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; group 2 extracts the Mel-frequency cepstral coefficients, first-formant minimum, third-formant mean, pitch minimum and pitch variance; group 3 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; group 4 extracts the Mel-frequency cepstral coefficients, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; group 5 extracts the Mel-frequency cepstral coefficients, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; group 6 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum.
Specifically, the classification with the adaptive fuzzy K-means algorithm in step d proceeds as follows.
The objective function of the adaptive fuzzy K-means algorithm is defined as
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2, \qquad d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i),$$
where X is the feature information sequence set, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the c classes, N is the number of feature vectors (i.e. the number of samples), c is the number of clusters, m is the fuzzy weighting exponent, u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a feature information vector, and A_i is the local norm-inducing matrix of a class. To achieve classification, the objective function J must be minimized; it is computed by loop iteration, and the objective function is minimal when the membership matrix becomes stable. The tolerance threshold of the membership matrix is set to ε, and the initial membership matrix can be chosen at random. The loop iteration comprises the following steps:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1},$$
where $\|A_i\| = \rho_i$, ρ > 0, and ρ_i is the local clustering control parameter;
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number; the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$. Each group's feature information X is processed by the above loop iteration to obtain the group's stable membership matrix U.
Specifically, the method of step e is:
e1. From the membership matrix U obtained in step d, compute the confidence w_ij of each sample in each group;
e2. Define the decision output C_ij of the two-class classification in each group, C_ij = w_ij · I, I ∈ {+1, −1}, where I = +1 indicates that the sample is judged to belong to the first of the two classes and I = −1 indicates that it is judged to belong to the other class; the 6 groups of emotion classifications are fed into the 6 classifiers respectively, which output their decisions;
e3. Decode by correlation; the correlation is computed as $R^T = C^T I_{6\times 4}$, where C is the column vector formed by the 6 classifier output results, $I_{6\times 4}$ is the code-word matrix of the six pairwise combinations of the four emotion classes, and R = {r_1, r_2, … r_n};
e4. Decide the recognition result: i* denotes the label of the emotion class assigned to the sample, where i* = argmax{r_i}.
The beneficial effect of the present invention is that different features are selected for different emotions; the recognition result of the improved adaptive fuzzy K-means clustering method is much better than that of the conventional FCM method, which uses the same features for all emotions, and the recognition rate is higher.
Brief description of the drawings
Fig. 1 is the speech emotion recognition flow chart of the present invention;
Fig. 2 is the decision and recognition flow chart of the present invention.
Specific embodiment
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings and an embodiment:
Embodiment:
Based on the Berlin emotional speech database (Emo-DB), this example selects the four emotion classes happy, angry, sad and calm for speech emotion recognition;
As shown in Fig. 1, this example comprises the following steps:
S1: Speech pre-processing
Pre-processing includes pre-emphasis and windowed framing.
Pre-emphasis: the purpose of pre-emphasis is to flatten the spectrum of the signal, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies, which facilitates spectral analysis and vocal-tract parameter analysis. Pre-emphasis is usually implemented with a first-order digital filter H(z) = 1 − αz⁻¹, where α is the pre-emphasis coefficient; α is taken as 0.9 in this example. The original speech signal s(l) becomes x(l) after pre-emphasis filtering.
Framing: the speech is framed with a Hamming window of length 23 ms; one segment of speech yields an N-frame signal after framing, and each frame is regarded as one sample.
After windowing, the signal x(l) becomes x_n(m) according to the following formula (here N denotes the window length in samples):
$$x_n(m) = w(m)\,x(n+m), \quad 0 \le m \le N-1 \qquad (1)$$
Hamming window:
$$w(m) = 0.54 - 0.46\cos\!\left(\frac{2\pi m}{N-1}\right), \quad 0 \le m \le N-1 \qquad (2)$$
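As an illustration of this pre-processing step, the following is a minimal Python/NumPy sketch (the embodiment itself performs these operations in MATLAB with voicebox); the 8 kHz sampling rate, the non-overlapping frame shift and the random stand-in signal are assumptions, not values fixed by the patent:

import numpy as np

def preprocess(signal, fs, alpha=0.9, frame_ms=23.0):
    # Pre-emphasis H(z) = 1 - alpha*z^-1: x(l) = s(l) - alpha*s(l-1)
    x = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    L = int(fs * frame_ms / 1000)        # 23 ms window length in samples
    N = len(x) // L                      # number of frames = number of samples
    frames = x[:N * L].reshape(N, L)     # non-overlapping frames (assumption)
    # Hamming window w(m) = 0.54 - 0.46*cos(2*pi*m/(L-1)), applied per frame
    return frames * np.hamming(L)

fs = 8000                                # assumed sampling rate
speech = np.random.randn(fs)             # stand-in for one second of speech
frames = preprocess(speech, fs)
print(frames.shape)                      # (N, L): one windowed frame per row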
S2: Feature extraction
This example extracts the short-time speech features and derived features with voicebox, a speech processing toolbox based on the MATLAB language.
The extracted features include Mel-frequency cepstral coefficients (MFCC), pitch, formants and short-time energy.
Mel-frequency cepstral coefficients (MFCC): Mel cepstral coefficients were proposed on the basis of the auditory characteristics of the human ear; they use a nonlinear frequency scale (the Mel scale) to model the human auditory system. Experiments show that below 1000 Hz, pitch perception is approximately linear in frequency, while above 1000 Hz it is approximately logarithmic; the ear thus perceives different frequencies differently and is especially sensitive to low frequencies. The conversion between frequency f and Mel frequency is
$$f_{\mathrm{Mel}} = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right),$$
where f is the frequency in Hz. In the present invention, 12 Mel cepstral coefficients are taken per frame of the speech signal.
Pitch: speech is a superposition of a series of vibrations of different frequencies and amplitudes emitted by the sound source. Among these vibrations there is one of lowest frequency, and the sound it produces is the fundamental tone (pitch). Multiple pitch values can be obtained within a frame of speech, from which the pitch minimum and the pitch variance of the frame can further be obtained.
Formants: when sound passes through the resonant cavities, the filtering action of the cavities redistributes the energy of the different frequencies in the frequency domain; the part strengthened by the resonance of the cavities appears as dense dark bands on the time-frequency spectrogram, while the rest is attenuated. Because the energy distribution is uneven, the strong parts stand out like mountain peaks, hence the name formants. Multiple formants can be obtained within a frame of speech; in the present invention the first, second and third formants are selected, from which the minimum, maximum and variance of the formants within a frame can further be obtained.
Short-time energy: the short-time energy is simply the energy sum of one frame of the signal. The formula is
$$E_n = \sum_{m=0}^{N-1} x_n^2(m).$$
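The two closed-form quantities above can be checked with a short Python/NumPy sketch (the per-frame pitch and formant tracking themselves are left to a toolbox such as voicebox and are not reproduced here):

import numpy as np

def hz_to_mel(f):
    # f_Mel = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def short_time_energy(frame):
    # E_n = sum over m of x_n(m)^2 for one frame
    return float(np.sum(frame ** 2))

print(hz_to_mel([500.0, 1000.0, 4000.0]))   # roughly linear below 1 kHz
frames = np.random.randn(10, 184)           # stand-in: 10 frames of 23 ms at 8 kHz
energies = np.array([short_time_energy(fr) for fr in frames])
print(energies.min())                       # the short-time-energy minimum feature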
S3: Feature selection
This example pairs the four emotion classes two by two, obtaining six emotion pairs (happy/angry, happy/sad, happy/calm, angry/sad, angry/calm, sad/calm), and then selects a different feature combination as the input of each classifier; the specific feature selection is shown in Table 1:
Table 1. Best feature-group selection
As shown in Fig. 1, different emotion pairs are recognized from different feature information: feature selection 1 recognizes the happy/angry pair, feature selection 2 the happy/sad pair, feature selection 3 the happy/calm pair, feature selection 4 the angry/sad pair, feature selection 5 the angry/calm pair, and feature selection 6 the sad/calm pair.
In the feature selection, the features chosen by selection 1 are MFCC, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; selection 2 chooses MFCC, first-formant minimum, third-formant mean, pitch minimum and pitch variance; selection 3 chooses MFCC, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; selection 4 chooses MFCC, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; selection 5 chooses MFCC, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; selection 6 chooses MFCC, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum.
Taking the happy/angry feature group of the present invention as an illustration: after the processing above, one segment of speech yields an N-frame signal, i.e. N samples; from each sample the MFCC, first-formant maximum, second-formant maximum, third-formant maximum, third-formant mean and related features are extracted. These features form a 5-dimensional row vector (one dimension per feature kind), so one segment of speech yields an N×5 matrix X; the other groups follow by analogy.
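A sketch of assembling the happy/angry feature matrix X in Python/NumPy follows; the per-frame feature values are random stand-ins (the embodiment obtains them with voicebox), and summarising the 12 MFCCs by their per-frame mean so that each row is literally 5-dimensional is an assumption made for illustration:

import numpy as np

N = 100                                      # frames in one utterance
mfcc = np.random.randn(N, 12)                # 12 MFCCs per frame (stand-in)
f1_max = np.random.rand(N)                   # first-formant maximum per frame
f2_max = np.random.rand(N)                   # second-formant maximum per frame
f3_max = np.random.rand(N)                   # third-formant maximum per frame
f3_mean = np.random.rand(N)                  # third-formant mean per frame

X = np.column_stack([mfcc.mean(axis=1),      # one column per feature kind
                     f1_max, f2_max, f3_max, f3_mean])
print(X.shape)                               # (N, 5): one row per frame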
S4: Classifier selection
In the present invention the speech emotion classification is performed with the improved fuzzy K-means algorithm.
In the FCM (Fuzzy C-Means) algorithm, suppose there is a sample sequence set X = {x_1, x_2, …, x_N}; its objective function is as follows,
$$J(X;U,V) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA}^2, \qquad (3)$$
where N is the number of samples; c is the number of clusters; m is the fuzzy weighting exponent, which controls the degree of fuzziness and is a very important parameter; x_k is a sample vector; v_i is the centre of a class, a cluster-centre vector; and u_ik is the membership value of the k-th sample in the i-th class. The membership values of a sample over all cluster centres must sum to 1, i.e.
$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, N. \qquad (4)$$
U is the membership matrix; V = {v_1, v_2, …, v_c}, v_i ∈ R^n, are the cluster-centre vectors to be determined; and A is the norm-inducing matrix of the c classes.
Formula (5) is the square of an inner-product distance norm:
$$d_{ikA}^2 = \|x_k - v_i\|_A^2 = (x_k - v_i)^T A (x_k - v_i). \qquad (5)$$
To minimize formula (3) under the constraint of formula (4), set the partial derivatives of J with respect to v_i and u_ik to zero; this yields
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^m x_k}{\sum_{k=1}^{N} u_{ik}^m}, \qquad (6)$$
$$u_{ik} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA}/d_{jkA}\bigr)^{2/(m-1)}}. \qquad (7)$$
Initialize the cluster centres v_i, set the number of cluster centres c and the fuzzy weighting exponent m, then iterate formulas (6) and (7) repeatedly until the membership values become stable. The FCM algorithm uses the standard Euclidean distance criterion, which favours hyperspherical clusters, so it can only detect spherical clusters of similar size and density, because the same standard norm-inducing matrix is chosen for every class: either A = I, or A is defined as the inverse of the n×n covariance matrix, A = F⁻¹, where F is as follows:
$$F = \frac{1}{N}\sum_{k=1}^{N}(x_k - \bar{x})(x_k - \bar{x})^T, \qquad (8)$$
where x̄ denotes the sample mean of the data; in this case A implements the Mahalanobis distance.
In order to detect clusters of different geometric shapes within one data set, the present invention improves the standard fuzzy K-means algorithm with an adaptive distance norm to obtain the adaptive fuzzy K-means algorithm, in which each cluster uses its own norm-inducing matrix A_i; this produces the following inner-product norm:
$$d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i). \qquad (9)$$
The matrices A_i act as optimization variables in the K-means functional, so that the adaptive distance function of each class adapts optimally to the local topological structure of the data set. The objective function of the adaptive FCM algorithm is defined as follows:
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2. \qquad (10)$$
The objective function cannot be minimized directly with respect to A_i, because J is linear in A_i; that is, J can be made ever smaller by making A_i less positive definite. To obtain a feasible solution, A_i must be constrained; typically its determinant is restricted. Fixing the determinant of A_i to a certain value optimizes the cluster shape while keeping the cluster volume constant: ‖A_i‖ = ρ_i, ρ > 0, with ρ_i a fixed value for each class. The Lagrange multiplier method then yields A_i as follows:
$$A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1}, \qquad (11)$$
where F_i is the fuzzy covariance matrix of the i-th class, defined as follows:
$$F_i = \frac{\sum_{k=1}^{N} u_{ik}^m (x_k - v_i)(x_k - v_i)^T}{\sum_{k=1}^{N} u_{ik}^m}. \qquad (12)$$
Substituting formulas (11) and (12) into formula (10) gives a generalized squared Mahalanobis distance norm from x_k to the cluster mean v_i, in which the covariance is weighted by the membership matrix U.
In the present invention, to achieve classification, the objective function $J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m d_{ikA_i}^2$ must be minimized, where X is the feature information matrix obtained in step S3, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the 4 emotion classes, N is the number of feature vectors (i.e. the number of samples), c = 4 is the number of cluster species, m is the fuzzy weighting exponent (m > 1), u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), and x_k is the feature information vector of a sample. The computation proceeds by loop iteration; the objective function is minimal when the membership matrix becomes stable. The tolerance threshold of the membership matrix is set to ε, the initial membership matrix can be chosen at random, and ‖A_i‖ = ρ_i, ρ > 0. The specific loop steps are as follows:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1};$$
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number; the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$. Each group's feature information X is processed by the above steps to obtain the group's stable membership matrix U. In this way the 6 groups of feature information pass through the fuzzy clustering of the 6 two-class classifiers, which output 6 stable membership matrices.
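As a concrete illustration of the above loop, here is a minimal Python/NumPy sketch of the adaptive fuzzy K-means iteration (a Gustafson-Kessel-style scheme); the choices m = 2, ρ_i = 1, the small regularisation added to F_i to keep it invertible, and the synthetic two-cluster test data are assumptions for the sketch, not values fixed by the patent:

import numpy as np

def adaptive_fuzzy_kmeans(X, c=2, m=2.0, eps=1e-5, max_iter=100, seed=0):
    # X: (N, n) feature matrix; returns the (c, N) membership matrix U.
    rng = np.random.default_rng(seed)
    N, n = X.shape
    rho = np.ones(c)                         # ||A_i|| = rho_i, fixed to 1 here
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # Step 1: cluster centres v_i
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        D2 = np.empty((c, N))
        for i in range(c):
            diff = X - V[i]
            # Step 2: fuzzy covariance F_i
            F = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
            F += 1e-8 * np.eye(n)            # regularisation (assumption)
            # Step 3: A_i = [rho_i det(F_i)]^(1/n) F_i^-1, then Mahalanobis d^2
            A = (rho[i] * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
            D2[i] = np.einsum('kj,jl,kl->k', diff, A, diff)
        # Step 4: u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1))
        D2 = np.fmax(D2, 1e-12)
        U_new = 1.0 / (D2 ** (1.0 / (m - 1)) *
                       np.sum(D2 ** (-1.0 / (m - 1)), axis=0))
        if np.linalg.norm(U_new - U) <= eps: # ||U(l) - U(l-1)|| <= eps
            return U_new
        U = U_new
    return U

X = np.vstack([np.random.randn(50, 5) - 2, np.random.randn(50, 5) + 2])
U = adaptive_fuzzy_kmeans(X, c=2)            # c = 2 for one pairwise classifier
print(U.shape)                               # (2, N); the text's U is its transpose

Each pairwise classifier corresponds to one such clustering run with c = 2; transposing U gives the N×2 membership matrix referred to in step S5.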
S5: Decision and recognition
In the decision of the multi-classifier system herein, the output confidence of each sub-classifier is evaluated first, the decision is then made by correlation, and the final recognition result is computed. The analysis in the present invention considers a single sample: in the membership matrix U output by the two-class adaptive FCM, u_ik and u_jk represent the degrees to which sample k belongs to class i and class j. The larger the difference between the two membership values, the more reliable the recognition, so w_ij can be used as the confidence of each binary classifier; it is obtained from formula (13),
$$w_{ij} = \lvert u_{ik} - u_{jk}\rvert. \qquad (13)$$
The more reliable the classifier decision, the larger the difference and the larger w_ij; conversely, the smaller w_ij, the closer the sample lies to the overlap region and the poorer the classification reliability. With the classifier confidence w_ij obtained, it is used as a fusion weight and the output of the classifier is defined as
$$C_{ij} = w_{ij}\,I, \quad I \in \{+1, -1\}, \qquad (14)$$
where I is the two-class decision; we let I = +1 indicate that the sample is judged to belong to the first of the two classes and I = −1 indicate that it is judged to belong to the other class.
To make the final decision, the outputs of these 6 classifiers are assembled into a super-vector and decoded by correlation. Under ideal conditions the decision confidence w_ij is 1, and the output value obtained is C_ij = I; when the sample to be recognized belongs to neither of the two classes a binary classifier can distinguish, the output value carries no information biased towards either class and is set to zero. The output values of this ideal situation serve as the code words of the classes, as shown in Table 2. In practical situations the output value C_ij = w_ij·I is scattered around the ideal value (the code word), and decoding can proceed according to the distance between the actual output and the code words. The correlation decoder measures, by correlation, the closeness between the actual values and the ideal values, and the emotion class with the maximum correlation is the recognition result,
$$i^{*} = \arg\max\{r_i\}, \qquad (15)$$
where i* denotes the label of the recognized emotion class and r_i is the correlation, obtained from formula (16),
$$R^T = C^T I_{6\times 4}, \qquad (16)$$
where R = {r_1, r_2, … r_n}, C is the six-dimensional column vector formed by the classifier output values, and I_{6×4} is the code-word matrix of the six pairwise combinations of the four emotion classes, shown in Table 2.
Table 2. Code words of the emotion categories

Classifier     Happy  Angry  Sad  Calm
Happy/angry      1     -1     0     0
Happy/sad        1      0    -1     0
Happy/calm       1      0     0    -1
Angry/sad        0      1    -1     0
Angry/calm       0      1     0    -1
Sad/calm         0      0     1    -1
In this example, one segment of speech yields an N-frame signal after processing, i.e. N samples; the membership matrix U output by a two-class classifier's clustering over these N samples is an N×2 matrix. In the decision and recognition stage, the class of each sample is judged, and finally the correct recognition rate over these N samples is computed.
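A sketch of this decision stage for a single sample follows, in Python/NumPy; the six membership pairs are made-up values, and the confidence is computed as w = |u_ik − u_jk| in line with formula (13) as reconstructed above:

import numpy as np

CODEWORDS = np.array([                  # I_{6x4} from Table 2: rows = classifiers
    [1, -1,  0,  0],                    # happy/angry
    [1,  0, -1,  0],                    # happy/sad
    [1,  0,  0, -1],                    # happy/calm
    [0,  1, -1,  0],                    # angry/sad
    [0,  1,  0, -1],                    # angry/calm
    [0,  0,  1, -1],                    # sad/calm
])
EMOTIONS = ['happy', 'angry', 'sad', 'calm']

# Membership pair (u_i, u_j) of one sample from each binary classifier (made up)
memberships = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                        [0.4, 0.6], [0.5, 0.5], [0.45, 0.55]])

w = np.abs(memberships[:, 0] - memberships[:, 1])    # confidence, formula (13)
I = np.where(memberships[:, 0] >= memberships[:, 1], 1, -1)
C = w * I                                            # C_ij = w_ij * I, formula (14)
R = C @ CODEWORDS                                    # R^T = C^T I_{6x4}, formula (16)
print(EMOTIONS[int(np.argmax(R))])                   # i* = argmax r_i, formula (15)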
This example can recognize the four classes of speech emotion: happy, angry, sad and calm.
To verify the performance gain of the speech emotion recognition method of the present invention relative to the FCM method, 2 groups of comparison experiments were carried out. In the first group, features were selected first and FCM two-class classifiers were used to recognize the selected four emotion classes. The second group used the method of the present invention to recognize the same four emotion classes.
For the training sample set, 30 sentences were selected from the Emo-DB corpus for each of the four emotion classes happy, angry, sad and calm.
The two groups of recognition results are shown in Table 3:
Table 3. Comparison of classification experiment results
From the analysis of the results in Table 3 it can be seen that the method proposed herein improves the recognition rate considerably over FCM overall; the recognition performance for sad is improved by 0.03 and that for calm by 0.13.
In summary, the present invention improves the speech-based emotion recognition rate through adaptive fuzzy-clustering recognition of speech emotion in which different features are selected for different emotions.

Claims (2)

1. An emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
a. Pre-process the input speech signal; the pre-processing includes pre-emphasis filtering and windowed framing, dividing the speech signal into N frames, where N is a positive integer greater than 1;
b. Extract the feature information of the processed speech signal; the feature information includes Mel-frequency cepstral coefficients, pitch, formants and short-time energy; in the extracted feature information, the pitch features include the pitch variance and the pitch minimum; the formant features include the first-formant maximum, first-formant minimum and first-formant mean, the second-formant maximum and second-formant mean, and the third-formant maximum, third-formant mean and third-formant variance; the short-time-energy feature is the short-time-energy minimum;
c. Combine the speech signal with feature information and feed the combinations into multiple classifiers for classification; each classifier covers at least 2 emotion categories, and no two classifiers cover exactly the same set of categories; the speech signal is combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different feature signals are selected from the speech signal to form a feature information matrix X, where each row of X is the feature information selected from one frame of the speech signal and the number of rows is the frame count N; specifically there are 6 classifiers, and each classifier covers 2 of the 4 emotion classes, which are happy, angry, sad and calm; the classes are grouped pairwise into six groups: group 1 is happy/angry, group 2 is happy/sad, group 3 is happy/calm, group 4 is angry/sad, group 5 is angry/calm, and group 6 is sad/calm; each group of emotion categories corresponds to one classifier; for each group, the feature set that best separates the two emotion classes of the group is extracted, and each group's feature information is assembled into a feature information sequence set X, where each row of the feature information matrix X is obtained from one frame of the speech signal and the number of rows equals the frame count of the utterance; the feature information of each group is as follows: group 1 extracts the Mel-frequency cepstral coefficients, first-formant maximum, second-formant maximum, third-formant maximum and third-formant mean; group 2 extracts the Mel-frequency cepstral coefficients, first-formant minimum, third-formant mean, pitch minimum and pitch variance; group 3 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant mean, third-formant maximum and pitch minimum; group 4 extracts the Mel-frequency cepstral coefficients, first-formant maximum, third-formant maximum, pitch mean and short-time-energy minimum; group 5 extracts the Mel-frequency cepstral coefficients, first-formant maximum, first-formant variance, second-formant maximum and third-formant variance; group 6 extracts the Mel-frequency cepstral coefficients, first-formant variance, second-formant maximum, third-formant mean and short-time-energy minimum;
d. Run each classifier separately to obtain the degree of membership of the speech signal to each emotion category within that classifier; the specific classification method uses the adaptive fuzzy K-means algorithm; the specific method of classification with the adaptive fuzzy K-means algorithm is:
the objective function of the adaptive fuzzy K-means algorithm is defined as
$$J(X;U,V,\{A_i\}) = \sum_{i=1}^{c}\sum_{k=1}^{N}(u_{ik})^m\,d_{ikA_i}^2, \qquad d_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i),$$
where X is the feature information sequence set, U is the membership matrix, V is the cluster-centre matrix, the A_i are the norm-inducing matrices of the c classes, N is the number of feature vectors (i.e. the number of samples), c is the number of cluster species, m is the fuzzy weighting exponent, u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a feature information vector, and A_i is the local norm-inducing matrix of a class; to achieve classification, the objective function J must be minimized; it is computed by loop iteration, and the objective function is minimal when the membership matrix becomes stable; the tolerance threshold of the membership matrix is set to ε, and the initial membership matrix can be chosen at random; the loop iteration comprises the following steps:
Step 1: compute the cluster centres,
$$v_i^{(l)} = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m x_k}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 2: compute the cluster covariance matrices,
$$F_i = \frac{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m\bigl(x_k - v_i^{(l)}\bigr)\bigl(x_k - v_i^{(l)}\bigr)^T}{\sum_{k=1}^{N}\bigl(u_{ik}^{(l-1)}\bigr)^m};$$
Step 3: compute the Mahalanobis distances,
$$d_{ikA_i}^2 = \bigl(x_k - v_i^{(l)}\bigr)^T A_i \bigl(x_k - v_i^{(l)}\bigr), \qquad A_i = \bigl[\rho_i\,\det(F_i)\bigr]^{1/n} F_i^{-1},$$
where $\|A_i\| = \rho_i$, ρ > 0, and ρ_i is the local clustering control parameter;
Step 4: update the membership matrix,
$$u_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(d_{ikA_i}/d_{jkA_j}\bigr)^{2/(m-1)}},$$
where l is the iteration number;
the loop stop condition is $\|U^{(l)} - U^{(l-1)}\| \le \varepsilon$; each group's feature information X is processed by the above loop iteration to obtain each group's stable membership matrix U;
e. Perform speech emotion recognition from the membership results output by all the classifiers; the specific recognition method is to assemble all the output results into a super-vector and decode the super-vector to output the final decision.
2. The emotion recognition method based on fuzzy clustering of speech according to claim 1, characterized in that the specific method of step e is:
e1. From the membership matrix U obtained in step d, compute the confidence w_ij of each sample in each group;
e2. Define the decision output C_ij of the two-class classification in each group, C_ij = w_ij · I, I ∈ {+1, −1}, where I = +1 indicates that the sample is judged to belong to one class of the two-class classification and I = −1 indicates that it is judged to belong to the other class; the 6 groups of emotion classifications are fed into the 6 classifiers respectively, which output their decisions;
e3. Decode by correlation; the correlation is computed as $R^T = C^T I_{6\times 4}$, where C is the column vector formed by the 6 classifier output results, $I_{6\times 4}$ is the code-word matrix of the six pairwise combinations of the four emotion classes, and R = {r_1, r_2, … r_n};
e4. Decide the recognition result: i* denotes the label of the emotion class assigned to the sample, where i* = arg max{r_i}.
CN201410299493.3A 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech Expired - Fee Related CN104077598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410299493.3A CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410299493.3A CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Publications (2)

Publication Number Publication Date
CN104077598A CN104077598A (en) 2014-10-01
CN104077598B true CN104077598B (en) 2017-05-31

Family

ID=51598844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410299493.3A Expired - Fee Related CN104077598B (en) 2014-06-27 2014-06-27 Emotion recognition method based on fuzzy clustering of speech

Country Status (1)

Country Link
CN (1) CN104077598B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650576A (en) * 2016-09-22 2017-05-10 中国矿业大学 Mining equipment health state judgment method based on noise characteristic statistic
CN107886056A (en) * 2017-10-27 2018-04-06 江苏大学 A kind of electronic nose of fuzzy covariance learning network differentiates vinegar kind method
CN108122552B (en) * 2017-12-15 2021-10-15 上海智臻智能网络科技股份有限公司 Voice emotion recognition method and device
EP3729419A1 (en) * 2017-12-19 2020-10-28 Wonder Group Technologies Ltd. Method and apparatus for emotion recognition from speech
CN109065071B (en) * 2018-08-31 2021-05-14 电子科技大学 Song clustering method based on iterative k-means algorithm
CN111898690B (en) * 2020-08-05 2022-11-18 山东大学 Power transformer fault classification method and system
CN113611326B (en) * 2021-08-26 2023-05-12 中国地质大学(武汉) Real-time voice emotion recognition method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100631786B1 (en) * 2005-02-18 2006-10-12 삼성전자주식회사 Method and apparatus for speech recognition by measuring frame's confidence

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620853A (en) * 2008-07-01 2010-01-06 邹采荣 Speech-emotion recognition method based on improved fuzzy vector quantization
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion

Also Published As

Publication number Publication date
CN104077598A (en) 2014-10-01


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170531

Termination date: 20180627

CF01 Termination of patent right due to non-payment of annual fee