CN104077598B - An emotion recognition method based on fuzzy clustering of speech - Google Patents
An emotion recognition method based on fuzzy clustering of speech
- Publication number
- CN104077598B CN104077598B CN201410299493.3A CN201410299493A CN104077598B CN 104077598 B CN104077598 B CN 104077598B CN 201410299493 A CN201410299493 A CN 201410299493A CN 104077598 B CN104077598 B CN 104077598B
- Authority
- CN
- China
- Prior art keywords
- group
- characteristic information
- formant
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech. The method of the present invention comprises: preprocessing an input speech signal; extracting feature information from the processed speech signal; grouping multiple emotion classes and selecting the corresponding feature information for each group; classifying each emotion-class pair with the feature information selected for it; and recognizing the speech emotion from the classification outputs of all emotion-class pairs. The beneficial effect of the present invention is that different features are selected for different emotions; the improved adaptive fuzzy K-means clustering method recognizes all emotions considerably better than the conventional FCM method applied to a single shared feature set, with a higher recognition rate and better performance. The present invention is particularly suitable for intelligent speech emotion recognition.
Description
Technical field
The present invention relates to speech emotion recognition technology, and in particular to an emotion recognition method based on fuzzy clustering of speech.
Background technology
With the development of artificial intelligence, affective computing, the combination of machine emotion with computer technology, has emerged as a brand-new research problem. Language is an important instrument of human communication: speech carries not only textual symbol information but also emotional information. Processing the emotional information in speech is of great importance in the fields of signal processing and artificial intelligence. In speech emotion recognition, many experts and scholars have done a substantial amount of research work, including building standard speech emotion databases and researching speech feature extraction and classification methods. Much research has also been done on speech emotion feature selection, but it has not identified which specific features recognize which specific emotions. Because speech emotion is inherently ambiguous, researchers have tried to apply fuzzy clustering methods to speech emotion recognition; however, their work used the same feature set to recognize all emotion classes, and the recognition results were unsatisfactory. Many clustering algorithms determine clusters from Euclidean or Mahalanobis distance measures, and algorithms based on such distance metrics tend to find spherical clusters of similar size and density. An emotion cluster, however, may have an arbitrary shape, so the clustering algorithms currently in use cannot recognize speech classes well.
The content of the invention
The problem to be solved by this invention, in view of the above shortcomings of the conventional art, is to propose an emotion recognition method based on fuzzy clustering of speech.
The technical scheme adopted by the present invention to solve the above technical problem is an emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
A. Preprocess the input speech signal. The preprocessing includes pre-emphasis filtering and windowed framing, dividing the speech signal into N frames, where N is a positive integer greater than 1.
B. Extract the feature information of the processed speech signal. The feature information includes Mel cepstrum coefficients, pitch, formants and short-time energy.
C. Combine the speech signal with feature information and feed it into multiple classifiers for classification. Each classifier covers at least 2 emotion categories, and no two classifiers cover exactly the same categories. The speech signal is combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different feature signals are selected from the speech signal to form a feature information matrix X, in which each row of X is the feature information selected from one frame of the speech signal, and the number of rows is the frame count N.
D. Run the classification for each classifier separately, obtaining the degree of membership of the speech signal to each emotion category in that classifier. The specific classification method uses the adaptive fuzzy K-means algorithm.
E. Recognize the speech emotion from the membership results output by all classifiers. The specific recognition method is to assemble all outputs into a supervector and decode the supervector to output the decision result.
Specifically, among the feature information extracted in step B, the pitch features include the pitch variance and the pitch minimum; the formant features include the first-formant maximum, first-formant minimum and first-formant mean; the second-formant maximum and second-formant mean; and the third-formant maximum, third-formant mean and third-formant variance; the short-time energy feature is the short-time energy minimum.
Specifically, the multiple emotion classes in step C are 4 classes, namely happy, angry, sad and calm. They are grouped pairwise into six groups: the first group is happy/angry, the second happy/sad, the third happy/calm, the fourth angry/sad, the fifth angry/calm and the sixth sad/calm. For each group, the feature information that best separates the two emotion classes of that group is extracted, and each group's feature information is assembled into a feature information sequence set X, where each row vector of the feature information X is obtained from one frame of the speech signal and the number of rows equals the frame count of the speech segment. The feature information of each group is, specifically: the first group extracts Mel cepstrum coefficients, the first-formant maximum, the second-formant maximum, the third-formant maximum and the third-formant mean; the second group extracts Mel cepstrum coefficients, the first-formant minimum, the third-formant mean, the pitch minimum and the pitch variance; the third group extracts Mel cepstrum coefficients, the first-formant variance, the second-formant mean, the third-formant maximum and the pitch minimum; the fourth group extracts Mel cepstrum coefficients, the first-formant maximum, the third-formant maximum, the pitch mean and the short-time energy minimum; the fifth group extracts Mel cepstrum coefficients, the first-formant maximum, the first-formant variance, the second-formant maximum and the third-formant variance; the sixth group extracts Mel cepstrum coefficients, the first-formant variance, the second-formant maximum, the third-formant mean and the short-time energy minimum.
Specifically, the classification with the adaptive fuzzy K-means algorithm in step D proceeds as follows.
The objective function of the adaptive fuzzy K-means algorithm is defined as J(X; U, V, A) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m D_{ikA_i}^2, with D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i), where X is the feature information sequence set, U is the membership matrix, V is the cluster-centre matrix, A is the set of norm-inducing matrices of the c classes, N is the number of feature vectors, i.e. the sample count, c is the number of cluster classes, m is the fuzzy weighting exponent, u_{ik} is the membership function value of the k-th sample for the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a feature information vector, and A_i is the local norm-inducing matrix of a class. To achieve the classification, the objective function J must be minimized; this is computed by loop iteration, and J is minimal when the membership matrix is stable. A fault-tolerance threshold ε is set for the membership matrix, and the initial membership matrix can be chosen at random. The loop iteration comprises the following steps:
Step 1: compute the cluster centres, v_i = \sum_{k=1}^{N} u_{ik}^m x_k / \sum_{k=1}^{N} u_{ik}^m.
Step 2: compute the cluster covariance matrices, F_i = \sum_{k=1}^{N} u_{ik}^m (x_k - v_i)(x_k - v_i)^T / \sum_{k=1}^{N} u_{ik}^m.
Step 3: compute the Mahalanobis distances, D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i), where A_i = [\rho_i \det(F_i)]^{1/n} F_i^{-1}, \det(A_i) = \rho_i, \rho > 0, and \rho_i is the local clustering control parameter.
Step 4: update the membership matrix, u_{ik} = 1 / \sum_{j=1}^{c} (D_{ikA_i} / D_{jkA_j})^{2/(m-1)}.
l is the iteration count of the loop, and the loop stop condition is ||U^{(l)} - U^{(l-1)}|| ≤ ε. Each group's feature information X is processed by the above loop iteration to obtain that group's stable membership matrix U.
Specifically, the method of step E is:
E1. From the membership matrix U obtained in step D, compute the confidence w_{ij} of each sample in each group, w_{ij} = |u_{ik} - u_{jk}|.
E2. Define the decision result of the two-class samples in each group as C_{ij} = w_{ij} · I, I ∈ {+1, -1}, where I = +1 means the sample is judged to be the first of the two classes and I = -1 means it is judged to be the other class; the 6 groups of emotion classifications are fed respectively to 6 classifiers, which output their decisions.
E3. Compute the correlation decoding; the correlation formula is R^T = C^T · I_{6×4}, where C is the column vector formed by the 6 classifier outputs and I_{6×4} is the class codeword matrix of the six pairwise combinations of the four emotion classes, with R = {r_1, r_2, …, r_n}.
E4. Determine the recognition result: i* denotes the label of the emotion category the sample is recognized as, where i* = argmax{r_i}.
The beneficial effect of the present invention is that different features are selected for different emotions; the improved adaptive fuzzy K-means clustering method recognizes all emotions considerably better than the conventional FCM method applied to a single shared feature set, with a higher recognition rate and better performance.
Brief description of the drawings
Fig. 1 is the speech emotion recognition flow chart of the invention;
Fig. 2 is the decision and recognition flow chart of the invention.
Specific embodiment
The technical scheme of the invention is described in detail below with reference to the accompanying drawings and an embodiment.
Embodiment:
This example performs speech emotion recognition on the four emotion classes happy, angry, sad and calm, based on the Berlin speech emotion database (Emo-DB).
As shown in Fig. 1, this example comprises the following steps:
S1: Speech preprocessing
Preprocessing includes pre-emphasis and windowed framing.
Pre-emphasis: the purpose of pre-emphasis is to flatten the spectrum of the signal so that the whole band, from low to high frequency, keeps the same signal-to-noise ratio, which facilitates spectrum analysis and vocal-tract parameter analysis. Pre-emphasis is usually implemented with a first-order digital filter H(z) = 1 - αz^{-1}, where α is the pre-emphasis factor; in this example α = 0.9. The original speech signal S yields x(l) after pre-emphasis filtering.
Framing: the speech is framed with a Hamming window of length 23 ms, giving N frame signals for one speech segment; each frame is regarded as one sample.
After windowing, the signal x(l) becomes x_n(m), given by
x_n(m) = w(m) x(n + m), 0 ≤ m ≤ N - 1 (1)
with the Hamming window
w(m) = 0.54 - 0.46 cos(2πm / (N - 1)), 0 ≤ m ≤ N - 1. (2)
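The pre-emphasis filter and Hamming-windowed framing above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's implementation: a 16 kHz sampling rate and non-overlapping frames are assumptions, while the 23 ms window and α = 0.9 come from the text; the function names are invented for the example.

```python
import numpy as np

def preemphasis(signal, alpha=0.9):
    """First-order pre-emphasis filter H(z) = 1 - alpha * z^-1."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_hamming(signal, fs=16000, frame_ms=23):
    """Split the signal into Hamming-windowed frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)           # samples per frame
    n_frames = len(signal) // n             # N frames (non-overlapping here)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))  # Eq. (2)
    frames = signal[:n_frames * n].reshape(n_frames, n)
    return frames * w                       # x_n(m) = w(m) x(n + m), Eq. (1)

# usage on a dummy one-second signal
x = preemphasis(np.random.randn(16000))
frames = frame_hamming(x)                   # 43 frames of 368 samples each
```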
S2: Feature extraction
This example extracts the short-time speech features and their derived variants with voicebox, a speech processing toolbox based on the MATLAB language.
The extracted features include Mel cepstrum coefficients (MFCC), pitch, formants and short-time energy.
Mel cepstrum coefficients (MFCC): Mel cepstrum coefficients are derived from the auditory properties of the human ear; they use a nonlinear frequency scale (the Mel frequency scale) to simulate the human auditory system. Experiments show that below 1000 Hz perception is linear in frequency, while above 1000 Hz perception is logarithmic in frequency, so the ear perceives different frequencies differently and is especially sensitive to low frequencies. The conversion between frequency f and Mel frequency is
Mel(f) = 2595 lg(1 + f / 700),
where f is the frequency in Hz. In the present invention, 12 Mel cepstrum coefficients are taken per frame of the speech signal.
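The frequency-to-Mel conversion is one line; a sketch assuming the standard 2595·lg(1 + f/700) form of the Mel scale (the function name is illustrative):

```python
import numpy as np

def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

m0 = hz_to_mel(0.0)        # 0.0 by construction
m1000 = hz_to_mel(1000.0)  # close to 1000 mel, the scale's reference point
```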
Pitch: a voiced sound is composed of a series of vibrations of different frequencies and amplitudes emitted by the sound source. Among these vibrations there is one of lowest frequency, and the sound it produces is the pitch (fundamental). Multiple pitch values can be obtained within a frame of the speech signal, from which the pitch minimum and pitch variance of the frame can further be derived.
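The text does not specify which pitch extraction algorithm voicebox is used with. Purely for illustration, a simple autocorrelation-based per-frame pitch estimate, from which a frame's pitch minimum and variance could then be taken, might look like this; the sampling rate and the 60-400 Hz search range are assumptions, not from the patent.

```python
import numpy as np

def pitch_autocorr(frame, fs=16000, fmin=60, fmax=400):
    """Estimate the pitch (Hz) of one frame from the autocorrelation peak
    inside a plausible lag range for human speech."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag bounds for fmax..fmin
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# a 200 Hz sine in a 23 ms frame should be recovered approximately
t = np.arange(368) / 16000
f0 = pitch_autocorr(np.sin(2 * np.pi * 200 * t))
```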
Formants: when sound passes through the resonant cavities, the filtering action of the cavities redistributes the energy of different frequencies in the frequency domain; part is reinforced by the resonance of the cavities and the rest is attenuated. The reinforced frequencies appear as dense dark bands on the spectrogram of a time-frequency analysis. Because the energy distribution is uneven, the strong parts rise like mountain peaks, hence the name formants. Several formants can be found in one frame of the speech signal; in the present invention the first, second and third formants are selected, from which the formant minima, maxima and variances within a frame can further be obtained.
Short-time energy: the short-time energy is the sum of the energy of one frame of the signal, given by E_n = \sum_{m=0}^{N-1} x_n(m)^2.
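The short-time energy of each windowed frame, and the minimum over a segment used as the feature here, reduce to one line each; a sketch over a frame matrix like the one produced in the preprocessing step:

```python
import numpy as np

def short_time_energy(frames):
    """E_n = sum_m x_n(m)^2 for each frame (each row of `frames`)."""
    return np.sum(frames ** 2, axis=1)

frames = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]])
energy = short_time_energy(frames)   # [5.0, 0.5, 9.0]
ste_min = energy.min()               # short-time energy minimum: 0.5
```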
S3: Feature selection
This example pairs the four emotion classes two by two, obtaining six emotion pairs (happy/angry, happy/sad, happy/calm, angry/sad, angry/calm, sad/calm), and then selects a different feature combination as the input of each classifier. The specific feature selection is shown in Table 1:
Table 1: best feature group selection
As shown in Fig. 1, different emotion pairs are recognized with different feature information: feature selection 1 recognizes the happy/angry pair, feature selection 2 the happy/sad pair, feature selection 3 the happy/calm pair, feature selection 4 the angry/sad pair, feature selection 5 the angry/calm pair, and feature selection 6 the sad/calm pair.
In the feature selection, the features chosen by feature selection 1 are MFCC, the first-formant maximum, the second-formant maximum, the third-formant maximum and the third-formant mean; by feature selection 2, MFCC, the first-formant minimum, the third-formant mean, the pitch minimum and the pitch variance; by feature selection 3, MFCC, the first-formant variance, the second-formant mean, the third-formant maximum and the pitch minimum; by feature selection 4, MFCC, the first-formant maximum, the third-formant maximum, the pitch mean and the short-time energy minimum; by feature selection 5, MFCC, the first-formant maximum, the first-formant variance, the second-formant maximum and the third-formant variance; and by feature selection 6, MFCC, the first-formant variance, the second-formant maximum, the third-formant mean and the short-time energy minimum.
Taking the happy/angry feature group as an illustration: after the processing above, one speech segment yields N frame signals, that is, N samples. From each sample the MFCC, first-formant maximum, second-formant maximum, third-formant maximum, third-formant mean and related features are extracted; these features constitute a 5-dimensional row vector, so one speech segment yields an N×5 matrix X. The other groups follow by analogy.
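Assembling the happy/angry group's N×5 matrix X amounts to stacking one feature row per frame. In this sketch the per-frame feature function is a placeholder standing in for the voicebox extractors described above (the stand-in values are arbitrary and only demonstrate the matrix shape):

```python
import numpy as np

def frame_features_group1(frame):
    """Placeholder per-frame features for the happy/angry group:
    MFCC (condensed to one entry here), F1 max, F2 max, F3 max, F3 mean.
    Real extractors (voicebox etc.) would replace these stand-ins."""
    return [frame.mean(),        # stand-in for the MFCC entry
            frame.max(),         # stand-in for first-formant maximum
            frame.max() * 0.9,   # stand-in for second-formant maximum
            frame.max() * 0.8,   # stand-in for third-formant maximum
            frame.mean() * 0.8]  # stand-in for third-formant mean

def build_X(frames):
    """One 5-dimensional row vector per frame -> N x 5 matrix X."""
    return np.array([frame_features_group1(f) for f in frames])

X = build_X(np.random.randn(40, 368))   # 40 frames -> X is 40 x 5
```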
S4: Classifier selection
In the present invention, speech emotion classification is performed with an improved fuzzy K-means algorithm.
In the FCM (Fuzzy C-Means) algorithm, suppose there is a sample sequence set X = {x_1, x_2, …, x_N}; its objective function is
J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m d_{ik}^2, (3)
where N is the number of samples, c is the number of cluster classes, and m is the fuzzy weighting exponent; it controls the degree of fuzziness and is a very important parameter. x_k is a sample vector and v_i is the centre of a class, a cluster-centre vector. u_{ik} is the membership function value of the k-th sample for the i-th class, and the membership function values of any sample over all cluster centres must sum to 1, i.e.
\sum_{i=1}^{c} u_{ik} = 1, k = 1, …, N. (4)
U is the membership matrix, V = {v_1, v_2, …, v_c}, v_i ∈ R^n are the cluster-centre vectors to be determined, and A is the norm-inducing matrix of the c classes.
Formula (5) is the square of an inner-product distance norm:
d_{ik}^2 = ||x_k - v_i||_A^2 = (x_k - v_i)^T A (x_k - v_i). (5)
To obtain the minimum of (3) under condition (4), the partial derivatives with respect to v_i and u_{ik} are set to zero, which gives
v_i = \sum_{k=1}^{N} u_{ik}^m x_k / \sum_{k=1}^{N} u_{ik}^m, (6)
u_{ik} = 1 / \sum_{j=1}^{c} (d_{ik} / d_{jk})^{2/(m-1)}. (7)
The cluster centres v_i are initialized, the number of clusters c and the fuzzy weighting exponent m are set, and then (6) and (7) are iterated until the membership values stabilize. The FCM algorithm with the standard Euclidean distance criterion favours hyperspherical clusters and can therefore only detect spherical clusters of similar size and density, because the same standard norm-inducing matrix is chosen for every class: either A = I, or A can be defined as the inverse of the n×n covariance matrix, A = F^{-1}, where
F = (1/N) \sum_{k=1}^{N} (x_k - \bar{x})(x_k - \bar{x})^T, (8)
with \bar{x} the sample mean of the data; in this case A imposes the Mahalanobis distance.
To detect clusters of different geometries within one data set, the present invention improves the standard fuzzy K-means algorithm with an adaptive distance norm, obtaining the adaptive fuzzy K-means algorithm: each cluster uses its own norm-inducing matrix A_i, which produces the inner-product norm
D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i). (9)
The matrices A_i serve as optimization variables in the K-means functional, so that the adaptive distance function of each class fits the local topology of the data set. The objective function of the adaptive FCM algorithm is defined as
J(X; U, V, {A_i}) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m D_{ikA_i}^2. (10)
The objective function cannot be minimized directly with respect to A_i, because it is linear in A_i; the smaller a positive definite A_i, the smaller the objective. To obtain a feasible solution, A_i must be constrained, usually by restricting its determinant: fixing the determinant of A_i at a certain value optimizes the cluster shape while keeping the cluster volume constant, det(A_i) = ρ_i, ρ > 0, with ρ_i a fixed value for each class. Using the method of Lagrange multipliers, A_i is then solved as
A_i = [ρ_i det(F_i)]^{1/n} F_i^{-1}, (11)
where F_i is the fuzzy covariance matrix of the i-th class, defined as
F_i = \sum_{k=1}^{N} u_{ik}^m (x_k - v_i)(x_k - v_i)^T / \sum_{k=1}^{N} u_{ik}^m. (12)
Substituting (11) and (12) into (10) yields a generalized squared Mahalanobis distance norm from x_k to the cluster mean v_i, in which, as can be seen, the covariance is weighted by the membership matrix U.
In the present invention, to achieve the classification, the objective function J(X; U, V, A) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m D_{ikA_i}^2 must be minimized, where X is the feature information matrix obtained in step S3, U is the membership matrix, V is the cluster-centre matrix, A is the set of norm-inducing matrices of the 4 classes, N is the number of feature vectors, i.e. the sample count, c = 4 is the number of cluster classes, m = 1 is the fuzzy weighting exponent, u_{ik} is the membership function value of the k-th sample for the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), and x_k is the feature information vector of a sample. The computation proceeds by loop iteration; the objective function is minimal when the membership matrix is stable. A fault-tolerance threshold ε is set for the membership matrix, the initial membership matrix can be chosen at random, and det(A_i) = ρ_i, ρ > 0. The specific loop steps are as follows:
Step 1: compute the cluster centres, v_i = \sum_{k=1}^{N} u_{ik}^m x_k / \sum_{k=1}^{N} u_{ik}^m.
Step 2: compute the cluster covariance matrices, F_i = \sum_{k=1}^{N} u_{ik}^m (x_k - v_i)(x_k - v_i)^T / \sum_{k=1}^{N} u_{ik}^m.
Step 3: compute the Mahalanobis distances, D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i), where A_i = [ρ_i det(F_i)]^{1/n} F_i^{-1}.
Step 4: update the membership matrix, u_{ik} = 1 / \sum_{j=1}^{c} (D_{ikA_i} / D_{jkA_j})^{2/(m-1)}.
l is the iteration count of the loop, and the loop stop condition is ||U^{(l)} - U^{(l-1)}|| ≤ ε. Each group's feature information X is processed by the above steps to obtain that group's stable membership matrix U. In this way the 6 groups of feature information, processed by the fuzzy clustering of the 6 two-class classifiers, output 6 stable membership matrices.
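The four iteration steps above are essentially the Gustafson-Kessel variant of fuzzy c-means; a compact NumPy sketch follows. Where the text is ambiguous, the sketch makes assumptions: c = 2 clusters per pairwise classifier, m = 2 so the membership update exponent 2/(m-1) is well defined, and ρ_i = 1; the function name and toy data are illustrative.

```python
import numpy as np

def adaptive_fcm(X, c=2, m=2.0, rho=1.0, eps=1e-5, max_iter=100, seed=0):
    """Adaptive fuzzy K-means (Gustafson-Kessel style) on N x n data X.
    Returns the stable N x c membership matrix U and cluster centres V."""
    N, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((N, c))
    U /= U.sum(axis=1, keepdims=True)       # memberships sum to 1 per sample
    for _ in range(max_iter):
        U_prev = U.copy()
        Um = U ** m
        # step 1: cluster centres v_i
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        D2 = np.empty((N, c))
        for i in range(c):
            diff = X - V[i]
            # step 2: fuzzy covariance F_i
            F = (Um[:, i, None] * diff).T @ diff / Um[:, i].sum()
            # step 3: A_i = [rho * det(F_i)]^(1/n) F_i^-1, Mahalanobis distance
            A = (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
            D2[:, i] = np.einsum('kj,jl,kl->k', diff, A, diff)
        # step 4: u_ik = 1 / sum_j (D_ik / D_jk)^(2/(m-1))
        D = np.sqrt(np.maximum(D2, 1e-12))
        U = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2 / (m - 1)), axis=2)
        if np.linalg.norm(U - U_prev) <= eps:   # ||U^(l) - U^(l-1)|| <= eps
            break
    return U, V

# two well-separated 2-D blobs should receive near-crisp memberships
pts = np.vstack([np.random.default_rng(1).normal(0, 0.1, (30, 2)),
                 np.random.default_rng(2).normal(5, 0.1, (30, 2))])
U, V = adaptive_fcm(pts)
```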
S5: Decision and recognition
In the decision of this multi-classifier scheme, the output confidence of each sub-classifier is evaluated first, the decision is then made by correlation operations, and the final recognition result is computed. Consider the recognition of a single sample: in the membership matrix U output by the two-class adaptive FCM, u_{ik} and u_{jk} denote the degree to which sample k belongs to class i and class j. The larger the difference between the two memberships, the more reliable the identification, so w_{ij} can be used as the confidence of each binary classifier, obtained by formula (13):
w_{ij} = |u_{ik} - u_{jk}|. (13)
The more reliable the classifier's decision, the larger the difference and the larger w_{ij}; conversely, the smaller w_{ij}, the closer the sample lies to the overlap region and the poorer the classification reliability. With the classifier confidence w_{ij} obtained in this way serving as a fusion weight, the classifier output is defined as
C_{ij} = w_{ij} · I, I ∈ {+1, -1}, (14)
where I is the two-class decision: we let I = +1 mean the sample is judged to be the first of the two classes and I = -1 mean it is judged to be the other class.
To make the decision, the outputs of the 6 classifiers are assembled into a supervector and decoded by the correlation method. Under ideal conditions the decision confidence w_{ij} is 1, and the resulting output value is C_{ij} = I; when the sample to be recognized belongs to neither of the two classes a binary classifier can recognize, its output carries no information favouring either class and is set to zero. The output values of this ideal situation serve as the codewords of the classes, as shown in Table 2. In practical situations the output values C_{ij} = w_{ij} · I are scattered around the ideal values (codewords), and decoding can proceed from the distance between the actual outputs and the codewords. The role of the correlation decoder is to measure, by correlation, how close the actual values are to the ideal values; the emotion category corresponding to the maximum correlation is the recognition result,
i* = argmax{r_i}, (15)
where i* is the label of the recognized emotion category and r_i is the correlation, obtained from formula (16):
R^T = C^T · I_{6×4}, (16)
where R = {r_1, r_2, …, r_n}, C is the six-dimensional column vector formed by the classifier outputs, and I_{6×4} is the class codeword matrix of the six pairwise combinations of the four emotion classes, shown in Table 2.
Table 2: codewords of the emotion categories
Classifier | Happy | Angry | Sad | Calm |
Happy/angry | 1 | -1 | 0 | 0 |
Happy/sad | 1 | 0 | -1 | 0 |
Happy/calm | 1 | 0 | 0 | -1 |
Angry/sad | 0 | 1 | -1 | 0 |
Angry/calm | 0 | 1 | 0 | -1 |
Sad/calm | 0 | 0 | 1 | -1 |
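The codeword table and the correlation decision R^T = C^T · I_{6×4}, i* = argmax{r_i} translate directly into a few lines; the confidence values in this sketch are made up for illustration:

```python
import numpy as np

# class codeword matrix of Table 2: rows = the 6 pairwise classifiers,
# columns = happy, angry, sad, calm
I64 = np.array([[ 1, -1,  0,  0],   # happy/angry
                [ 1,  0, -1,  0],   # happy/sad
                [ 1,  0,  0, -1],   # happy/calm
                [ 0,  1, -1,  0],   # angry/sad
                [ 0,  1,  0, -1],   # angry/calm
                [ 0,  0,  1, -1]])  # sad/calm

def decode(C):
    """Correlation decoding: R^T = C^T . I_{6x4}; the emotion with the
    largest correlation r_i is the recognition result."""
    R = C @ I64                      # 6 classifier outputs -> 4 correlations
    return int(np.argmax(R)), R

# hypothetical classifier outputs C_ij = w_ij * I leaning towards "angry"
C = np.array([-0.8, 0.3, 0.5, 0.9, 0.7, -0.2])
label, R = decode(C)                 # label 1 = angry
```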
In this example, the N frame signals obtained from one speech segment after processing serve as N samples. The membership matrix U output by the two-class classifier clustering for these N samples is an N×2 matrix; in the decision stage, the class of each sample is judged, and finally the correct recognition rate over the N samples is computed.
This example can recognize the four speech emotion classes happy, angry, sad and calm.
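The per-segment decision just described (judge the class of each of the N samples from its memberships, then report the correct recognition rate) can be sketched as follows; the membership and label values are made up:

```python
import numpy as np

# hypothetical N x 2 membership matrix U from one two-class classifier
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
true_labels = np.array([0, 0, 1, 1])

pred = U.argmax(axis=1)                     # judged class of each sample
rate = float((pred == true_labels).mean())  # correct recognition rate
```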
To verify the performance gain of the speech emotion recognition method of the present invention relative to the FCM method, 2 groups of contrast experiments were carried out. In the first group, the features were selected first and the selected four emotion classes were recognized with FCM two-class classifiers; the second group recognized the same four emotion classes with the method of the present invention.
As the sample set for training, 30 sentences were selected from the Emo-DB corpus for each of the happy, angry, sad and calm emotion classes for the test.
The two groups of recognition results are shown in Table 3:
Table 3: comparative classification results
From the analysis of the results in Table 3 it can be seen that the method proposed herein improves the recognition rate considerably over FCM overall; in particular, the recognition performance for sad improves by 0.03 and that for calm by 0.13.
From the above, the present invention, by adaptive fuzzy clustering recognition of speech emotion that selects different features for different emotions, improves the speech-based emotion recognition rate.
Claims (2)
1. An emotion recognition method based on fuzzy clustering of speech, characterized by comprising the following steps:
A. preprocessing an input speech signal, the preprocessing including pre-emphasis filtering and windowed framing, the speech signal being divided into N frames, where N is a positive integer greater than 1;
B. extracting feature information of the processed speech signal, the feature information including Mel cepstrum coefficients, pitch, formants and short-time energy; among the extracted feature information, the pitch including the pitch variance and the pitch minimum; the formants including the first-formant maximum, first-formant minimum and first-formant mean; the second-formant maximum and second-formant mean; and the third-formant maximum, third-formant mean and third-formant variance; the short-time energy being the short-time energy minimum;
C. combining the speech signal with feature information and feeding it into multiple classifiers for classification, each classifier covering at least 2 emotion categories and no two classifiers covering exactly the same categories; the speech signal being combined with feature information as follows: according to the emotion categories covered by the classifier to be fed, different features are selected from the speech signal to form a feature information matrix X, in which each row of X is the feature information selected from one frame of the speech signal and the number of rows is the frame count N; there being specifically 6 classifiers, each covering 2 of the 4 emotion classes in total, namely happy, angry, sad and calm, grouped by pairwise grouping into six groups: the first group happy/angry, the second happy/sad, the third happy/calm, the fourth angry/sad, the fifth angry/calm and the sixth sad/calm, each group of emotion classes corresponding to one classifier; for each group, the feature information that best separates the two emotion classes of that group being extracted, and each group's feature information being assembled into a feature information sequence set X, where each row vector of the feature information matrix X is obtained from one frame of the speech signal and the number of rows equals the frame count of the speech segment; the feature information of each group being, specifically: the first group extracts Mel cepstrum coefficients, the first-formant maximum, the second-formant maximum, the third-formant maximum and the third-formant mean; the second group extracts Mel cepstrum coefficients, the first-formant minimum, the third-formant mean, the pitch minimum and the pitch variance; the third group extracts Mel cepstrum coefficients, the first-formant variance, the second-formant mean, the third-formant maximum and the pitch minimum; the fourth group extracts Mel cepstrum coefficients, the first-formant maximum, the third-formant maximum, the pitch mean and the short-time energy minimum; the fifth group extracts Mel cepstrum coefficients, the first-formant maximum, the first-formant variance, the second-formant maximum and the third-formant variance; the sixth group extracts Mel cepstrum coefficients, the first-formant variance, the second-formant maximum, the third-formant mean and the short-time energy minimum;
d. Classification is performed in each classifier, obtaining the degree of membership of the speech signal to the emotion categories of that classifier. The specific classification method uses the adaptive fuzzy K-means algorithm; the specific method of classification with the adaptive fuzzy K-means algorithm is:
The objective function of the adaptive fuzzy K-means algorithm is defined as
J(X; U, V) = Σ_{i=1..c} Σ_{k=1..N} (u_ik)^m D²_{ik,A_i}, with D²_{ik,A_i} = (x_k − v_i)^T A_i (x_k − v_i),
where X is the set of characteristic-information sequences, U is the membership matrix, V is the cluster-centre matrix, A_i is the norm-inducing matrix of the i-th class, c is the number of cluster classes, N is the number of characteristic-information vectors (i.e. the sample count), m is the fuzzy weighting exponent, u_ik is the membership value of the k-th sample in the i-th emotion class, v_i is the centre of an emotion class (a cluster-centre vector), x_k is a characteristic-information vector, and A_i is the local norm-inducing matrix of its class. To achieve classification, the objective function J must be minimised; this is computed by loop iteration, and when the membership matrix is stable the objective function is at its minimum. The fault-tolerance threshold of the membership matrix is set to ε, and the initial membership matrix may be chosen at random.
The loop iteration comprises the following steps:
The first step: compute the cluster centres,
v_i^(l) = Σ_{k=1..N} (u_ik^(l−1))^m x_k / Σ_{k=1..N} (u_ik^(l−1))^m;
The second step: compute the cluster covariance matrices,
F_i = Σ_{k=1..N} (u_ik^(l−1))^m (x_k − v_i^(l))(x_k − v_i^(l))^T / Σ_{k=1..N} (u_ik^(l−1))^m;
The third step: compute the Mahalanobis distances,
D²_{ik,A_i} = (x_k − v_i^(l))^T A_i (x_k − v_i^(l)), where A_i = [ρ_i det(F_i)]^{1/n} F_i^{−1}, det(A_i) = ρ_i, ρ_i > 0 is the local clustering control parameter;
The fourth step: update the membership matrix,
u_ik^(l) = 1 / Σ_{j=1..c} (D_{ik,A_i} / D_{jk,A_j})^{2/(m−1)}, where l is the loop iteration count.
The loop stop condition is ||U^(l) − U^(l−1)|| ≤ ε. Each group of characteristic information X is processed by the above loop iteration to obtain a stable membership matrix U for that group.
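The four-step loop above is a Gustafson–Kessel-style adaptive fuzzy clustering. A minimal sketch in Python/NumPy, assuming m = 2 and ρ_i = 1 with a random initial membership matrix (the function and variable names are my own, not from the patent):

```python
import numpy as np

def gk_fuzzy_cluster(X, c, m=2.0, rho=None, eps=1e-4, max_iter=100, seed=0):
    """Adaptive fuzzy K-means (Gustafson-Kessel style) clustering sketch.

    X   : (N, n) array of characteristic-information vectors x_k
    c   : number of cluster classes (emotion classes)
    m   : fuzzy weighting exponent
    rho : per-cluster volume parameters rho_i (default: all 1)
    Returns the stable membership matrix U (c, N) and centres V (c, n).
    """
    N, n = X.shape
    rho = np.ones(c) if rho is None else np.asarray(rho)
    rng = np.random.default_rng(seed)
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)           # random initial membership matrix

    for _ in range(max_iter):
        U_prev = U.copy()
        W = U ** m                               # weighted memberships u_ik^m
        # Step 1: cluster centres v_i
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        D2 = np.empty((c, N))
        for i in range(c):
            diff = X - V[i]                      # (N, n) residuals x_k - v_i
            # Step 2: fuzzy cluster covariance F_i
            F = (W[i, :, None] * diff).T @ diff / W[i].sum()
            # Step 3: norm-inducing matrix A_i and Mahalanobis distances
            A = (rho[i] * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
            D2[i] = np.einsum('kj,jl,kl->k', diff, A, diff)
        # Step 4: membership update u_ik = 1 / sum_j (D_ik / D_jk)^(2/(m-1))
        D2 = np.maximum(D2, 1e-12)               # guard against a zero distance
        p = 1.0 / (m - 1.0)
        U = 1.0 / (D2 ** p * (1.0 / D2 ** p).sum(axis=0))
        if np.linalg.norm(U - U_prev) <= eps:    # stop: ||U^(l) - U^(l-1)|| <= eps
            break
    return U, V
```

Because the distance is induced by A_i = [ρ_i det(F_i)]^{1/n} F_i^{−1}, each cluster adapts its own metric to the local shape of its class, which is what distinguishes this method from plain fuzzy C-means with a fixed Euclidean norm.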
e. Speech emotion recognition is performed according to the membership results output by each classifier. The specific recognition method is to form all the output results into a supervector and decode this supervector to output the decided recognition result.
2. The emotion recognition method based on fuzzy clustering of speech according to claim 1, characterised in that the specific method in step e is:
e1. compute the confidence w_ij of the samples in each group from the membership matrix U obtained in step d;
e2. define the decision result C_ij of the two-class samples in each group as C_ij = w_ij · I, with I = +1 or −1, where I = +1 means the sample is judged as one of the two classes and I = −1 means it is judged as the other; the 6 groups of emotion classes are fed to 6 classifiers for decision output;
e3. decode by correlation; the correlation formula is R^T = C^T · I_{6×4}, where C is the column vector formed by the output results of the 6 classifiers, I_{6×4} is the codeword matrix of the six classifier combinations over the four emotion classes, and R = {r_1, r_2, …, r_n};
e4. judge the recognition result: i* denotes the label of the emotion category identified for the sample, where i* = arg max{r_i}.
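Steps e2–e4 amount to error-correcting-output-code style decoding of six pairwise classifiers over four classes. A minimal sketch in Python follows; the codeword matrix values are hypothetical (the patent's actual I_{6×4} matrix is not reproduced in the text), chosen here as a standard one-vs-one coding:

```python
import numpy as np

# Hypothetical 6x4 codeword matrix for six two-class classifiers over four
# emotion classes: row k gives the code (+1 / -1, 0 = not involved) that
# classifier k assigns to each class. NOT the patent's actual matrix.
CODEWORDS = np.array([
    [+1, -1,  0,  0],   # classifier 1: class 1 vs class 2
    [+1,  0, -1,  0],   # classifier 2: class 1 vs class 3
    [+1,  0,  0, -1],   # classifier 3: class 1 vs class 4
    [ 0, +1, -1,  0],   # classifier 4: class 2 vs class 3
    [ 0, +1,  0, -1],   # classifier 5: class 2 vs class 4
    [ 0,  0, +1, -1],   # classifier 6: class 3 vs class 4
])

def decode_emotion(confidences, decisions, codewords=CODEWORDS):
    """Correlation decoding: C_k = w_k * I_k, R^T = C^T . I_{6x4},
    recognised label i* = argmax_i r_i (0-based here)."""
    C = np.asarray(confidences) * np.asarray(decisions)  # confidence-weighted votes
    R = C @ codewords                                    # correlation with each class column
    return int(np.argmax(R))

# Example: all six pairwise classifiers vote consistently for class 1 (index 0)
label = decode_emotion([0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
                       [+1, +1, +1, -1, -1, +1])
```

Weighting each ±1 decision by its membership-derived confidence w before correlating means that an uncertain classifier contributes little to the final arg-max, which is the point of building the supervector from degrees of membership rather than hard votes.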
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410299493.3A CN104077598B (en) | 2014-06-27 | 2014-06-27 | A kind of emotion identification method based on voice fuzzy cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104077598A CN104077598A (en) | 2014-10-01 |
CN104077598B true CN104077598B (en) | 2017-05-31 |
Family
ID=51598844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410299493.3A Expired - Fee Related CN104077598B (en) | 2014-06-27 | 2014-06-27 | A kind of emotion identification method based on voice fuzzy cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104077598B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650576A (en) * | 2016-09-22 | 2017-05-10 | 中国矿业大学 | Mining equipment health state judgment method based on noise characteristic statistic |
CN107886056A (en) * | 2017-10-27 | 2018-04-06 | 江苏大学 | A kind of electronic nose of fuzzy covariance learning network differentiates vinegar kind method |
CN108122552B (en) * | 2017-12-15 | 2021-10-15 | 上海智臻智能网络科技股份有限公司 | Voice emotion recognition method and device |
EP3729419A1 (en) * | 2017-12-19 | 2020-10-28 | Wonder Group Technologies Ltd. | Method and apparatus for emotion recognition from speech |
CN109065071B (en) * | 2018-08-31 | 2021-05-14 | 电子科技大学 | Song clustering method based on iterative k-means algorithm |
CN111898690B (en) * | 2020-08-05 | 2022-11-18 | 山东大学 | Power transformer fault classification method and system |
CN113611326B (en) * | 2021-08-26 | 2023-05-12 | 中国地质大学(武汉) | Real-time voice emotion recognition method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101599271A (en) * | 2009-07-07 | 2009-12-09 | 华中科技大学 | A kind of recognition methods of digital music emotion |
CN101620853A (en) * | 2008-07-01 | 2010-01-06 | 邹采荣 | Speech-emotion recognition method based on improved fuzzy vector quantization |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100631786B1 (en) * | 2005-02-18 | 2006-10-12 | 삼성전자주식회사 | Method and apparatus for speech recognition by measuring frame's confidence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104077598B (en) | A kind of emotion identification method based on voice fuzzy cluster | |
US7362892B2 (en) | Self-optimizing classifier | |
Kamaruddin et al. | Cultural dependency analysis for understanding speech emotion | |
CN110289003A (en) | A kind of method of Application on Voiceprint Recognition, the method for model training and server | |
CN107291822A (en) | The problem of based on deep learning disaggregated model training method, sorting technique and device | |
CN105261367B (en) | A kind of method for distinguishing speek person | |
CN108281137A (en) | A kind of universal phonetic under whole tone element frame wakes up recognition methods and system | |
CN104035996B (en) | Field concept abstracting method based on Deep Learning | |
CN107122352A (en) | A kind of method of the extracting keywords based on K MEANS, WORD2VEC | |
CN105609116B (en) | A kind of automatic identifying method in speech emotional dimension region | |
CN102201237B (en) | Emotional speaker identification method based on reliability detection of fuzzy support vector machine | |
CN110046252A (en) | A kind of medical textual hierarchy method based on attention mechanism neural network and knowledge mapping | |
CN107180084A (en) | Word library updating method and device | |
CN112905739B (en) | False comment detection model training method, detection method and electronic equipment | |
CN111899766B (en) | Speech emotion recognition method based on optimization fusion of depth features and acoustic features | |
CN112437053B (en) | Intrusion detection method and device | |
CN110085216A (en) | A kind of vagitus detection method and device | |
CN103631753A (en) | Progressively-decreased subspace ensemble learning algorithm | |
Phan et al. | Multi-view audio and music classification | |
Fan et al. | Modeling voice pathology detection using imbalanced learning | |
CN112466284B (en) | Mask voice identification method | |
Kamaruddin et al. | Features extraction for speech emotion | |
Bassiou et al. | Greek folk music classification into two genres using lyrics and audio via canonical correlation analysis | |
Jeyakarthic et al. | Optimal bidirectional long short term memory based sentiment analysis with sarcasm detection and classification on twitter data | |
CN106448660B (en) | It is a kind of introduce big data analysis natural language smeared out boundary determine method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170531 | Termination date: 20180627 |