CN101620853A - Speech-emotion recognition method based on improved fuzzy vector quantization - Google Patents

Speech-emotion recognition method based on improved fuzzy vector quantization Download PDF

Info

Publication number
CN101620853A
Authority
CN
China
Prior art keywords
parameter
average
sigma
formula
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810122806A
Other languages
Chinese (zh)
Inventor
邹采荣
赵力
赵艳
魏昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN200810122806A priority Critical patent/CN101620853A/en
Publication of CN101620853A publication Critical patent/CN101620853A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a speech-emotion recognition method based on improved fuzzy vector quantization. The method relaxes the normalization condition on the fuzzy membership functions by expanding their sum from 1 to N, which reduces the influence of sample outliers (wild points) on the iterative training process to a certain extent, and it adopts a clustering method based on a similarity threshold and the minimum-distance principle during iterative training, which to a certain extent avoids the problem that the cluster centers are sensitive to initial values and easily fall into local minima. Experimental results show that the method effectively improves the emotion recognition rate compared with existing fuzzy vector quantization methods.

Description

A speech-emotion recognition method based on improved fuzzy vector quantization
Technical field
The present invention relates to a speech recognition method, and in particular to a speech emotion recognition system and method.
Background technology
Automatic speech emotion recognition mainly involves two problems. The first is which features of the speech signal to use for emotion recognition, that is, the problem of emotional feature extraction, which covers feature extraction and feature selection. The second is how to classify the given speech data, that is, the problem of pattern recognition, which covers various pattern recognition algorithms such as nearest neighbor, neural networks, support vector machines, and so on.
The emotional features used in speech emotion recognition are mainly prosodic parameters and voice-quality parameters. The former include duration, speech rate, energy, fundamental frequency and their derived parameters; the latter mainly include the formants, the harmonic-to-noise ratio and their derived parameters. According to the theory of the three-dimensional emotional space, prosodic parameters mainly characterize emotions along the activation dimension, while voice-quality parameters mainly characterize emotions along the valence dimension. For emotions that are far apart along the activation dimension, prosodic parameters can characterize the differences well; for emotions that are close along the activation dimension but far apart along the valence dimension, voice-quality parameters are needed to strengthen the discriminative power of the feature set. Most existing parameter extraction methods have accuracy problems, and these parameters mainly reflect characteristics of the human glottis and vocal tract; they are closely related to physiology and therefore vary strongly across individuals, especially across genders. Among the recognition methods existing before the present invention, neural network methods have highly nonlinear and very strong classification ability, but the required learning time grows quickly as the network grows, and the local-minimum problem is another weakness; the hidden Markov model (HMM) method takes a long time to build and train, and its high computational complexity must still be resolved before practical application; the quadratic discriminant algorithm is simple and computationally cheap, but it requires the feature vectors to be normally distributed, which greatly affects the recognition rate; recognition methods based on vector quantization are used less because of quantization error and initial-value sensitivity, and although fuzzy vector quantization alleviates the quantization error problem to a certain extent, it still easily suffers from initial-value sensitivity and local minima.
Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art by designing and developing a speech-emotion recognition method based on improved fuzzy vector quantization.
The technical scheme of the present invention is as follows:
A speech-emotion recognition method based on improved fuzzy vector quantization, comprising the following steps:
Establish a feature extraction and analysis module, a feature dimensionality reduction module, an improved fuzzy vector quantization training module, and an emotion recognition module. The feature extraction and analysis module extracts two classes of parameters, prosodic parameters and voice-quality parameters, and performs gender normalization. The raw speech signal is first pre-emphasized and divided into frames, and the features are then extracted.
(1) Prosodic parameter extraction
(1-1) Pre-process the raw speech signal with a high-pass filter and extract the utterance duration and speech rate parameters;
(1-2) Divide the signal into frames and apply a window;
(1-3) Use short-time analysis to extract the main characteristic parameters of each frame of the statement: the fundamental frequency track, the short-time energy track, and the voiced-to-unvoiced time ratio;
(1-4) Extract the parameters derived from some of the prosodic feature parameters: the maximum, minimum, mean and variance of the short-time energy; the maximum, minimum, mean and variance of the short-time energy jitter; the maximum, minimum, mean and variance of the fundamental frequency; and the maximum, minimum, mean and variance of the fundamental frequency jitter. The short-time energy jitter is calculated as follows:
E_i^(1) = |E_i^(0) - E_{i-1}^(0)|, i = 2, 3, ..., N  (formula 1)
where E_i^(0) is the short-time energy of the i-th frame and N is the number of frames. The fundamental frequency jitter is calculated in the same way as (formula 1).
(1-5) Gender normalization: according to the gender of each sample, assign the sample to a different set s_i, then compute the mean μ_i and variance σ_i of each set separately (here i indexes the different sets), and use the following formula to normalize the parameters into the same space:
s_i' = (s_i - μ_i) / σ_i  (formula 2)
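The following is a minimal sketch of (formula 1) and (formula 2): the jitter of a per-frame track and gender-wise normalization of a parameter matrix. It is not the patented implementation; the tracks are assumed to be given as 1-D arrays (their extraction is sketched later in the embodiment section), and all variable names and toy values are illustrative.

```python
# Hedged sketch of (formula 1) and (formula 2); tracks are assumed to be given.
import numpy as np

def jitter(track):
    """(formula 1): E_i^(1) = |E_i^(0) - E_{i-1}^(0)|, i = 2, ..., N."""
    track = np.asarray(track, dtype=float)
    return np.abs(np.diff(track))

def gender_normalize(params, genders):
    """(formula 2): z-score each sample using the mean/variance of its own gender set."""
    params = np.asarray(params, dtype=float)
    genders = np.asarray(genders)
    out = np.empty_like(params)
    for g in np.unique(genders):
        idx = genders == g
        mu = params[idx].mean(axis=0)
        sigma = params[idx].std(axis=0) + 1e-12      # guard against zero variance
        out[idx] = (params[idx] - mu) / sigma
    return out

# Example: energy-jitter statistics for one utterance, then normalization by gender
energy = np.abs(np.random.randn(100))                # stand-in short-time energy track
e_jit = jitter(energy)
features = np.array([[e_jit.max(), e_jit.mean()],    # one row per sample (toy values)
                     [0.8, 0.3], [1.1, 0.4], [0.6, 0.2]])
normalized = gender_normalize(features, ["f", "m", "f", "m"])
```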
(2) Voice-quality parameter extraction
(2-1) Extract the maximum, minimum, mean and variance of the glottal wave parameters, including: the ratio of the glottis opening time to the whole glottal period (OQ, open quotient), the ratio of the glottis opening phase to the closing phase (SQ, speed quotient), the ratio of the glottis closure time to the whole glottal period (CQ, closed quotient), the ratio of the glottis closing phase to the whole glottal period (ClQ, closing quotient), and the glottal wave skewness;
(2-2) Extract the maximum, minimum, mean and variance of the harmonic-to-noise ratio;
(2-3) Extract the maximum, minimum, mean, variance and bandwidth of the first three formants;
(2-4) Extract the maximum, minimum, mean and variance of the jitter of the first three formants; the formant jitter is calculated as in (formula 1);
(2-5) Gender normalization, as in (1-5);
(3) Feature dimensionality reduction
(3-1) After all the features in (1) and (2) have been extracted and normalized, form the feature vector;
(3-2) Use a principal component analysis neural network (PCANN) to realize dimensionality reduction, obtaining the sample feature vector sequence X = {X_1, X_2, ..., X_N};
(4) Improved fuzzy vector quantization
(4-1) For all training samples of a given emotion, compute the Euclidean distance between every pair of samples and assign the two closest samples to one class; choose a distance threshold L and assign to this class all samples whose distance to either of these two samples is within L;
(4-2) Set aside the samples that already have a class assignment, together with the distances involving them, so that they are not used again;
(4-3) Among the remaining samples, find the closest pair; if the distance between them is greater than L, make each of the two samples a separate class containing only that sample; if the distance between them is less than L, choose a distance threshold αL (0 < α ≤ 1) and assign to this class all samples whose distance to either of the two samples is within αL;
(4-4) Repeat steps (4-2) and (4-3) until all samples are classified; if only one sample remains at the end, make that sample a separate class;
(4-5) Adjust L and αL until all samples are clustered into J classes;
(4-6) Expand the normalization condition of the membership functions u_k(X_i) to Σ_{j=1..J} Σ_{i=1..N} u_j(X_i) = N, compute u_k(X_i) by (formula 3), and compute the class centers Y_j (j = 1, 2, ..., J) by (formula 4);
u_k(X_i) = [ Σ_{j=1..J} Σ_{i=1..N} d(X_i, Y_k)^(2/(m-1)) / (N · d(X_i, Y_j)^(2/(m-1))) ]^(-1), 1 ≤ k ≤ J, 1 ≤ i ≤ N  (formula 3)
Y_k = Σ_{i=1..N} u_k^m(X_i) X_i / Σ_{i=1..N} u_k^m(X_i), 1 ≤ k ≤ J  (formula 4)
where m ∈ [1, ∞) is the fuzziness degree and d(X_i, Y_k) denotes the distance;
(4-7) Choose a constant ε > 0, set the iteration count k = 0, take the class centers from (4-6) as the initial codebook, and use the fuzzy C-means (FCM) clustering algorithm to iteratively derive the codebook Y_j (j = 1, 2, ..., J);
(4-8) Train one codebook for each emotion by steps (4-1) to (4-7);
(5) Emotion recognition
(5-1) For a statement to be recognized, obtain its feature vector X_i according to steps (1), (2) and (3); X_i is quantized into the vector U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)} composed of the membership functions, from which the reconstructed vector X̂_i and the quantization error D are obtained;
X̂_i = Σ_{k=1..J} u_k^m Y_k / Σ_{k=1..J} u_k^m  (formula 5)
D = Σ_{k=1..J} u_k^m(X_i) d(X_i, Y_k)  (formula 6)
(5-2) The emotion corresponding to the codebook with the minimum average quantization distortion is selected as the recognition result.
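Below is a hedged sketch of (formula 3) and (formula 4) under the relaxed normalization Σ_j Σ_i u_j(X_i) = N. It is a reading of the formulas above, not the patented implementation: d(·,·) is taken as Euclidean distance (the patent does not fix the distance explicitly), and the array shapes and names are illustrative.

```python
# Hedged sketch of (formula 3)/(formula 4) with the sum-to-N membership condition.
import numpy as np

def memberships(X, Y, m=2.0, eps=1e-12):
    """u_k(X_i) per (formula 3); returns an (N, J) matrix whose entries sum to N."""
    d = np.maximum(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1), eps)  # d(X_i, Y_k)
    p = d ** (2.0 / (m - 1.0))
    s = (1.0 / p).sum() / X.shape[0]     # sum over all (i, j) of 1 / (N d(X_i, Y_j)^(2/(m-1)))
    return 1.0 / (p * s)

def class_centers(X, U, m=2.0):
    """Y_k per (formula 4): centers weighted by u_k^m(X_i)."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

# Quick check of the relaxed condition: the memberships sum to N, not to 1 per sample
X = np.random.randn(50, 10)                        # N = 50 feature vectors
Y = np.random.randn(8, 10)                         # J = 8 codewords / class centers
U = memberships(X, Y)
assert np.isclose(U.sum(), X.shape[0])             # sum_{j,i} u_j(X_i) = N
Y_new = class_centers(X, U)                        # one update step per (formula 4)
```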
The advantages and effects of the present invention are:
1. Through characteristic parameter extraction and analysis, the parameters are extended from prosodic parameters to voice-quality parameters, which increases the effectiveness of the characteristic parameters for recognizing emotional statements;
2. A principal component analysis neural network is used to reduce the dimensionality of the extracted feature vectors, which not only reduces the amount of computation but also provides a noise-reduction effect to a certain extent;
3. The normalization condition of the fuzzy membership functions is relaxed, which reduces the influence of wild points (outliers) on the codebook;
4. A clustering method based on a similarity threshold and the minimum-distance principle is adopted to train the codebook, which avoids the initial-value and local-minimum problems;
5. Through vector quantization, the input vector X_i is quantized into a vector composed of membership functions rather than into a single codeword Y_k, which is equivalent to enlarging the codebook and reduces the quantization error.
Other advantages and effects of the present invention are described below.
Description of drawings
Fig. 1---Block diagram of the speech emotion recognition system.
Fig. 2---Flowchart of the emotional feature extraction and analysis module.
Fig. 3---The glottal wave and its differentiated waveform.
Fig. 4---Schematic diagram of the principal component analysis neural network.
Fig. 5---Comparison of the emotion recognition results of the fuzzy vector quantization method before and after the improvement.
Embodiment
The technical solution of the invention is further elaborated below in conjunction with the drawings and embodiments.
Fig. 1 shows the block diagram of the system, which is mainly divided into four blocks: the feature extraction and analysis module, the feature dimensionality reduction module, the fuzzy vector quantization codebook training module and the emotion recognition module. The overall implementation is divided into a training process and a recognition process. The training process comprises feature extraction and analysis, feature dimensionality reduction and fuzzy vector quantization codebook training; the recognition process comprises feature extraction and analysis, feature dimensionality reduction and emotion recognition.
One. Emotional feature extraction and analysis module
1. Selection of the prosodic feature parameters
The prosodic feature parameters include: the maximum, minimum, mean and variance of the short-time energy; the maximum, minimum, mean and variance of the short-time energy jitter; the maximum, minimum, mean and variance of the fundamental frequency; the maximum, minimum, mean and variance of the fundamental frequency jitter; the voiced-to-unvoiced time ratio; and the speech rate.
First, following the feature extraction flow of Fig. 2, the statement whose features are to be extracted is pre-processed, including high-pass filtering and detection of the start and end points of the statement; the utterance duration and the speech rate of the whole sentence are extracted. The statement is then divided into frames and windowed, and short-time analysis is applied, taking the gender into account, to obtain the fundamental frequency, the short-time energy, and the numbers of voiced and unvoiced frames for each frame. The per-frame results are then aggregated to obtain the pitch contour, the pitch jitter track, the short-time energy track and the short-time energy jitter track of the statement, from which their statistical features are computed; finally, gender normalization is performed to obtain all of the prosodic feature parameters listed above, as sketched below.
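The following sketch illustrates the per-frame analysis and statistic aggregation just described. It is a simplification under stated assumptions: the frame length and hop size, the energy computation, the crude autocorrelation pitch estimate and the energy-based voiced/unvoiced decision are illustrative choices, since the patent does not prescribe a specific pitch detector; all names are hypothetical.

```python
# Minimal sketch of framing, per-frame F0/energy tracks, and track statistics.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a signal into overlapping Hamming-windowed frames."""
    n = 1 + (len(x) - frame_len) // hop
    w = np.hamming(frame_len)
    return np.stack([x[i * hop: i * hop + frame_len] * w for i in range(n)])

def f0_autocorr(frame, fs=16000, fmin=60, fmax=400):
    """Crude autocorrelation pitch estimate for one frame (0 if unvoiced-looking)."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag if ac[lag] > 0.3 * ac[0] else 0.0

def track_stats(track):
    """max, min, mean, variance of a track and of its jitter per (formula 1)."""
    jit = np.abs(np.diff(track))
    return [track.max(), track.min(), track.mean(), track.var(),
            jit.max(), jit.min(), jit.mean(), jit.var()]

x = np.random.randn(16000)                         # stand-in utterance, 1 s at 16 kHz
frames = frame_signal(x)
energy = np.sum(frames ** 2, axis=1)               # short-time energy track
f0 = np.array([f0_autocorr(f) for f in frames])    # fundamental frequency track
voiced = f0 > 0
vu_ratio = voiced.sum() / max((~voiced).sum(), 1)  # voiced-to-unvoiced time ratio
prosodic = (track_stats(energy)
            + (track_stats(f0[voiced]) if voiced.sum() > 1 else [0.0] * 8)
            + [vu_ratio])
```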
2. Selection of the voice-quality feature parameters
The voice-quality feature parameters include: the maximum, minimum, mean and variance of OQ; the maximum, minimum, mean and variance of SQ; the maximum, minimum, mean and variance of CQ; the maximum, minimum, mean and variance of ClQ; the maximum, minimum, mean and variance of R_k; the maximum, minimum, mean, variance and bandwidth of the first formant; the maximum, minimum, mean and variance of the first formant jitter; the maximum, minimum, mean, variance and bandwidth of the second formant; the maximum, minimum, mean and variance of the second formant jitter; the maximum, minimum, mean, variance and bandwidth of the third formant; the maximum, minimum, mean and variance of the third formant jitter; and the maximum, minimum, mean and variance of the harmonic-to-noise ratio.
The selection of multiple voice-quality parameters is one of the characteristics of the proposed method. Although prosodic features play the leading role in recognition, voice-quality features provide an effective supplement for emotions that are close along the activation dimension but separated along the valence dimension, such as happiness and anger. The voice-quality parameters reflect the change of the glottal waveform shape during pronunciation; the influencing factors include muscle tension, the pressure within the vocal tract, the vocal tract length and tension, the particular sound-source type (articulation mode), the glottal wave parameters and the vocal tract formant parameters. The LF model (Liljencrants-Fant model) is a commonly used model for describing the glottal wave, as shown in Fig. 3, where T_0 is the pitch period, t_o the glottis opening instant, t_c the glottis closing instant, t_p the instant at which the glottal wave reaches its maximum peak, and t_e the instant at which the differentiated wave reaches its maximum negative peak. The following glottal wave parameters can be extracted from this model:
OQ = (t_c - t_o) / T_0  (formula 7)
SQ = (t_p - t_o) / (t_c - t_p)  (formula 8)
CQ = (T_0 - t_c + t_o) / T_0 = 1 - OQ  (formula 9)
ClQ = (t_c - t_p) / T_0  (formula 10)
R_k = t_p / (t_e - t_p)  (formula 11)
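A small sketch of (formula 7) to (formula 11) follows. The timing instants used in the example are made-up values; in practice t_o, t_c, t_p and t_e would come from glottal inverse filtering, which the patent does not detail.

```python
# Hedged sketch of the glottal-wave ratios from the LF-model timing instants.
def glottal_parameters(T0, t_o, t_c, t_p, t_e):
    OQ = (t_c - t_o) / T0                  # (formula 7) open quotient
    SQ = (t_p - t_o) / (t_c - t_p)         # (formula 8) speed quotient
    CQ = (T0 - t_c + t_o) / T0             # (formula 9) closed quotient = 1 - OQ
    ClQ = (t_c - t_p) / T0                 # (formula 10) closing quotient
    Rk = t_p / (t_e - t_p)                 # (formula 11) glottal-wave skewness measure
    return {"OQ": OQ, "SQ": SQ, "CQ": CQ, "ClQ": ClQ, "Rk": Rk}

# Example with illustrative timing instants (seconds) for a 125 Hz pitch period
print(glottal_parameters(T0=0.008, t_o=0.001, t_c=0.006, t_p=0.004, t_e=0.0045))
```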
In concrete implementation, the emotion statement is again pre-processed, including high-pass filtering and detection of the start and end points; the statement is then divided into frames and windowed, and the glottal wave characteristics, formant features, harmonic-to-noise ratio and other voice-quality parameters are obtained; after gender normalization, they form the voice-quality feature parameters finally used for codebook training or recognition.
In the implementation of the system, feature extraction and analysis is indispensable. During training, the feature extraction and analysis of the training samples is carried out directly according to the flow shown in Fig. 2. During recognition, the feature extraction and analysis of the statement to be recognized is carried out according to the same flow.
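As one possible way to obtain the formant frequencies and bandwidths mentioned above, the sketch below uses autocorrelation-method LPC and reads formants from the roots of the prediction polynomial. The patent does not specify how the formants are estimated, so this is an assumption; the LPC order, sampling rate and plausibility thresholds are conventional illustrative choices.

```python
# Hedged sketch of formant frequency/bandwidth estimation via LPC (not from the patent).
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC: returns the polynomial a = [1, a_1, ..., a_p]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz(r[:order], r[1:order + 1])        # Yule-Walker normal equations
    return np.concatenate(([1.0], -a))

def formants(frame, fs=16000, order=12, n_formants=3):
    """First few formant frequencies (Hz) and bandwidths (Hz) from the LPC roots."""
    a = lpc_coefficients(frame * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                    # keep upper half-plane poles
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > 90) & (bws < 400)                    # discard implausible poles
    order_idx = np.argsort(freqs[keep])
    return freqs[keep][order_idx][:n_formants], bws[keep][order_idx][:n_formants]

# Example on a synthetic vowel-like frame (two damped resonances plus a little noise)
fs, t = 16000, np.arange(400) / 16000
frame = (np.exp(-60 * t) * np.sin(2 * np.pi * 700 * t)
         + 0.5 * np.exp(-80 * t) * np.sin(2 * np.pi * 1200 * t)
         + 0.01 * np.random.randn(400))
print(formants(frame, fs))
```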
Two. Feature dimensionality reduction
The preceding analysis extracted 69 feature parameters in total. To avoid the increase in computational complexity caused by excessive dimensionality, and to limit the influence of redundant information on recognition, a principal component analysis neural network is used for dimensionality reduction; it is a linear unsupervised learning neural network based on the Hebb rule, as shown in Fig. 4. By learning the weight matrix W, the weight vectors approach the eigenvectors corresponding to the eigenvalues of the covariance matrix of the feature vector x, avoiding a direct matrix inversion operation. The reduced feature vector is y = W^T x. The weight update rule is as follows:
w_j[k+1] = w_j[k] + η (y_j[k] x'[k] - y_j^2[k] w_j[k])  (formula 12)
x'[k] = x[k] - Σ_{i=1..j-1} w_i[k] y_i[k]  (formula 13)
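The sketch below transcribes (formula 12) and (formula 13) into a small training loop: a linear network whose weight vectors converge toward the leading eigenvectors of the input covariance, so that y = W^T x is the reduced feature vector. The learning rate, number of epochs and output dimensionality are illustrative assumptions, not values fixed by the patent.

```python
# Hedged sketch of the Hebbian (Sanger-type) PCA network update of (formula 12)/(formula 13).
import numpy as np

def pca_nn(X, n_components=10, eta=1e-3, epochs=50, seed=0):
    """Train W (dim x n_components) with the generalized Hebbian rule; return W."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    W = rng.normal(scale=0.01, size=(dim, n_components))
    for _ in range(epochs):
        for x in X:
            y = W.T @ x                                  # outputs y_j[k]
            for j in range(n_components):
                # (formula 13): deflate the input by the already-explained components
                x_res = x - W[:, :j] @ y[:j]
                # (formula 12): w_j <- w_j + eta * (y_j * x' - y_j^2 * w_j)
                W[:, j] += eta * (y[j] * x_res - y[j] ** 2 * W[:, j])
    return W

# Example: reduce 69-dimensional feature vectors (as in the embodiment) to 10 dimensions
X = np.random.randn(200, 69)
X = X - X.mean(axis=0)                                   # zero-mean input for Hebbian PCA
W = pca_nn(X, n_components=10)
Y = X @ W                                                # reduced feature vectors y = W^T x
```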
Three. Training of the improved fuzzy vector quantization codebook
Traditional fuzzy vector quantization uses a fuzzy clustering algorithm instead of the K-means algorithm to design the quantization codebook, which can reduce the codebook quantization error to a certain extent, but the problems of wild-point interference, initial-value sensitivity and local minima remain. For this reason, the present invention proposes an improved fuzzy vector quantization method, with the following concrete steps (an end-to-end sketch is given after the list):
1. For all training feature samples of a given emotion, compute the Euclidean distance between every pair of samples and assign the two closest samples to one class; choose a distance threshold L and assign to this class all samples whose distance to either of these two samples is within L;
2. Set aside the samples that already have a class assignment, together with the distances involving them, so that they are not used again;
3. Among the remaining samples, find the closest pair; if the distance between them is greater than L, make each of the two samples a separate class containing only that sample; if the distance between them is less than L, choose a distance threshold αL (0 < α ≤ 1) and assign to this class all samples whose distance to either of the two samples is within αL;
4. Repeat steps 2 and 3 until all samples are classified; if only one sample remains at the end, make that sample a separate class;
5. Adjust L and αL until all samples are clustered into J classes;
6. Compute the membership functions u_k(X_i) according to (formula 3), with the normalization condition of u_k(X_i) expanded to Σ_{j=1..J} Σ_{i=1..N} u_j(X_i) = N (this is one of the characteristics of the present invention), and compute the class centers Y_j (j = 1, 2, ..., J) by (formula 4);
7. Choose a constant ε > 0, set the iteration count k = 0, take the result of step 6 as the initial codebook, and use the fuzzy C-means algorithm to iteratively derive the codebook Y_j (j = 1, 2, ..., J);
8. Train one codebook for each emotion by steps 1 to 7.
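The end-to-end sketch referenced above is given here: the similarity-threshold clustering of steps 1 to 4 to obtain initial class centers, followed by iteration of (formula 3) and (formula 4) until the centers move by less than ε (steps 6 and 7). The way L is adjusted (shrinking it until at least J classes appear, then keeping the J largest classes) is one simplified reading of step 5, not a detail fixed by the patent text, and all names such as `training_sets` are illustrative.

```python
# Hedged sketch of the improved fuzzy vector quantization codebook training.
import numpy as np

def threshold_clustering(X, L, alpha=0.8):
    """Steps 1-4: greedy grouping by nearest pairs with the thresholds L and alpha*L."""
    X = np.asarray(X, dtype=float)
    remaining = set(range(len(X)))
    classes, first = [], True
    while remaining:
        if len(remaining) == 1:
            classes.append(list(remaining))          # step 4: a single leftover sample
            break
        idx = sorted(remaining)
        pts = X[idx]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        a, b = np.unravel_index(np.argmin(d), d.shape)
        if not first and d[a, b] > L:
            classes.append([idx[a]])                 # step 3, far case: two singleton classes
            classes.append([idx[b]])
            taken = {idx[a], idx[b]}
        else:                                        # step 1 (threshold L) or step 3 (alpha*L)
            thr = L if first else alpha * L
            taken = {idx[a], idx[b]} | {idx[i] for i in range(len(idx))
                                        if d[a, i] <= thr or d[b, i] <= thr}
            classes.append(sorted(taken))
        remaining -= taken
        first = False
    return classes

def train_codebook(X, J, m=2.0, eps=1e-4, alpha=0.8, max_iter=100):
    """Step 5 (simplified) plus steps 6-7: relaxed-FCM recursion from the class centers."""
    X = np.asarray(X, dtype=float)
    L = np.linalg.norm(X.max(axis=0) - X.min(axis=0))         # generous initial threshold
    classes = threshold_clustering(X, L, alpha)
    while len(classes) < J and L > 1e-12:                     # shrink L until >= J classes
        L *= 0.9
        classes = threshold_clustering(X, L, alpha)
    classes = sorted(classes, key=len, reverse=True)[:J]      # keep the J largest classes
    Y = np.stack([X[c].mean(axis=0) for c in classes])        # initial codebook (class centers)
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1), 1e-12)
        p = d ** (2.0 / (m - 1.0))
        U = 1.0 / (p * ((1.0 / p).sum() / len(X)))            # (formula 3): entries sum to N
        W = U ** m
        Y_new = (W.T @ X) / W.sum(axis=0)[:, None]            # (formula 4)
        if np.linalg.norm(Y_new - Y) < eps:                   # stop when centers move < epsilon
            Y = Y_new
            break
        Y = Y_new
    return Y

# Step 8: one codebook per emotion, assuming `training_sets` maps an emotion label
# to its (num_samples, dim) matrix of reduced feature vectors.
# codebooks = {emotion: train_codebook(feats, J=16) for emotion, feats in training_sets.items()}
```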
Four. Emotion recognition module
For the emotion statement to be recognized, its feature vector is extracted according to the flow of Fig. 2 and then reduced in dimension by the principal component analysis neural network to obtain X_i. X_i is vector-quantized with the codebook of each emotion: X_i is quantized into the vector U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)} composed of the membership functions, from which the reconstructed vector X̂_i and the quantization error D are obtained. The emotion corresponding to the codebook with the minimum average quantization distortion is selected as the recognition result.
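The sketch below illustrates this recognition rule: quantize the feature vector(s) of the test statement against each emotion's codebook using (formula 5) and (formula 6), and pick the emotion whose codebook gives the smallest average quantization distortion. The `codebooks` mapping and the toy data are illustrative assumptions.

```python
# Hedged sketch of recognition by minimum average quantization distortion.
import numpy as np

def quantization_distortion(x, Y, m=2.0, eps=1e-12):
    """Membership vector U(x), reconstruction per (formula 5), distortion per (formula 6)."""
    d = np.maximum(np.linalg.norm(Y - x, axis=-1), eps)
    u = 1.0 / (d ** (2.0 / (m - 1.0)))
    u = u / u.sum()                                  # memberships of the single input vector
    um = u ** m
    x_hat = (um[:, None] * Y).sum(0) / um.sum()      # (formula 5)
    D = (um * d).sum()                               # (formula 6)
    return u, x_hat, D

def recognize(X_test, codebooks, m=2.0):
    """X_test: (n_vectors, dim) features of one statement; returns the emotion label."""
    avg_distortion = {emo: np.mean([quantization_distortion(x, Y, m)[2] for x in X_test])
                      for emo, Y in codebooks.items()}
    return min(avg_distortion, key=avg_distortion.get)

# Example with toy codebooks for two emotions
rng = np.random.default_rng(1)
codebooks = {"angry": rng.normal(size=(16, 10)), "sad": rng.normal(size=(16, 10)) + 2.0}
X_test = rng.normal(size=(5, 10)) + 2.0
print(recognize(X_test, codebooks))                  # expected to pick "sad" for this toy data
```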
Five. Evaluation of the recognition system
Because the sum of the membership degrees is expanded from 1 to N, the influence of sample wild points on the training iteration process is reduced to a certain extent; and because a clustering method based on a similarity threshold and the minimum-distance principle is adopted in the codebook training process, the sensitivity of the cluster centers to the initial values and the tendency to fall into local minima are avoided to a certain extent. From the comparison of the two emotion recognition methods in Fig. 5, the recognition performance is improved considerably: the recognition rate for anger is improved by 12.3%, for sadness by 5.1%, for happiness by 5.9%, and for surprise by 14.9%. The method of the present invention recognizes speech emotion much better than the other existing methods.
The scope of protection claimed by the present invention is not limited to the description of this embodiment.

Claims (1)

1. A speech-emotion recognition method based on improved fuzzy vector quantization, comprising the following steps:
Establish a feature extraction and analysis module, a feature dimensionality reduction module, an improved fuzzy vector quantization training module, and an emotion recognition module; the feature extraction and analysis module extracts two classes of parameters, prosodic parameters and voice-quality parameters, and performs gender normalization; the raw speech signal is first pre-emphasized and divided into frames, and the features are then extracted;
(1) Prosodic parameter extraction
(1-1) Pre-process the raw speech signal with a high-pass filter and extract the utterance duration and speech rate parameters;
(1-2) Divide the signal into frames and apply a window;
(1-3) Use short-time analysis to extract the main characteristic parameters of each frame of the statement: the fundamental frequency track, the short-time energy track, and the voiced-to-unvoiced time ratio;
(1-4) Extract the parameters derived from some of the prosodic feature parameters: the maximum, minimum, mean and variance of the short-time energy; the maximum, minimum, mean and variance of the short-time energy jitter; the maximum, minimum, mean and variance of the fundamental frequency; and the maximum, minimum, mean and variance of the fundamental frequency jitter; the short-time energy jitter is calculated as follows:
E_i^(1) = |E_i^(0) - E_{i-1}^(0)|, i = 2, 3, ..., N  (formula 1)
where E_i^(0) is the short-time energy of the i-th frame and N is the number of frames; the fundamental frequency jitter is calculated in the same way as (formula 1);
(1-5) Gender normalization: according to the gender of each sample, assign the sample to a different set s_i, then compute the mean μ_i and variance σ_i of each set separately (here i indexes the different sets), and use the following formula to normalize the parameters into the same space;
s_i' = (s_i - μ_i) / σ_i  (formula 2)
(2) Voice-quality parameter extraction
(2-1) Extract the maximum, minimum, mean and variance of the glottal wave parameters, including: the ratio of the glottis opening time to the whole glottal period (OQ, open quotient), the ratio of the glottis opening phase to the closing phase (SQ, speed quotient), the ratio of the glottis closure time to the whole glottal period (CQ, closed quotient), the ratio of the glottis closing phase to the whole glottal period (ClQ, closing quotient), and the glottal wave skewness;
(2-2) Extract the maximum, minimum, mean and variance of the harmonic-to-noise ratio;
(2-3) Extract the maximum, minimum, mean, variance and bandwidth of the first three formants;
(2-4) Extract the maximum, minimum, mean and variance of the jitter of the first three formants; the formant jitter is calculated as in (formula 1);
(2-5) Gender normalization, as in (1-5);
(3) Feature dimensionality reduction
(3-1) After all the features in (1) and (2) have been extracted and normalized, form the feature vector;
(3-2) Use a principal component analysis neural network (PCANN) to realize dimensionality reduction, obtaining the sample feature vector sequence X = {X_1, X_2, ..., X_N};
(4) Improved fuzzy vector quantization
(4-1) For all training samples of a given emotion, compute the Euclidean distance between every pair of samples and assign the two closest samples to one class; choose a distance threshold L and assign to this class all samples whose distance to either of these two samples is within L;
(4-2) Set aside the samples that already have a class assignment, together with the distances involving them, so that they are not used again;
(4-3) Among the remaining samples, find the closest pair; if the distance between them is greater than L, make each of the two samples a separate class containing only that sample; if the distance between them is less than L, choose a distance threshold αL (0 < α ≤ 1) and assign to this class all samples whose distance to either of the two samples is within αL;
(4-4) Repeat steps (4-2) and (4-3) until all samples are classified; if only one sample remains at the end, make that sample a separate class;
(4-5) Adjust L and αL until all samples are clustered into J classes;
(4-6) Expand the normalization condition of the membership functions u_k(X_i) to Σ_{j=1..J} Σ_{i=1..N} u_j(X_i) = N, compute u_k(X_i) by (formula 3), and compute the class centers Y_j (j = 1, 2, ..., J) by (formula 4);
u_k(X_i) = [ Σ_{j=1..J} Σ_{i=1..N} d(X_i, Y_k)^(2/(m-1)) / (N · d(X_i, Y_j)^(2/(m-1))) ]^(-1), 1 ≤ k ≤ J, 1 ≤ i ≤ N  (formula 3)
Y_k = Σ_{i=1..N} u_k^m(X_i) X_i / Σ_{i=1..N} u_k^m(X_i), 1 ≤ k ≤ J  (formula 4)
where m ∈ [1, ∞) is the fuzziness degree and d(X_i, Y_k) denotes the distance;
(4-7) Choose a constant ε > 0, set the iteration count k = 0, take the class centers from (4-6) as the initial codebook, and use the fuzzy C-means (FCM) clustering algorithm to iteratively derive the codebook Y_j (j = 1, 2, ..., J);
(4-8) Train one codebook for each emotion by steps (4-1) to (4-7);
(5) Emotion recognition
(5-1) For a statement to be recognized, obtain its feature vector X_i according to steps (1), (2) and (3); X_i is quantized into the vector U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)} composed of the membership functions, from which the reconstructed vector X̂_i and the quantization error D are obtained;
X̂_i = Σ_{k=1..J} u_k^m Y_k / Σ_{k=1..J} u_k^m  (formula 5)
D = Σ_{k=1..J} u_k^m(X_i) d(X_i, Y_k)  (formula 6)
(5-2) The emotion corresponding to the codebook with the minimum average quantization distortion is selected as the recognition result.
CN200810122806A 2008-07-01 2008-07-01 Speech-emotion recognition method based on improved fuzzy vector quantization Pending CN101620853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810122806A CN101620853A (en) 2008-07-01 2008-07-01 Speech-emotion recognition method based on improved fuzzy vector quantization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810122806A CN101620853A (en) 2008-07-01 2008-07-01 Speech-emotion recognition method based on improved fuzzy vector quantization

Publications (1)

Publication Number Publication Date
CN101620853A true CN101620853A (en) 2010-01-06

Family

ID=41514057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810122806A Pending CN101620853A (en) 2008-07-01 2008-07-01 Speech-emotion recognition method based on improved fuzzy vector quantization

Country Status (1)

Country Link
CN (1) CN101620853A (en)


Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411932A (en) * 2011-09-30 2012-04-11 北京航空航天大学 Methods for extracting and modeling Chinese speech emotion in combination with glottis excitation and sound channel modulation information
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102623009B (en) * 2012-03-02 2013-11-20 安徽科大讯飞信息科技股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN103778398B (en) * 2012-10-22 2016-12-07 无锡爱丁阁信息科技有限公司 Image blur method of estimation
CN103778398A (en) * 2012-10-22 2014-05-07 无锡爱丁阁信息科技有限公司 Image fuzziness estimation method
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103258532B (en) * 2012-11-28 2015-10-28 河海大学常州校区 A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN103337244B (en) * 2013-05-20 2015-08-26 北京航空航天大学 Outlier amending method in a kind of isolate syllable fundamental frequency curve
CN103337244A (en) * 2013-05-20 2013-10-02 北京航空航天大学 Outlier modification algorithm in isolate syllable fundamental frequency curve
CN103685520A (en) * 2013-12-13 2014-03-26 深圳Tcl新技术有限公司 Method and device for pushing songs on basis of voice recognition
CN103903016A (en) * 2014-01-13 2014-07-02 南京大学 Broad-sense related study vector quantization method using sample characteristic raw value directly
CN103903016B (en) * 2014-01-13 2017-11-21 南京大学 Directly use the generalized correlation learning vector quantizations method of sample characteristics raw value
CN103886869B (en) * 2014-04-09 2016-09-21 北京京东尚科信息技术有限公司 A kind of information feedback method based on speech emotion recognition and system
CN103886869A (en) * 2014-04-09 2014-06-25 北京京东尚科信息技术有限公司 Information feedback method and system based on speech emotion recognition
CN104064181A (en) * 2014-06-20 2014-09-24 哈尔滨工业大学深圳研究生院 Quick convergence method for feature vector quantization of speech recognition
CN104064181B (en) * 2014-06-20 2017-04-19 哈尔滨工业大学深圳研究生院 Quick convergence method for feature vector quantization of speech recognition
CN104077598A (en) * 2014-06-27 2014-10-01 电子科技大学 Emotion recognition method based on speech fuzzy clustering
CN104077598B (en) * 2014-06-27 2017-05-31 电子科技大学 A kind of emotion identification method based on voice fuzzy cluster
CN106203624A (en) * 2016-06-23 2016-12-07 上海交通大学 Vector Quantization based on deep neural network and method
CN106203624B (en) * 2016-06-23 2019-06-21 上海交通大学 Vector Quantization and method based on deep neural network
CN106205636A (en) * 2016-07-07 2016-12-07 东南大学 A kind of speech emotion recognition Feature fusion based on MRMR criterion
CN106294568A (en) * 2016-07-27 2017-01-04 北京明朝万达科技股份有限公司 A kind of Chinese Text Categorization rule generating method based on BP network and system
CN106898357A (en) * 2017-02-16 2017-06-27 华南理工大学 A kind of vector quantization method based on normal distribution law
CN106898357B (en) * 2017-02-16 2019-10-18 华南理工大学 A kind of vector quantization method based on normal distribution law
CN107039036A (en) * 2017-02-17 2017-08-11 南京邮电大学 A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network
CN107039036B (en) * 2017-02-17 2020-06-16 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN108268950A (en) * 2018-01-16 2018-07-10 上海交通大学 Iterative neural network quantization method and system based on vector quantization
CN108268950B (en) * 2018-01-16 2020-11-10 上海交通大学 Iterative neural network quantization method and system based on vector quantization
CN111462757A (en) * 2020-01-15 2020-07-28 北京远鉴信息技术有限公司 Data processing method and device based on voice signal, terminal and storage medium
CN111462757B (en) * 2020-01-15 2024-02-23 北京远鉴信息技术有限公司 Voice signal-based data processing method, device, terminal and storage medium
CN112435512A (en) * 2020-11-12 2021-03-02 郑州大学 Voice behavior assessment and evaluation method for rail transit simulation training

Similar Documents

Publication Publication Date Title
CN101620853A (en) Speech-emotion recognition method based on improved fuzzy vector quantization
CN110400579B (en) Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network
CN108281146B (en) Short voice speaker identification method and device
CN1975856B (en) Speech emotion identifying method based on supporting vector machine
Kishore et al. Emotion recognition in speech using MFCC and wavelet features
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN104978507B (en) A kind of Intelligent controller for logging evaluation expert system identity identifying method based on Application on Voiceprint Recognition
CN102982803A (en) Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN110827857B (en) Speech emotion recognition method based on spectral features and ELM
CN110111797A (en) Method for distinguishing speek person based on Gauss super vector and deep neural network
CN102779510A (en) Speech emotion recognition method based on feature space self-adaptive projection
CN101887722A (en) Rapid voiceprint authentication method
CN109961794A (en) A kind of layering method for distinguishing speek person of model-based clustering
CN101419800B (en) Emotional speaker recognition method based on frequency spectrum translation
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Sinha et al. Acoustic-phonetic feature based dialect identification in Hindi Speech
Riazati Seresht et al. Spectro-temporal power spectrum features for noise robust ASR
CN101620852A (en) Speech-emotion recognition method based on improved quadratic discriminant
Gomes et al. i-vector algorithm with Gaussian Mixture Model for efficient speech emotion recognition
Vieira et al. Combining entropy measures and cepstral analysis for pathological voices assessment
CN115064175A (en) Speaker recognition method
Lee et al. Speech emotion recognition using spectral entropy
CN108242239A (en) A kind of method for recognizing sound-groove
CN117079673B (en) Intelligent emotion recognition method based on multi-mode artificial intelligence
Suresh et al. Language identification system using MFCC and SDC feature

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100106