CN101620853A - Speech-emotion recognition method based on improved fuzzy vector quantization - Google Patents
Abstract
The invention discloses a speech emotion recognition method based on improved fuzzy vector quantization. The method relaxes the normalization condition of the fuzzy membership function, extending the sum of the memberships from 1 to N, which reduces the influence of outlier samples on the iterative training process to some extent. During iterative training it adopts a clustering method based on a similarity threshold and the minimum-distance principle, which largely avoids the sensitivity of the cluster centers to initial values and their tendency to fall into local minima. Experimental results show that the method effectively improves the emotion recognition rate over existing fuzzy vector quantization methods.
Description
Technical field
The present invention relates to a speech recognition method, and in particular to a speech emotion recognition system and method.
Background art
Automatic speech emotion recognition mainly involves two problems. The first is which features of the speech signal to use for emotion recognition, i.e., the problem of emotional feature extraction, which includes both feature extraction and feature selection. The second is how to classify specific speech data, i.e., the problem of pattern recognition, which covers various pattern recognition algorithms such as nearest neighbor, neural networks, and support vector machines.
The emotional features used in speech emotion recognition are mainly prosodic parameters and voice-quality parameters. The former include duration, speech rate, energy, fundamental frequency, and their derived parameters; the latter mainly include the formants, the harmonic-to-noise ratio, and their derived parameters. According to the three-dimensional emotional-space theory, prosodic parameters chiefly characterize emotions along the activation dimension, while voice-quality parameters chiefly characterize them along the valence dimension. For emotions far apart in the activation dimension, prosodic parameters discriminate well; for emotions close in the activation dimension but far apart in the valence dimension, voice-quality parameters are needed to strengthen the discriminability. Most existing extraction methods have difficulty detecting these parameters accurately. Moreover, the parameters mainly reflect the characteristics of the human glottis and vocal tract, are closely related to individual physiology, and therefore vary strongly across individuals, especially across genders. Among the recognition methods existing before the present invention: neural networks have highly nonlinear and very strong classification ability, but the required learning time grows quickly with network size, and local minima are a further weakness; hidden Markov models (HMM) take long to build and train, and applying them in practice still requires solving their excessive computational complexity; the quadratic discriminant algorithm is simple and computationally cheap, but it presupposes normally distributed feature vectors, which greatly hurts the recognition rate; plain vector-quantization methods see little use because of quantization error and initial-value sensitivity; and although fuzzy vector quantization alleviates the quantization-error problem to a certain extent, it still easily suffers from initial-value sensitivity and local minima.
Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art by designing and studying a speech emotion recognition method based on improved fuzzy vector quantization.
The technical scheme of the present invention is as follows:
A speech emotion recognition method based on improved fuzzy vector quantization, with the following steps:
Establish a feature extraction and analysis module, a feature dimension-reduction module, a training module for the improved fuzzy vector quantization, and an emotion recognition module. The feature extraction and analysis module extracts two classes of parameters, prosodic parameters and voice-quality parameters, and performs gender normalization. The raw speech signal is first pre-emphasized and divided into frames, and then the features are extracted.
(1) Prosodic parameter extraction
(1-1) Pre-process the raw speech signal with a high-pass filter and extract the utterance-duration and speech-rate parameters;
(1-2) Divide the signal into frames and apply windowing;
(1-3) Using short-time analysis, extract the main characteristic parameters of each frame of the utterance: the fundamental-frequency track, the short-time energy track, and the time ratio of voiced to unvoiced segments;
(1-4) Extract the derived parameters of some prosodic features: the maximum, minimum, mean, and variance of the short-time energy; of the short-time energy jitter; of the fundamental frequency; and of the fundamental-frequency jitter. The short-time energy jitter is computed by (Formula 1), where E_i is the short-time energy of frame i and N is the number of frames; the fundamental-frequency jitter is computed in the same way as (Formula 1);
(1-5) Gender normalization: according to the gender of each sample, assign the samples to different sets s_i, compute the mean μ_i and variance σ_i of each set separately (here i numbers the sets), and use the following formula to normalize the parameters into the same space;
(2) Voice-quality parameter extraction
(2-1) Extract the maximum, minimum, mean, and variance of the glottal-wave parameters, comprising: the ratio of the glottis-open time to the whole glottal period (OQ, open quotient); the ratio of the glottal opening-phase time to the closing-phase time (SQ, speed quotient); the ratio of the glottis-closed time to the whole glottal period (CQ, closed quotient); the ratio of the glottal closing-phase time to the whole glottal period (ClQ, closing quotient); and the glottal-wave skewness;
(2-2) Extract the maximum, minimum, mean, and variance of the harmonic-to-noise ratio;
(2-3) Extract the maximum, minimum, mean, variance, and bandwidth of each of the first three formants;
(2-4) Extract the maximum, minimum, mean, and variance of the jitter of each of the first three formants; the formant jitter is computed in the same way as (Formula 1);
(2-5) Gender normalization, as in (1-5);
(3) Feature dimension reduction
(3-1) After all the features of (1) and (2) have been extracted and normalized, assemble them into a feature vector;
(3-2) Use a principal-component-analysis neural network (PCANN) for dimension reduction, obtaining the sample feature-vector sequence X = {X_1, X_2, ..., X_N};
(4) Improved fuzzy vector quantization
(4-1) For all training samples of a given emotion, compute the Euclidean distance between every pair of samples. Declare the two closest samples a class, choose a distance threshold L, and assign to this class every sample whose distance to either of the two samples is within L;
(4-2) Set aside the samples that have been assigned to a class, together with the distances involving them, and do not use them again;
(4-3) Among the remaining samples, find the closest pair. If the distance between them exceeds L, declare each of the two samples a class of its own, each containing a single sample; if the distance between them is less than L, choose a distance threshold αL (0 < α ≤ 1) and assign to this class every sample whose distance to either of the pair is within αL;
(4-4) Repeat steps (4-2) and (4-3) until all samples are classified; if only a single sample remains at the end, declare it a class of its own;
(4-5) Adjust L and αL until the samples are clustered into exactly J classes;
(4-6) Relax the normalization condition of the membership function u_k(X_i) by extending the sum of the memberships from 1 to N. Compute u_k(X_i) by (Formula 3) and compute the class centers Y_j (j = 1, 2, ..., J) by (Formula 4), where m ∈ [1, ∞) is the fuzziness degree and d(X_i, Y_k) denotes the distance;
(4-7) Choose a constant ε > 0, set the iteration count k = 0, take the class centers of (4-6) as the initial codebook, and use the fuzzy C-means (FCM) clustering algorithm to iterate to the codebook Y_j (j = 1, 2, ..., J);
(4-8) Train one codebook for each emotion by steps (4-1)-(4-7);
(5) Emotion recognition
(5-1) For the utterance to be recognized, obtain its feature vector X_i according to steps (1), (2), and (3). X_i is quantized into the vector of membership values U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)}, yielding the reconstructed vector X̂_i and the quantization error D;
(5-2) Select as the recognition result the emotion whose codebook yields the minimum average quantization distortion.
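The gender normalization of steps (1-5) and (2-5) can be sketched as follows. The normalization formula itself is an image in the original and did not survive extraction; this sketch assumes a per-gender z-score mapping (subtract the set mean μ_i, divide by the set's standard deviation σ_i), which matches the stated goal of mapping the parameters of both genders into the same space. The function name and array layout are illustrative.

```python
import numpy as np

def gender_normalize(features, genders):
    """Normalize features into a common space per gender group (step (1-5)).

    Samples are grouped into sets s_i by gender; each set's mean mu_i and
    standard deviation sigma_i are computed, and every sample is mapped to
    (x - mu_i) / sigma_i.  The z-score form is an assumption, since the
    patent's formula image is lost.
    """
    features = np.asarray(features, dtype=float)
    genders = np.asarray(genders)
    out = np.empty_like(features)
    for g in np.unique(genders):
        idx = genders == g
        mu = features[idx].mean(axis=0)
        sigma = features[idx].std(axis=0)
        # guard against constant features (sigma == 0)
        out[idx] = (features[idx] - mu) / np.where(sigma > 0, sigma, 1.0)
    return out
```

After this mapping, each gender group has zero mean and unit variance per parameter, so the two groups occupy the same feature space.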
The advantages and effects of the present invention are:
1. Through characteristic-parameter extraction and analysis, the parameters are extended from prosodic parameters to voice-quality parameters, increasing the effectiveness of the characteristic parameters for recognizing emotional utterances;
2. The principal-component neural network reduces the dimension of the extracted feature vectors, which not only reduces the computational load but also has a certain denoising effect;
3. The normalization condition of the fuzzy membership function is relaxed, reducing the influence of outliers on the codebook;
4. The codebook is trained with a clustering method based on a similarity threshold and the minimum-distance principle, avoiding the initial-value and local-minimum problems;
5. Through vector quantization, the input vector X_i is quantized into a vector of membership values rather than into a single codeword Y_k, which is equivalent to enlarging the codebook and reduces the quantization error.
Other advantages and effects of the present invention are described below.
Description of drawings
Fig. 1: Block diagram of the speech emotion recognition system.
Fig. 2: Flow chart of the emotional-feature extraction and analysis module.
Fig. 3: Waveforms of the glottal wave and its derivative.
Fig. 4: Schematic diagram of the principal-component-analysis neural network.
Fig. 5: Comparison of the emotion recognition results of the fuzzy vector quantization method before and after the improvement.
Embodiment
The technical solutions of the invention are elaborated below with reference to the drawings and embodiments.
Fig. 1 shows the block diagram of the system, which is divided into four main blocks: the feature extraction and analysis module, the feature dimension-reduction module, the fuzzy-vector-quantization codebook training module, and the emotion recognition module. The overall system runs in two phases, a training process and a recognition process. The training process comprises feature extraction and analysis, feature dimension reduction, and fuzzy-vector-quantization codebook training; the recognition process comprises feature extraction and analysis, feature dimension reduction, and emotion recognition.
One. The emotional-feature extraction and analysis module
1. Selection of the prosodic feature parameters
The prosodic feature parameters comprise: the maximum, minimum, mean, and variance of the short-time energy; the maximum, minimum, mean, and variance of the short-time energy jitter; the maximum, minimum, mean, and variance of the fundamental frequency; the maximum, minimum, mean, and variance of the fundamental-frequency jitter; the time ratio of voiced to unvoiced segments; and the speech rate.
First, following the characteristic-parameter extraction flow of Fig. 2, the utterance is pre-emphasized, including high-pass filtering and detection of the start and end points of the utterance, and the utterance duration and speech rate of the full sentence are extracted. The utterance is then divided into frames and windowed, and short-time analysis is applied, separately according to gender, to obtain the fundamental frequency, short-time energy, and the numbers of voiced and unvoiced frames of each frame. The per-frame parameters are aggregated into the pitch track, pitch-jitter track, short-time energy track, and short-time energy-jitter track of the utterance; their characteristic statistics are then computed and gender-normalized, yielding all the prosodic feature parameters listed above.
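The jitter tracks and the four statistics taken from each track can be sketched as below. (Formula 1) is an image in the original, so the jitter definition used here, the absolute difference between consecutive frame values, is an assumption chosen to match the description; the function names are illustrative.

```python
import numpy as np

def track_jitter(track):
    """Frame-to-frame jitter of a short-time track (energy or F0).

    Assumed reading of (Formula 1): the absolute difference between
    consecutive frame values, giving a jitter track whose statistics
    are then taken as features.
    """
    track = np.asarray(track, dtype=float)
    return np.abs(np.diff(track))

def track_statistics(track):
    """Maximum, minimum, mean, and variance: the four statistics used
    throughout the prosodic and voice-quality feature sets."""
    t = np.asarray(track, dtype=float)
    return t.max(), t.min(), t.mean(), t.var()
```

Applying `track_statistics` to the energy track, the F0 track, and their `track_jitter` outputs yields the sixteen energy/F0 statistics of step (1-4).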
2. Selection of the voice-quality feature parameters
The voice-quality feature parameters comprise: the maximum, minimum, mean, and variance of OQ; of SQ; of CQ; of ClQ; and of the skewness R_k; the maximum, minimum, mean, variance, and bandwidth of the first formant; the maximum, minimum, mean, and variance of the first-formant jitter; the same statistics for the second and third formants and their jitter; and the maximum, minimum, mean, and variance of the harmonic-to-noise ratio.
The choice of multiple voice-quality parameters is one of the characteristics of the proposed method. Although prosodic features play the leading role in recognition, voice-quality features can effectively supplement them when distinguishing emotions that are close in the activation dimension but separated in the valence dimension, such as happiness and anger. The voice-quality parameters reflect changes in the shape of the glottal waveform during phonation; the influencing factors include muscle tension, the pressure and length tension within the vocal tract, the specific source type (articulation manner), the glottal-wave parameters, and the vocal-tract formant parameters. The LF model (Liljencrants-Fant model) is a commonly used model of the glottal wave, as shown in Fig. 3, where T_0 is the pitch period, t_o the glottal opening instant, t_c the glottal closing instant, t_p the instant at which the glottal wave reaches its maximum peak, and t_e the instant at which the differentiated wave reaches its maximum negative peak. The following glottal-wave parameters can be extracted from this model:
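Under the usual reading of the LF-model instants above, the four quotients OQ, SQ, CQ, and ClQ of step (2-1) can be computed as below. The exact formula images are lost, so treat these definitions as a hedged reconstruction from the textual descriptions (open phase from t_o to t_c within one period T_0, peaking at t_p).

```python
def glottal_quotients(T0, t_o, t_c, t_p):
    """Glottal-wave quotients from the LF-model instants of Fig. 3.

    Assumed definitions, matching the textual descriptions in (2-1):
    the glottis is open from t_o to t_c within one pitch period T0,
    and the glottal wave peaks at t_p.
    """
    open_time = t_c - t_o            # glottis-open interval
    oq = open_time / T0              # OQ: open time / whole period
    sq = (t_p - t_o) / (t_c - t_p)   # SQ: opening phase / closing phase
    cq = (T0 - open_time) / T0       # CQ: closed time / whole period
    clq = (t_c - t_p) / T0           # ClQ: closing phase / whole period
    return oq, sq, cq, clq
```

Note that under these definitions CQ = 1 - OQ, so the four quotients are not independent; the patent nevertheless lists all four as separate features.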
In a concrete implementation, the emotion utterance must still be pre-emphasized, including high-pass filtering and detection of the start and end points of the utterance; the utterance is then divided into frames and windowed, and the voice-quality parameters, such as the glottal-wave features, formant features, and harmonic-to-noise ratio, are obtained separately and gender-normalized, finally yielding the voice-quality feature parameters used for codebook training or recognition.
Feature extraction and analysis is indispensable in the implementation of the system. In the training process, the feature extraction and analysis of the training samples follows the flow of Fig. 2 directly; in the recognition process, the feature extraction and analysis of the utterance to be recognized follows the same flow.
Two. Feature dimension reduction
The preceding analysis extracted 69 characteristic parameters in total. To avoid the increase in computational complexity caused by excessive dimensionality, and the effect of redundant information on recognition, a principal-component neural network is used for dimension reduction: a linear unsupervised learning network based on the Hebb rule, as shown in Fig. 4. By learning the weight matrix W, the weight vectors are made to approach the eigenvectors of the covariance matrix of the feature vector x, avoiding a direct matrix inversion. The reduced feature vector is y = W^T x. The weight-vector update rule is:
w_j[k+1] = w_j[k] + η(y_j[k] x'[k] - y_j^2[k] w_j[k])    (Formula 12)
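(Formula 12) is Oja's Hebbian learning rule; a minimal sketch of the single-neuron update and of its convergence to the first principal component follows. The learning rate η, the synthetic data, and the random seed are illustrative choices, not from the patent.

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of (Formula 12): w <- w + eta * (y*x - y^2 * w), y = w.x."""
    y = w @ x
    return w + eta * (y * x - y * y * w)

# Demo: the weight vector converges toward the unit-norm leading eigenvector
# of the input covariance, i.e. the first principal component.
rng = np.random.default_rng(0)
# 2-D zero-mean data stretched along the first axis -> first PC near [1, 0]
data = rng.normal(size=(2000, 2)) * np.array([3.0, 0.5])
w = rng.normal(size=2)
for x in data:
    w = oja_update(w, x, eta=0.005)
w /= np.linalg.norm(w)
```

For the full W of y = W^T x, successive neurons add a deflation term (e.g. Sanger's generalized Hebbian algorithm), which the patent's multi-output network would need but which this single-component sketch omits.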
Three. Training of the improved fuzzy-vector-quantization codebook
Traditional fuzzy vector quantization replaces the K-means algorithm with a fuzzy clustering algorithm in designing the quantization codebook. It can reduce the quantization error of the codebook to some extent, but it still suffers from outlier interference, initial-value sensitivity, and local minima. The present invention therefore proposes an improved fuzzy vector quantization method, with the following concrete steps:
1. For all training feature samples of a given emotion, compute the Euclidean distance between every pair of samples. Declare the two closest samples a class, choose a distance threshold L, and assign to this class every sample whose distance to either of the two samples is within L;
2. Set aside the samples that have been assigned to a class, together with the distances involving them, and do not use them again;
3. Among the remaining samples, find the closest pair. If the distance between them exceeds L, declare each of the two samples a class of its own, each containing a single sample; if the distance between them is less than L, choose a distance threshold αL (0 < α ≤ 1) and assign to this class every sample whose distance to either of the pair is within αL;
4. Repeat steps 2 and 3 until all samples are classified; if only a single sample remains at the end, declare it a class of its own;
5. Adjust L and αL until the samples are clustered into exactly J classes;
6. Compute the membership function u_k(X_i) according to (Formula 3), relaxing its normalization condition by extending the sum of the memberships from 1 to N (this relaxation is also one of the characteristics of the present invention), and compute the class centers Y_j (j = 1, 2, ..., J) by (Formula 4);
7. Choose a constant ε > 0, set the iteration count k = 0, take the result of step 6 as the initial codebook, and use the fuzzy C-means algorithm to iterate to the codebook Y_j (j = 1, 2, ..., J);
8. Train one codebook for each emotion by steps 1-7.
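Steps 1-5, the similarity-threshold, minimum-distance clustering used to initialize the codebook, can be sketched as follows. The handling of the shrunk threshold αL and of leftover samples follows the textual description; the outer adjustment of L and α until exactly J classes emerge (step 5) is left to the caller, so the function signature and details are assumptions.

```python
import numpy as np

def threshold_cluster(X, L, alpha=0.8):
    """Similarity-threshold / minimum-distance clustering (steps 1-4).

    Repeatedly take the closest remaining pair, open a class around it, and
    absorb every remaining sample within the distance threshold; a pair
    farther apart than the threshold becomes two singleton classes, and a
    lone leftover sample becomes its own class.  Returns lists of sample
    indices, one list per class.
    """
    X = np.asarray(X, dtype=float)
    remaining = list(range(len(X)))
    classes = []
    thresh = L
    while remaining:
        if len(remaining) == 1:        # step 4: a lone leftover is its own class
            classes.append([remaining.pop()])
            break
        # pairwise distances among the remaining samples
        pts = X[remaining]
        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        a, b = np.unravel_index(np.argmin(d), d.shape)
        if d[a, b] > thresh:           # step 3: far apart -> two singleton classes
            for k in sorted({a, b}, reverse=True):
                classes.append([remaining.pop(k)])
            continue
        # absorb every remaining sample within thresh of sample a
        members = [j for j in range(len(remaining)) if d[a, j] <= thresh or j == a]
        classes.append([remaining[j] for j in members])
        remaining = [remaining[j] for j in range(len(remaining)) if j not in members]
        thresh = alpha * L             # later classes use the shrunk threshold aL
    return classes
```

A caller implementing step 5 would wrap this in a loop that tunes `L` and `alpha` until `len(threshold_cluster(X, L, alpha)) == J`, then use the class means as the initial codebook for step 7.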
Four. The emotion recognition module
For the emotion utterance to be recognized, extract its feature vector according to the flow of Fig. 2 and then reduce its dimension with the principal-component-analysis neural network, obtaining X_i. Vector-quantize X_i with the codebook of each emotion: X_i is quantized into the vector of membership values U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)}, yielding the reconstructed vector X̂_i and the quantization error D. The emotion whose codebook gives the minimum average quantization distortion is selected as the recognition result.
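A sketch of the recognition stage: memberships are computed with the standard FCM formula (the patent's Formula 3 is an image, so this is a reconstruction that keeps the usual per-sample normalization), each vector is reconstructed as a membership-weighted sum of codewords (also an assumption, since the reconstruction formula is an image), and the emotion with the minimum average distortion wins.

```python
import numpy as np

def fuzzy_memberships(x, codebook, m=2.0):
    """FCM-style memberships of x in each codeword: a reconstruction of
    (Formula 3) from the standard fuzzy C-means definition it references."""
    d = np.linalg.norm(codebook - x, axis=1)
    if np.any(d == 0):                 # x coincides with a codeword
        u = (d == 0).astype(float)
        return u / u.sum()
    # u_k = 1 / sum_j (d_k / d_j)^(2/(m-1))
    u = 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0)), axis=1)
    return u

def quantization_distortion(X, codebook, m=2.0):
    """Average distortion of feature vectors X against one emotion's codebook.

    Each X_i is quantized into its membership vector U(X_i), reconstructed
    as the membership-weighted sum of codewords (an assumed reconstruction),
    and scored by squared reconstruction error."""
    total = 0.0
    X = np.asarray(X, dtype=float)
    for x in X:
        u = fuzzy_memberships(x, codebook, m)
        x_hat = (u[:, None] * codebook).sum(axis=0)
        total += np.linalg.norm(x - x_hat) ** 2
    return total / len(X)

def recognize(X, codebooks):
    """Step (5-2): pick the emotion whose codebook gives minimum distortion."""
    scores = {emo: quantization_distortion(X, cb) for emo, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

`codebooks` here is a hypothetical dict mapping emotion labels to trained codeword arrays, one per emotion, as produced by the training of part Three.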
Five. Evaluation of the recognition system
Because the sum of the membership degrees is extended from 1 to N, the influence of outlier samples on the training iteration is reduced to some extent; and because the codebook training adopts a clustering method based on a similarity threshold and the minimum-distance principle, the sensitivity of the cluster centers to initial values and the tendency to fall into local minima are largely avoided. According to the results of the two emotion recognition methods in Fig. 5, the recognition performance improves considerably: the recognition rate improved by 12.3% for anger, 5.1% for sadness, 5.9% for happiness, and 14.9% for surprise, and the method of the invention recognizes speech emotion far better than the other existing methods.
The scope of protection claimed by the present invention is not limited to the description of this embodiment.
Claims (1)
1. A speech emotion recognition method based on improved fuzzy vector quantization, comprising the steps of:
establishing a feature extraction and analysis module, a feature dimension-reduction module, a training module for the improved fuzzy vector quantization, and an emotion recognition module, wherein the feature extraction and analysis module extracts two classes of parameters, prosodic parameters and voice-quality parameters, and performs gender normalization, and the raw speech signal is first pre-emphasized and divided into frames before the features are extracted;
(1) Prosodic parameter extraction
(1-1) Pre-processing the raw speech signal with a high-pass filter and extracting the utterance-duration and speech-rate parameters;
(1-2) Dividing the signal into frames and applying windowing;
(1-3) Using short-time analysis, extracting the main characteristic parameters of each frame of the utterance: the fundamental-frequency track, the short-time energy track, and the time ratio of voiced to unvoiced segments;
(1-4) Extracting the derived parameters of some prosodic features: the maximum, minimum, mean, and variance of the short-time energy; of the short-time energy jitter; of the fundamental frequency; and of the fundamental-frequency jitter, wherein the short-time energy jitter is computed by (Formula 1), where E_i is the short-time energy of frame i and N is the number of frames, and the fundamental-frequency jitter is computed in the same way as (Formula 1);
(1-5) Gender normalization: according to the gender of each sample, assigning the samples to different sets s_i, computing the mean μ_i and variance σ_i of each set separately (here i numbers the sets), and using the following formula to normalize the parameters into the same space;
(2) Voice-quality parameter extraction
(2-1) Extracting the maximum, minimum, mean, and variance of the glottal-wave parameters, comprising: the ratio of the glottis-open time to the whole glottal period (OQ, open quotient); the ratio of the glottal opening-phase time to the closing-phase time (SQ, speed quotient); the ratio of the glottis-closed time to the whole glottal period (CQ, closed quotient); the ratio of the glottal closing-phase time to the whole glottal period (ClQ, closing quotient); and the glottal-wave skewness;
(2-2) Extracting the maximum, minimum, mean, and variance of the harmonic-to-noise ratio;
(2-3) Extracting the maximum, minimum, mean, variance, and bandwidth of each of the first three formants;
(2-4) Extracting the maximum, minimum, mean, and variance of the jitter of each of the first three formants, the formant jitter being computed in the same way as (Formula 1);
(2-5) Gender normalization, as in (1-5);
(3) Feature dimension reduction
(3-1) After all the features of (1) and (2) have been extracted and normalized, assembling them into a feature vector;
(3-2) Using a principal-component-analysis neural network (PCANN) for dimension reduction, obtaining the sample feature-vector sequence X = {X_1, X_2, ..., X_N};
(4) Improved fuzzy vector quantization
(4-1) For all training samples of a given emotion, computing the Euclidean distance between every pair of samples, declaring the two closest samples a class, choosing a distance threshold L, and assigning to this class every sample whose distance to either of the two samples is within L;
(4-2) Setting aside the samples that have been assigned to a class, together with the distances involving them, and not using them again;
(4-3) Among the remaining samples, finding the closest pair; if the distance between them exceeds L, declaring each of the two samples a class of its own, each containing a single sample; if the distance between them is less than L, choosing a distance threshold αL (0 < α ≤ 1) and assigning to this class every sample whose distance to either of the pair is within αL;
(4-4) Repeating steps (4-2) and (4-3) until all samples are classified; if only a single sample remains at the end, declaring it a class of its own;
(4-5) Adjusting L and αL until the samples are clustered into exactly J classes;
(4-6) Relaxing the normalization condition of the membership function u_k(X_i) by extending the sum of the memberships from 1 to N, computing u_k(X_i) by (Formula 3), and computing the class centers Y_j (j = 1, 2, ..., J) by (Formula 4), where m ∈ [1, ∞) is the fuzziness degree and d(X_i, Y_k) denotes the distance;
(4-7) Choosing a constant ε > 0, setting the iteration count k = 0, taking the class centers of (4-6) as the initial codebook, and using the fuzzy C-means (FCM) clustering algorithm to iterate to the codebook Y_j (j = 1, 2, ..., J);
(4-8) Training one codebook for each emotion by steps (4-1)-(4-7);
(5) Emotion recognition
(5-1) For the utterance to be recognized, obtaining its feature vector X_i according to steps (1), (2), and (3), quantizing X_i into the vector of membership values U(X_i) = {u_1(X_i), u_2(X_i), ..., u_J(X_i)}, and obtaining the reconstructed vector X̂_i and the quantization error D;
(5-2) Selecting as the recognition result the emotion whose codebook yields the minimum average quantization distortion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810122806A CN101620853A (en) | 2008-07-01 | 2008-07-01 | Speech-emotion recognition method based on improved fuzzy vector quantization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101620853A true CN101620853A (en) | 2010-01-06 |
Family
ID=41514057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810122806A Pending CN101620853A (en) | 2008-07-01 | 2008-07-01 | Speech-emotion recognition method based on improved fuzzy vector quantization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101620853A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411932A (en) * | 2011-09-30 | 2012-04-11 | 北京航空航天大学 | Methods for extracting and modeling Chinese speech emotion in combination with glottis excitation and sound channel modulation information |
CN102623009A (en) * | 2012-03-02 | 2012-08-01 | 安徽科大讯飞信息技术股份有限公司 | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | 河海大学常州校区 | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN103337244A (en) * | 2013-05-20 | 2013-10-02 | 北京航空航天大学 | Outlier modification algorithm in isolate syllable fundamental frequency curve |
CN103685520A (en) * | 2013-12-13 | 2014-03-26 | 深圳Tcl新技术有限公司 | Method and device for pushing songs on basis of voice recognition |
CN103778398A (en) * | 2012-10-22 | 2014-05-07 | 无锡爱丁阁信息科技有限公司 | Image fuzziness estimation method |
CN103886869A (en) * | 2014-04-09 | 2014-06-25 | 北京京东尚科信息技术有限公司 | Information feedback method and system based on speech emotion recognition |
CN103903016A (en) * | 2014-01-13 | 2014-07-02 | 南京大学 | Broad-sense related study vector quantization method using sample characteristic raw value directly |
CN104064181A (en) * | 2014-06-20 | 2014-09-24 | 哈尔滨工业大学深圳研究生院 | Quick convergence method for feature vector quantization of speech recognition |
CN104077598A (en) * | 2014-06-27 | 2014-10-01 | 电子科技大学 | Emotion recognition method based on speech fuzzy clustering |
CN106205636A (en) * | 2016-07-07 | 2016-12-07 | 东南大学 | A kind of speech emotion recognition Feature fusion based on MRMR criterion |
CN106203624A (en) * | 2016-06-23 | 2016-12-07 | 上海交通大学 | Vector Quantization based on deep neural network and method |
CN106294568A (en) * | 2016-07-27 | 2017-01-04 | 北京明朝万达科技股份有限公司 | A kind of Chinese Text Categorization rule generating method based on BP network and system |
CN106898357A (en) * | 2017-02-16 | 2017-06-27 | 华南理工大学 | A kind of vector quantization method based on normal distribution law |
CN107039036A (en) * | 2017-02-17 | 2017-08-11 | 南京邮电大学 | A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network |
CN108268950A (en) * | 2018-01-16 | 2018-07-10 | 上海交通大学 | Iterative neural network quantization method and system based on vector quantization |
CN111462757A (en) * | 2020-01-15 | 2020-07-28 | 北京远鉴信息技术有限公司 | Data processing method and device based on voice signal, terminal and storage medium |
CN112435512A (en) * | 2020-11-12 | 2021-03-02 | 郑州大学 | Voice behavior assessment and evaluation method for rail transit simulation training |
-
2008
- 2008-07-01 CN CN200810122806A patent/CN101620853A/en active Pending
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411932A (en) * | 2011-09-30 | 2012-04-11 | 北京航空航天大学 | Methods for extracting and modeling Chinese speech emotion in combination with glottis excitation and sound channel modulation information |
CN102623009A (en) * | 2012-03-02 | 2012-08-01 | 安徽科大讯飞信息技术股份有限公司 | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis |
CN102623009B (en) * | 2012-03-02 | 2013-11-20 | 安徽科大讯飞信息科技股份有限公司 | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis |
CN103778398B (en) * | 2012-10-22 | 2016-12-07 | 无锡爱丁阁信息科技有限公司 | Image blur method of estimation |
CN103778398A (en) * | 2012-10-22 | 2014-05-07 | 无锡爱丁阁信息科技有限公司 | Image fuzziness estimation method |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | 河海大学常州校区 | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN103258532B (en) * | 2012-11-28 | 2015-10-28 | 河海大学常州校区 | A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine |
CN103337244B (en) * | 2013-05-20 | 2015-08-26 | 北京航空航天大学 | Outlier amending method in a kind of isolate syllable fundamental frequency curve |
CN103337244A (en) * | 2013-05-20 | 2013-10-02 | 北京航空航天大学 | Outlier modification algorithm in isolate syllable fundamental frequency curve |
CN103685520A (en) * | 2013-12-13 | 2014-03-26 | 深圳Tcl新技术有限公司 | Method and device for pushing songs on basis of voice recognition |
CN103903016A (en) * | 2014-01-13 | 2014-07-02 | 南京大学 | Generalized correlation learning vector quantization method directly using raw sample feature values |
CN103903016B (en) * | 2014-01-13 | 2017-11-21 | 南京大学 | Generalized correlation learning vector quantization method directly using raw sample feature values |
CN103886869B (en) * | 2014-04-09 | 2016-09-21 | 北京京东尚科信息技术有限公司 | Information feedback method and system based on speech emotion recognition |
CN103886869A (en) * | 2014-04-09 | 2014-06-25 | 北京京东尚科信息技术有限公司 | Information feedback method and system based on speech emotion recognition |
CN104064181A (en) * | 2014-06-20 | 2014-09-24 | 哈尔滨工业大学深圳研究生院 | Quick convergence method for feature vector quantization of speech recognition |
CN104064181B (en) * | 2014-06-20 | 2017-04-19 | 哈尔滨工业大学深圳研究生院 | Quick convergence method for feature vector quantization of speech recognition |
CN104077598A (en) * | 2014-06-27 | 2014-10-01 | 电子科技大学 | Emotion recognition method based on speech fuzzy clustering |
CN104077598B (en) * | 2014-06-27 | 2017-05-31 | 电子科技大学 | Emotion recognition method based on speech fuzzy clustering |
CN106203624A (en) * | 2016-06-23 | 2016-12-07 | 上海交通大学 | Vector quantization system and method based on deep neural network |
CN106203624B (en) * | 2016-06-23 | 2019-06-21 | 上海交通大学 | Vector quantization system and method based on deep neural network |
CN106205636A (en) * | 2016-07-07 | 2016-12-07 | 东南大学 | Speech emotion recognition feature fusion method based on the MRMR criterion |
CN106294568A (en) * | 2016-07-27 | 2017-01-04 | 北京明朝万达科技股份有限公司 | Chinese text categorization rule generation method and system based on BP network |
CN106898357A (en) * | 2017-02-16 | 2017-06-27 | 华南理工大学 | Vector quantization method based on the normal distribution law |
CN106898357B (en) * | 2017-02-16 | 2019-10-18 | 华南理工大学 | Vector quantization method based on the normal distribution law |
CN107039036A (en) * | 2017-02-17 | 2017-08-11 | 南京邮电大学 | High-quality speaker recognition method based on auto-encoding deep belief network |
CN107039036B (en) * | 2017-02-17 | 2020-06-16 | 南京邮电大学 | High-quality speaker recognition method based on automatic coding depth confidence network |
CN108268950A (en) * | 2018-01-16 | 2018-07-10 | 上海交通大学 | Iterative neural network quantization method and system based on vector quantization |
CN108268950B (en) * | 2018-01-16 | 2020-11-10 | 上海交通大学 | Iterative neural network quantization method and system based on vector quantization |
CN111462757A (en) * | 2020-01-15 | 2020-07-28 | 北京远鉴信息技术有限公司 | Data processing method and device based on voice signal, terminal and storage medium |
CN111462757B (en) * | 2020-01-15 | 2024-02-23 | 北京远鉴信息技术有限公司 | Voice signal-based data processing method, device, terminal and storage medium |
CN112435512A (en) * | 2020-11-12 | 2021-03-02 | 郑州大学 | Voice behavior assessment and evaluation method for rail transit simulation training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101620853A (en) | Speech-emotion recognition method based on improved fuzzy vector quantization | |
CN110400579B (en) | Speech emotion recognition based on directional self-attention mechanism and bidirectional long short-term memory network | |
CN108281146B (en) | Short voice speaker identification method and device | |
CN1975856B (en) | Speech emotion recognition method based on support vector machine | |
Kishore et al. | Emotion recognition in speech using MFCC and wavelet features | |
CN103345923B (en) | Short-utterance speaker recognition method based on sparse representation | |
CN104978507B (en) | Identity authentication method for an intelligent logging evaluation expert system based on voiceprint recognition | |
CN102982803A (en) | Isolated word speech recognition method based on HRSF and improved DTW algorithm | |
CN110827857B (en) | Speech emotion recognition method based on spectral features and ELM | |
CN110111797A (en) | Speaker recognition method based on Gaussian supervector and deep neural network | |
CN102779510A (en) | Speech emotion recognition method based on feature space self-adaptive projection | |
CN101887722A (en) | Rapid voiceprint authentication method | |
CN109961794A (en) | Hierarchical speaker recognition method based on model clustering | |
CN101419800B (en) | Emotional speaker recognition method based on frequency spectrum translation | |
CN106297769B (en) | Discriminative feature extraction method for language identification | |
Sinha et al. | Acoustic-phonetic feature based dialect identification in Hindi Speech | |
Riazati Seresht et al. | Spectro-temporal power spectrum features for noise robust ASR | |
CN101620852A (en) | Speech-emotion recognition method based on improved quadratic discriminant | |
Gomes et al. | i-vector algorithm with Gaussian Mixture Model for efficient speech emotion recognition | |
Vieira et al. | Combining entropy measures and cepstral analysis for pathological voices assessment | |
CN115064175A (en) | Speaker recognition method | |
Lee et al. | Speech emotion recognition using spectral entropy | |
CN108242239A (en) | Voiceprint recognition method | |
CN117079673B (en) | Intelligent emotion recognition method based on multi-mode artificial intelligence | |
Suresh et al. | Language identification system using MFCC and SDC feature |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Open date: 20100106 |