CN1758332A - Speaker recognition method based on MFCC linear emotion compensation - Google Patents
- Publication number
- CN1758332A
- Authority
- CN
- China
- Prior art keywords
- mfcc
- compensation
- sigma
- fundamental frequency
- linear
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Complex Calculations (AREA)
Abstract
This invention relates to a speaker recognition method based on MFCC linear emotion compensation, comprising: 1) pre-processing the speech signal; 2) extracting features on each speech frame: the MFCC and the fundamental frequency are extracted from the speaker's speech, and the signal stream is divided into voiced and unvoiced segments according to whether a fundamental frequency is present, a frame being discarded if it is unvoiced; 3) applying linear compensation to the MFCC of each frame according to the variation of its fundamental frequency; 4) compensating the MFCC with the coefficient that maximizes the probability obtained by maximum-likelihood estimation, and training on the compensated features; 5) recognition.
Description
Technical field
The present invention relates to biometric identification technology, and in particular to a speaker recognition method based on MFCC linear emotion compensation.
Background technology
Biometric identification refers to verifying a person's identity by computer from human physiological or behavioural characteristics. It takes unique, reliable and stable physiological traits (fingerprint, iris, face, palmprint, etc.) or behavioural traits (speech, keystroke dynamics, gait, signature, etc.) as its basis, and applies computer and network technology to perform image processing and pattern recognition in order to establish identity. Voiceprint recognition, or speaker recognition, is one such technology: it automatically identifies a speaker from the speech parameters in the waveform that reflect the speaker's physiology and behaviour.
Human speech carries not only speaker identity and linguistic content but also emotion and mood. On emotionally coloured speech the accuracy of traditional speaker recognition methods drops sharply, because the emotional factors embedded in the voice are not taken into account, i.e. the effects of prosody and paralanguage are ignored. Traditional voiceprint feature extraction captures only physiological characteristics of the speech signal, so voiceprint recognition systems rely mainly on low-level acoustic features. Since the extracted information cannot fully characterize the speaker's personal traits, existing voiceprint recognition systems are unstable in performance.
Summary of the invention
The present invention addresses these defects by providing a speaker recognition method for emotional speech that uses a pitch-based linear compensation of the cepstral features; by linearly compensating the speaker's cepstral features, it improves the robustness of speaker recognition under the influence of emotion.
The technical solution adopted by the present invention is a speaker recognition method based on MFCC linear emotion compensation, whose main steps are: 1) pre-processing of the speech signal, mainly comprising sampling and quantization, pre-emphasis and windowing; 2) feature extraction on each speech frame: the cepstral features (MFCC) and the fundamental frequency are extracted from the speaker's speech, and the signal stream is divided into voiced and unvoiced segments according to whether a fundamental frequency is present; a frame judged unvoiced is discarded; 3) linear compensation of the MFCC of each frame according to the variation of its fundamental frequency, the compensation coefficient being adjusted repeatedly until the maximum-likelihood probability in the EM algorithm is maximized, which determines the coefficient; 4) the MFCC are compensated with the coefficient of maximum probability obtained by maximum-likelihood estimation, and the model is trained on the compensated features; 5) recognition: an input utterance is passed through feature extraction to obtain a feature-vector sequence, which is fed into the GMM of each enrolled user's model parameters; the resulting similarity values are used to score the user.
The technical solution can be further refined. The linear cepstral compensation modifies each dimension of the MFCC feature of each frame according to the frame's fundamental frequency, so that the features characterize the speaker's personal traits as well as possible and the intra-speaker variation of the features caused by emotional change is reduced. The compensation coefficient is the factor, used when compensating the cepstral features, that describes how fundamental-frequency variation affects the MFCC features; the best coefficient is obtained by repeated runs of the EM algorithm. The repeated-EM procedure determines the optimal compensation coefficient by estimating the hidden-state probabilities of the MFCC compensated with different candidate coefficients, and selecting the coefficient that maximizes the probability as the one used for model training.
The beneficial effect of the present invention is that pitch-based cepstral compensation exploits the variation pattern of prosodic features in emotional speech, so that the compensated MFCC features of emotional speech are more stable speaker characteristics, reducing as far as possible the intra-speaker speech variation caused by emotion. The best compensation coefficient is selected by repeatedly invoking the EM algorithm during Gaussian mixture model (GMM) training; this finds the coefficient that best describes the relation between the fundamental frequency and the original MFCC features.
Description of drawings
Fig. 1 shows the EM training procedure with linear compensation of the present invention;
Fig. 2 is the algorithm flow chart of the present invention.
Embodiment
The invention is further described below with reference to the drawings and an embodiment. The method of the present invention is divided into five steps.
The first step: speech signal pre-processing
1. Sampling and quantization
A) Filter the speech signal with a sharp cut-off filter so that its Nyquist frequency F_N is 4 kHz;
B) set the speech sampling rate F = 2F_N;
C) sample the analogue speech signal s_a(t) periodically to obtain the amplitude sequence s(n) of the digital speech signal;
D) quantize s(n) by pulse-code modulation (PCM) to obtain the quantized amplitude sequence s'(n).
2. Pre-emphasis
A) Set the pre-emphasis factor a in the digital filter with transfer function H(z) = 1 - az^(-1); a is taken slightly smaller than 1;
B) pass s'(n) through the digital filter to obtain a sequence s''(n) whose low-, mid- and high-frequency amplitudes are balanced.
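The pre-emphasis filter above can be sketched in a few lines (a minimal pure-Python illustration; the function name is my own, and a = 0.97 is taken from the experiment parameters below):

```python
def preemphasize(s, a=0.97):
    """Apply the pre-emphasis filter H(z) = 1 - a*z^-1 to sequence s:
    y(n) = s(n) - a * s(n-1), with y(0) = s(0)."""
    return [s[0]] + [s[n] - a * s[n - 1] for n in range(1, len(s))]
```

For example, `preemphasize([1.0, 1.0, 1.0], a=0.5)` yields `[1.0, 0.5, 0.5]`: the flat (low-frequency) part of the signal is attenuated while changes are preserved.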
3. Windowing
A) Compute the frame length N of a speech frame; N must satisfy a constraint (formula omitted in the source) in which F is the speech sampling rate in Hz;
B) with frame length N and frame shift N/2, split s''(n) into a series of speech frames F_m, each containing N speech samples;
C) compute the Hamming window function ω(n) (formula omitted in the source);
D) apply the Hamming window to each speech frame F_m:
F_m'(n) = ω(n) × F_m(n), n = 0, 1, ..., N-1.
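The framing and windowing steps above can be sketched as follows (pure-Python, for illustration; the function names are my own, and the standard Hamming window formula is assumed since the source omits it):

```python
import math

def hamming(N):
    # Standard Hamming window of length N
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(s, N):
    # Split s into frames of length N with a frame shift of N//2
    # (50% overlap), then apply the Hamming window to each frame.
    w = hamming(N)
    hop = N // 2
    frames = []
    for start in range(0, len(s) - N + 1, hop):
        frames.append([w[n] * s[start + n] for n in range(N)])
    return frames
```

With a 32 ms window at an 8 kHz sampling rate (the settings used in the experiments below), N would be 256 samples and the frame shift 128 samples.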
Second step: feature extraction
Feature extraction on a speech frame comprises the extraction of the fundamental frequency (pitch) and of the Mel cepstrum coefficients (MFCC).
1. Fundamental frequency (pitch):
A) Set the pitch search range: f_floor = 50 Hz, f_ceiling = 1250 Hz;
B) set the admissible pitch span: f_min = 50 Hz, f_max = 550 Hz;
C) apply a fast Fourier transform (FFT) to turn the time-domain signal s(n) into the frequency-domain signal X(k);
D) compute the subharmonic-to-harmonic ratio (SHR) at each candidate frequency:
SHR = SS / SH
where the number of harmonics considered is N = f_ceiling / f (formulas for SS and SH omitted in the source);
E) find the frequency f_1 at which SHR is highest;
F) if f_1 > f_max, or SS - SH < 0 at f_1, the frame is considered a non-speech frame and the fundamental frequency is 0, i.e. Pitch = 0;
G) search the interval [1.9375 f_1, 2.0625 f_1] for the frequency f_2 at which SHR has a local maximum;
H) if f_2 > f_max, or the SHR at f_2 exceeds 0.2, then Pitch = f_1;
I) otherwise, Pitch = f_2;
J) verify the obtained pitch by autocorrelation: starting from the midpoint of the frame, take a stretch of samples of length 1/pitch on each side and compute their autocorrelation value C; if C < 0.2 the pitch value is considered unreliable and Pitch = 0;
K) finally, apply median smoothing to the whole sequence of Pitch values.
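Step K), the median smoothing of the pitch track, can be sketched as follows (the function name and the default window width are illustrative, not specified by the source):

```python
def median_smooth(pitch, width=5):
    """Median-smooth a per-frame pitch track; width should be odd.
    Isolated outliers (e.g. a single spurious 0 or octave error)
    are replaced by the median of their neighbourhood."""
    half = width // 2
    out = []
    for i in range(len(pitch)):
        window = pitch[max(0, i - half): i + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out
```

For example, a single dropout in `[100, 100, 0, 100, 100]` is removed with `width=3`, yielding a constant 100 Hz track.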
2. Extraction of the MFCC:
A) Set the order p of the Mel cepstrum coefficients;
B) apply a fast Fourier transform (FFT) to turn the time-domain signal s(n) into the frequency-domain signal X(k);
C) compute the Mel-domain scale (formula omitted in the source);
D) compute the corresponding frequency-domain scale (formula omitted in the source);
E) compute the logarithmic energy spectrum on each Mel-domain channel φ_j (formula omitted in the source);
F) apply the discrete cosine transform (DCT).
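Since the Mel-scale, filterbank and DCT formulas are omitted in the source, the sketch below assumes the common 2595·log10(1 + f/700) Mel mapping, triangular filters, and a DCT-II of the log energies; the function names, the filter count, and the bin mapping are my own assumptions, not the patent's exact definitions:

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_power(power, sample_rate, p=16, n_filters=24):
    """Compute p Mel cepstrum coefficients from a power spectrum
    (|X(k)|^2 for k = 0..K-1, covering 0..sample_rate/2)."""
    K = len(power)
    # Equally spaced points on the Mel scale, mapped back to FFT bins
    top = hz_to_mel(sample_rate / 2)
    edges = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((K - 1) * mel_to_hz(m) / (sample_rate / 2)) for m in edges]
    log_e = []
    for j in range(1, n_filters + 1):
        lo, c, hi = bins[j - 1], bins[j], bins[j + 1]
        e = 0.0
        for k in range(lo, hi + 1):
            if k <= c and c > lo:
                w = (k - lo) / (c - lo)   # rising edge of the triangle
            elif k > c and hi > c:
                w = (hi - k) / (hi - c)   # falling edge
            else:
                w = 1.0
            e += w * power[k]
        log_e.append(math.log(e + 1e-12))
    # DCT-II of the log filterbank energies -> cepstrum
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters)) for i in range(p)]
```

The default p = 16 matches the "16 MFCC" dimensionality listed in the experiment parameters below.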
The third step: cepstral feature compensation
1. Aligning cepstral features and fundamental frequency
The voiced signal is a quasi-periodic signal whose period determines the fundamental frequency. According to whether a fundamental frequency is present, the signal stream is divided into voiced and unvoiced segments; a frame judged unvoiced is discarded.
2. Determining the optimal compensation coefficient by the EM algorithm
For each candidate compensation coefficient α_k, the hidden-state probabilities are computed repeatedly in order to obtain the optimal coefficient.
A) Linearly compensate the cepstral features of each frame with coefficient α_k: X(t) is the cepstral feature at time t, Y(t) the fundamental frequency at time t, X_opt(t) the compensated cepstral feature, and E(Y(t)) the mean fundamental frequency (compensation formula omitted in the source);
B) estimate the hidden-state probabilities (formula omitted in the source);
C) iterate until the coefficient that maximizes the probability is found;
D) estimate the GMM parameters P', μ_i' and R_i', i.e. λ', by the local-maximum criterion.
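The exact compensation formula is not reproduced above, so the sketch below assumes a simple additive linear form, X_opt(t) = X(t) + α·(Y(t) − E(Y(t))); this assumed form, the function names, and the use of a supplied log-likelihood callback (standing in for the repeated EM estimation) are all illustrative, not the patent's exact procedure:

```python
def compensate(X, Y, alpha):
    """Assumed linear compensation: shift each cepstral vector by
    alpha times the frame's pitch deviation from the mean pitch E(Y)."""
    mean_pitch = sum(Y) / len(Y)
    return [[x + alpha * (y - mean_pitch) for x in frame]
            for frame, y in zip(X, Y)]

def best_alpha(X, Y, loglik, candidates):
    """Pick the candidate coefficient alpha_k whose compensated
    features score highest under the log-likelihood function `loglik`
    (in the patent, the EM-estimated GMM likelihood)."""
    return max(candidates, key=lambda a: loglik(compensate(X, Y, a)))
```

The coefficient returned by `best_alpha` plays the role of the optimal α_k: the one whose compensated features receive the highest probability, which is then used for training.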
The fourth step: training
Each speaker's speech features form a specific distribution in feature space, and the compensated feature distribution describes the speaker's individuality better. A Gaussian mixture model (GMM) approximates the speaker's feature distribution by a linear combination of Gaussian distributions.
The probability density function has the same functional form for every speaker; only the parameters differ. An M-order GMM describes the distribution of the frame features in feature space by a linear combination of M single Gaussian distributions:
p(x) = Σ_{i=1}^{M} P_i b_i(x)
where p is the dimension of the features, b_i(x) is the kernel function, a Gaussian distribution with mean vector μ_i and covariance matrix R_i, and M (typically 16 or 32) is the order of the GMM, fixed as a constant before the speaker model is built.
λ = {P_i, μ_i, R_i} are the parameters of the speaker's feature-distribution GMM. As the weighting coefficients of the Gaussian mixture, the P_i must satisfy
Σ_{i=1}^{M} P_i = 1.
Computing p(x) in the GMM requires inverting the p × p matrices R_i (i = 1, 2, ..., M), which is computationally expensive. Therefore each R_i is restricted to a diagonal matrix, so that inversion reduces to taking reciprocals, which speeds up the computation.
The fifth step: recognition
An input utterance is passed through feature extraction to obtain a feature-vector sequence. The sequence is fed into the GMM of each enrolled user's model parameters; the resulting similarity values are used to score the user.
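The recognition step can be sketched as follows (the function names and the per-frame score callback are illustrative; the total log-likelihood of the sequence under each enrolled speaker's GMM serves as the similarity value):

```python
def identify(feature_seq, speaker_models, score):
    """Score a test feature-vector sequence against each enrolled
    speaker's model via the per-frame log-likelihood `score`, and
    return the best-matching speaker plus all similarity values."""
    totals = {name: sum(score(model, x) for x in feature_seq)
              for name, model in speaker_models.items()}
    best = max(totals, key=totals.get)
    return best, totals
```

In the full system, `score` would be the diagonal-covariance GMM log density evaluated with the speaker's parameters λ.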
Experimental result
The system was tested on the Emotional Prosody Speech corpus. This corpus, an emotional speech database built to database standards by the Linguistic Data Consortium (LDC) for studying the pronunciation characteristics of emotional speech, was recorded by 7 professional actors (3 male and 4 female target speakers) reading aloud a series of given English sentences, mainly dates and numbers, and covers 14 different emotion categories. The recordings were made by having the actors express the corresponding emotions with different timbre, intonation and speaking rate. Each speaker's recording length per emotion varies between roughly 10 and 40 seconds, with only a few reaching 50 seconds; the total recording length per speaker is about 5 to 6 minutes.
We designed and carried out two groups of experiments on this corpus. The first group is the classical MFCC-GMM baseline: the model is trained on cepstral features without any compensation algorithm, and the GMM is trained by the ordinary EM algorithm. This group serves as the control.
In the second group the cepstral features are linearly compensated, repeated EM estimation is used to select the best compensation coefficient, and the GMM model is trained on the corrected MFCC feature vectors.
For performance assessment, the equal error rate (EER) and the identification rate (IR) are used as evaluation criteria for the speaker recognition system.
Computing the EER requires two further indices:
(1) false acceptance rate FA: the number of phrases wrongly accepted divided by the total number of phrases that should have been rejected gives the false acceptance rate of speaker verification;
(2) false rejection rate FR: the number of phrases wrongly rejected divided by the total number of phrases that should have been accepted gives the false rejection rate of speaker verification.
When FA = FR, or |FA - FR| < δ (δ < 0.0001), the system's equal error rate is obtained, i.e. EER = FA or EER = FR.
The identification rate IR is the fraction of test utterances whose speaker is correctly identified (formula omitted in the source).
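The FA/FR/EER definitions above can be sketched as a threshold sweep over verification scores (the function names, and approximating the EER condition by the threshold minimizing |FA - FR|, are illustrative):

```python
def far_frr(genuine, impostor, threshold):
    # FA: fraction of impostor trials wrongly accepted (score >= threshold);
    # FR: fraction of genuine trials wrongly rejected (score < threshold).
    fa = sum(s >= threshold for s in impostor) / len(impostor)
    fr = sum(s < threshold for s in genuine) / len(genuine)
    return fa, fr

def equal_error_rate(genuine, impostor):
    # Sweep candidate thresholds and return the operating point where
    # |FA - FR| is smallest, approximating the EER condition FA = FR.
    best = min((abs(fa - fr), (fa + fr) / 2)
               for fa, fr in (far_frr(genuine, impostor, t)
                              for t in sorted(genuine + impostor)))
    return best[1]
```

For example, with genuine scores [0.9, 0.8, 0.2] and impostor scores [0.1, 0.3, 0.7], the threshold 0.7 gives FA = FR = 1/3, so the EER is 1/3.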
The experiment parameters were set as follows:
Window length | 32 ms |
Frame shift | 16 ms |
Pre-emphasis | 0.97 |
MFCC dimension | 16 MFCC + delta |
GMM order | 32 |
The experimental results are as follows:
Method | EER (%) | IR (%) |
Baseline | 32.41 | 62.94 |
Proposed method | 29.92 | 73.04 |
The per-emotion results are given in the table below, relative to the baseline experiment; "+" means the value increased, "-" that it decreased:
Emotional state | Relative EER (%) | Relative IR (%) |
Elation | -4.30 | +6.29 |
Panic | -10.76 | +19.86 |
Hot anger | -3.60 | +9.35 |
Disgust | -3.70 | +15.56 |
Cold anger | -1.92 | +12.82 |
Anxiety | -3.92 | +8.82 |
Interest | -1.41 | +5.09 |
Despair | -2.79 | +5.78 |
Contempt | -1.02 | +10.0 |
Sadness | -3.53 | +15.23 |
Pride | -2.76 | +5.96 |
Shame | -1.35 | +11.49 |
Boredom | -0.00 | +10.39 |
Neutral | -0.00 | +6.25 |
The experimental machine was configured with an AMD Athlon(tm) XP 2500+ CPU and 512 MB of DDR400 memory.
The experimental results show that this feature compensation method makes the cepstral features describe the speaker's individual information better, thereby improving speaker recognition performance: the error rate decreases and the identification rate rises. The experiments on the emotional corpus show that the method works well for all the emotional states tested.
Claims (6)
1. A speaker recognition method based on MFCC linear emotion compensation, characterized in that the main steps are:
1) pre-processing of the speech signal, mainly comprising sampling and quantization, pre-emphasis and windowing;
2) feature extraction on each speech frame: the cepstral features (MFCC) and the fundamental frequency are extracted from the speaker's speech, and the signal stream is divided into voiced and unvoiced segments according to whether a fundamental frequency is present; a frame judged unvoiced is discarded;
3) linear compensation of the MFCC of each frame according to the variation of its fundamental frequency, the compensation coefficient being adjusted repeatedly until the maximum-likelihood probability in the EM algorithm is maximized, which determines the coefficient;
4) the MFCC are compensated with the coefficient of maximum probability obtained by maximum-likelihood estimation, and the model is trained on the compensated features;
5) recognition: an input utterance is passed through feature extraction to obtain a feature-vector sequence, which is fed into the GMM of each enrolled user's model parameters; the resulting similarity values are used to score the user.
2. The speaker recognition method based on MFCC linear emotion compensation according to claim 1, characterized in that: the linear cepstral compensation modifies each dimension of the MFCC feature of each frame according to the frame's fundamental frequency, so that the features characterize the speaker's personal traits as well as possible.
3. The speaker recognition method based on MFCC linear emotion compensation according to claim 1, characterized in that: the compensation coefficient is the factor, used when compensating the cepstral features, that describes how fundamental-frequency variation affects the MFCC features; the best coefficient can be obtained by repeated runs of the EM algorithm.
4. The speaker recognition method based on MFCC linear emotion compensation according to claim 1, characterized in that: the repeated-EM procedure determines the optimal compensation coefficient by estimating the hidden-state probabilities of the MFCC compensated with different candidate coefficients, and selecting the coefficient that maximizes the probability as the one used for model training.
5. The speaker recognition method based on MFCC linear emotion compensation according to claim 1, characterized in that the feature extraction on the speech frame comprises the extraction of the fundamental frequency (pitch) and of the Mel cepstrum coefficients (MFCC):
1) fundamental frequency:
A) set the pitch search range: f_floor = 50 Hz, f_ceiling = 1250 Hz;
B) set the admissible pitch span: f_min = 50 Hz, f_max = 550 Hz;
C) apply a fast Fourier transform (FFT) to turn the time-domain signal s(n) into the frequency-domain signal X(k);
D) compute the subharmonic-to-harmonic ratio (SHR) at each candidate frequency:
SHR = SS / SH
(formulas for SS and SH omitted in the source);
E) find the frequency f_1 at which SHR is highest;
F) if f_1 > f_max, or SS - SH < 0 at f_1, the frame is considered non-speech or silent, and the fundamental frequency Pitch = 0;
G) search the interval [1.9375 f_1, 2.0625 f_1] for the frequency f_2 at which SHR has a local maximum;
H) if f_2 > f_max, or the SHR at f_2 exceeds 0.2, then Pitch = f_1;
I) otherwise, Pitch = f_2;
J) verify the obtained pitch by autocorrelation: starting from the midpoint of the frame, take a stretch of samples of length 1/pitch on each side and compute their autocorrelation value C; if C < 0.2 the pitch value is considered unreliable and Pitch = 0;
K) finally, apply median smoothing to the whole sequence of Pitch values;
2) extraction of the MFCC:
A) set the order p of the Mel cepstrum coefficients;
B) apply a fast Fourier transform (FFT) to turn the time-domain signal s(n) into the frequency-domain signal X(k);
C) compute the Mel-domain scale (formula omitted in the source);
D) compute the corresponding frequency-domain scale (formula omitted in the source);
E) compute the logarithmic energy spectrum on each Mel-domain channel φ_j (formula omitted in the source);
F) apply the discrete cosine transform (DCT).
6. The speaker recognition method based on MFCC linear emotion compensation according to any of claims 1 to 4, characterized in that the optimal compensation coefficient is determined by the EM algorithm: for each candidate compensation coefficient α_k, the hidden-state probabilities are computed repeatedly in order to obtain the optimal coefficient;
A) linearly compensate the cepstral features of each frame with coefficient α_k: X(t) is the cepstral feature at time t, Y(t) the fundamental frequency at time t, X_opt(t) the compensated cepstral feature, and E(Y(t)) the mean fundamental frequency (compensation formula omitted in the source);
B) estimate the hidden-state probabilities (formula omitted in the source);
C) iterate until the coefficient that maximizes the probability is found;
D) estimate the GMM parameters P', μ_i' and R_i', i.e. λ', by the local-maximum criterion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100613603A CN100440315C (en) | 2005-10-31 | 2005-10-31 | Speaker recognition method based on MFCC linear emotion compensation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100613603A CN100440315C (en) | 2005-10-31 | 2005-10-31 | Speaker recognition method based on MFCC linear emotion compensation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1758332A true CN1758332A (en) | 2006-04-12 |
CN100440315C CN100440315C (en) | 2008-12-03 |
Family
ID=36703669
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100613603A Expired - Fee Related CN100440315C (en) | 2005-10-31 | 2005-10-31 | Speaker recognition method based on MFCC linear emotion compensation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100440315C (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102201237A (en) * | 2011-05-12 | 2011-09-28 | 浙江大学 | Emotional speaker identification method based on reliability detection of fuzzy support vector machine |
CN1975856B (en) * | 2006-10-30 | 2011-11-09 | 邹采荣 | Speech emotion identifying method based on supporting vector machine |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN102354496A (en) * | 2011-07-01 | 2012-02-15 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
WO2013040981A1 (en) * | 2011-09-23 | 2013-03-28 | 浙江大学 | Speaker recognition method for combining emotion model based on near neighbour principles |
CN101547261B (en) * | 2008-03-27 | 2013-06-05 | 富士通株式会社 | Association apparatus and association method |
CN103594091A (en) * | 2013-11-15 | 2014-02-19 | 深圳市中兴移动通信有限公司 | Mobile terminal and voice signal processing method thereof |
CN105679321A (en) * | 2016-01-29 | 2016-06-15 | 宇龙计算机通信科技(深圳)有限公司 | Speech recognition method and device and terminal |
CN106297823A (en) * | 2016-08-22 | 2017-01-04 | 东南大学 | A kind of speech emotional feature selection approach based on Standard of Environmental Noiseization conversion |
CN103943104B (en) * | 2014-04-15 | 2017-03-01 | 海信集团有限公司 | A kind of voice messaging knows method for distinguishing and terminal unit |
CN109346087A (en) * | 2018-09-17 | 2019-02-15 | 平安科技(深圳)有限公司 | Fight the method for identifying speaker and device of the noise robustness of the bottleneck characteristic of network |
CN109564759A (en) * | 2016-08-03 | 2019-04-02 | 思睿逻辑国际半导体有限公司 | Speaker Identification |
CN110931022A (en) * | 2019-11-19 | 2020-03-27 | 天津大学 | Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics |
CN111462759A (en) * | 2020-04-01 | 2020-07-28 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
CN111681664A (en) * | 2020-07-24 | 2020-09-18 | 北京百瑞互联技术有限公司 | Method, system, storage medium and equipment for reducing audio coding rate |
CN113409762A (en) * | 2021-06-30 | 2021-09-17 | 平安科技(深圳)有限公司 | Emotional voice synthesis method, device, equipment and storage medium |
CN113567969A (en) * | 2021-09-23 | 2021-10-29 | 江苏禹治流域管理技术研究院有限公司 | Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE60213195T8 (en) * | 2002-02-13 | 2007-10-04 | Sony Deutschland Gmbh | Method, system and computer program for speech / speaker recognition using an emotion state change for the unsupervised adaptation of the recognition method |
- 2005-10-31: CN application CNB2005100613603A filed; granted as CN100440315C; status not active (Expired - Fee Related)
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1975856B (en) * | 2006-10-30 | 2011-11-09 | 邹采荣 | Speech emotion identifying method based on supporting vector machine |
CN101547261B (en) * | 2008-03-27 | 2013-06-05 | 富士通株式会社 | Association apparatus and association method |
CN102201237A (en) * | 2011-05-12 | 2011-09-28 | 浙江大学 | Emotional speaker identification method based on reliability detection of fuzzy support vector machine |
CN102201237B (en) * | 2011-05-12 | 2013-03-13 | 浙江大学 | Emotional speaker identification method based on reliability detection of fuzzy support vector machine |
CN102354496A (en) * | 2011-07-01 | 2012-02-15 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
CN102354496B (en) * | 2011-07-01 | 2013-08-21 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
WO2013040981A1 (en) * | 2011-09-23 | 2013-03-28 | 浙江大学 | Speaker recognition method for combining emotion model based on near neighbour principles |
CN103594091B (en) * | 2013-11-15 | 2017-06-30 | 努比亚技术有限公司 | A kind of mobile terminal and its audio signal processing method |
CN103594091A (en) * | 2013-11-15 | 2014-02-19 | 深圳市中兴移动通信有限公司 | Mobile terminal and voice signal processing method thereof |
CN103943104B (en) * | 2014-04-15 | 2017-03-01 | 海信集团有限公司 | A kind of voice messaging knows method for distinguishing and terminal unit |
CN105679321A (en) * | 2016-01-29 | 2016-06-15 | 宇龙计算机通信科技(深圳)有限公司 | Speech recognition method and device and terminal |
CN109564759B (en) * | 2016-08-03 | 2023-06-09 | 思睿逻辑国际半导体有限公司 | Speaker identification |
CN109564759A (en) * | 2016-08-03 | 2019-04-02 | 思睿逻辑国际半导体有限公司 | Speaker Identification |
US11735191B2 (en) | 2016-08-03 | 2023-08-22 | Cirrus Logic, Inc. | Speaker recognition with assessment of audio frame contribution |
CN106297823A (en) * | 2016-08-22 | 2017-01-04 | 东南大学 | A kind of speech emotional feature selection approach based on Standard of Environmental Noiseization conversion |
CN109346087A (en) * | 2018-09-17 | 2019-02-15 | 平安科技(深圳)有限公司 | Fight the method for identifying speaker and device of the noise robustness of the bottleneck characteristic of network |
CN109346087B (en) * | 2018-09-17 | 2023-11-10 | 平安科技(深圳)有限公司 | Noise robust speaker verification method and apparatus against bottleneck characteristics of a network |
CN110931022A (en) * | 2019-11-19 | 2020-03-27 | 天津大学 | Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics |
CN110931022B (en) * | 2019-11-19 | 2023-09-15 | 天津大学 | Voiceprint recognition method based on high-low frequency dynamic and static characteristics |
CN111462759A (en) * | 2020-04-01 | 2020-07-28 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
CN111462759B (en) * | 2020-04-01 | 2024-02-13 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
CN111681664A (en) * | 2020-07-24 | 2020-09-18 | 北京百瑞互联技术有限公司 | Method, system, storage medium and equipment for reducing audio coding rate |
CN113409762A (en) * | 2021-06-30 | 2021-09-17 | 平安科技(深圳)有限公司 | Emotional voice synthesis method, device, equipment and storage medium |
CN113409762B (en) * | 2021-06-30 | 2024-05-07 | 平安科技(深圳)有限公司 | Emotion voice synthesis method, emotion voice synthesis device, emotion voice synthesis equipment and storage medium |
CN113567969A (en) * | 2021-09-23 | 2021-10-29 | 江苏禹治流域管理技术研究院有限公司 | Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals |
CN113567969B (en) * | 2021-09-23 | 2021-12-17 | 江苏禹治流域管理技术研究院有限公司 | Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals |
Also Published As
Publication number | Publication date |
---|---|
CN100440315C (en) | 2008-12-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20081203 Termination date: 20211031 |