CN101650945A - Method for recognizing speaker based on multivariate core logistic regression model - Google Patents

Method for recognizing speaker based on multivariate core logistic regression model


Publication number
CN101650945A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910152591A
Other languages
Chinese (zh)
Other versions
CN101650945B (en)
Inventor
王万良
郑建炜
郑泽萍
韩姗姗
蒋一波
王震宇
王磊
陈胜勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN2009101525913A priority Critical patent/CN101650945B/en
Publication of CN101650945A publication Critical patent/CN101650945A/en
Application granted granted Critical
Publication of CN101650945B publication Critical patent/CN101650945B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a speaker recognition method based on a multivariate kernel logistic regression model, comprising the following steps: (A) speaker voice feature extraction: the voice signal of the speaker to be recognized is collected and pre-processed, and Mel cepstral parameters are then extracted; (B) speaker model construction: a multivariate kernel logistic regression model is used as the speaker recognition model; (C) speaker recognition model training: the feature vectors extracted in step A are used as input training samples, and the model parameters are optimized by iterative training with a sequential minimal optimization algorithm; (D) speaker recognition: the feature vectors of the voice signal of the speaker to be recognized are extracted and fed to the trained speaker recognition model, the multivariate kernel logistic regression model outputs the posterior probability of each speaker, and the speaker with the highest probability is the recognition result. The invention offers a high recognition rate, simple model construction and high speed.

Description

Speaker identification method based on a multivariate kernel logistic regression model
Technical field
The present invention relates to the fields of signal processing, machine learning and pattern recognition, and in particular to a speaker identification method.
Background art
Speaker identification means analyzing the voice signals of speakers in a finite set and extracting features from them, in order to determine automatically whether a speaker belongs to a specified set of speakers and then to confirm the speaker's specific identity. The basic principle of speaker identification is to build, for each speaker, a classification model that describes that speaker's individual characteristics. Good model construction is therefore one of the key technologies of speaker identification.
Traditional speaker identification models include generative models such as the Gaussian mixture model (GMM) and the hidden Markov model (HMM). Although these models can achieve good recognition performance, they need a large number of training samples to optimize the model parameters in the training stage, and they also need a large amount of speech data to characterize the person to be identified in the application stage.
A search of recent patents shows that many patents on speaker recognition already exist in China and abroad, for example: a speaker recognition method based on a support vector machine model with an embedded GMM kernel (200510061953.X); a speaker recognition method that uses the fundamental frequency envelope to eliminate emotional speech (200710157134.4); a speaker recognition method based on conversion between neutral and emotional voiceprint models (200710157133.X); a speaker recognition method based on a hybrid support vector machine (200510061954.4); an emotional speaker recognition method based on spectrum translation (200810162450.5); a speaker recognition method based on a mixed t model (200810162449.2); and a speaker recognition method based on linear emotion compensation of MFCC (200510061360.3).
Summary of the invention
To overcome the low recognition rate, complicated model construction and slow speed of existing speaker identification methods, the present invention provides a speaker identification method based on a multivariate kernel logistic regression model that has a high recognition rate, simple model construction and high speed.
The technical solution adopted by the present invention to solve this technical problem is:
A speaker identification method based on a multivariate kernel logistic regression model, comprising the following steps:
A) Speaker voice feature extraction: collect the voice signal of the speaker to be identified and pre-process it; then extract the Mel cepstral parameters. The Mel cepstral parameters are 13th-order cepstral parameters; the zeroth-order coefficient, which describes the speaker's individual characteristics only weakly, is removed, and the remaining 12-dimensional feature vector is used as the input vector for speaker identification;
B) Speaker model construction: a multivariate kernel logistic regression model is adopted as the speaker identification model,
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K$$
where $K$ is the number of speakers to be distinguished, $\bar{x}$ is the 12-dimensional feature vector, $\beta = [\beta_1^T\ \beta_2^T\ \cdots\ \beta_K^T]$, $\beta \in \mathbb{R}^{12 \times K}$, is the overall model parameter; $\beta_k$ is the model parameter corresponding to the k-th speaker, $\beta_{k0}$ is the constant model parameter corresponding to the k-th speaker, and $c_i$ is the speaker label corresponding to the i-th speech feature vector;
C) Speaker identification model training: the feature vectors extracted in step A are used as input training samples, and the model is trained iteratively with the sequential minimal optimization algorithm until the model parameters are optimal;
D) Speaker identification: extract the feature vectors of the voice signal of the speaker to be identified and feed them to the trained speaker identification model; the multivariate kernel logistic regression model gives the posterior probability of each speaker, and the speaker with the highest probability is the recognition result.
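As a rough illustration of steps B and D, the posterior formula in step B is a softmax over per-speaker scores, and the recognition result is the index with the largest posterior. The Python sketch below assumes plain linear scores and made-up toy parameters (three speakers and two-dimensional features instead of 12); the function names are hypothetical and are not taken from the patent.

```python
import math

def linear_scores(x, betas, biases):
    """Per-speaker scores beta_k^T x + beta_k0."""
    return [sum(b * v for b, v in zip(beta_k, x)) + b0
            for beta_k, b0 in zip(betas, biases)]

def posteriors(scores):
    """Softmax over the scores; the maximum is subtracted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy parameters: 3 speakers, 2-dimensional features.
x = [1.0, 2.0]
betas = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.1]]
biases = [0.0, 0.1, -0.1]

p = posteriors(linear_scores(x, betas, biases))
best = max(range(len(p)), key=lambda k: p[k])  # index of the most probable speaker
```

The posteriors sum to one by construction, so they can be compared directly across speakers.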
Further, in step C, the cost functional of the sequential minimal optimization algorithm is:
$$\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2$$
$$\text{s.t.} \quad \sum_{k=1}^{K} \alpha_{ik} = 0,\ \forall i; \qquad \sum_{i=1}^{n} \alpha_{ik} = 0,\ \forall k$$
where $\bar{\beta}_k$ is the model parameter corresponding to the k-th speaker, $C$ is a constant penalty factor, $\alpha_{ik}$ are the coefficients of the expansion of $\beta$ in the high-dimensional space, $c_{ik} \in \{1, 0\}$ is the corresponding entry of the vector $\bar{c}_k$, and $\bar{c}_k = \{0, 0, \cdots, 1_k, \cdots, 0_K\}$ denotes the target vector of the k-th speaker;
The training steps are as follows:
1) Give an initial α vector that satisfies the constraints, and set the iteration counter Iter = 1;
2) If there exists a pair of distinct indices (i, i') such that $H_{i,k} \neq H_{i',k}$, select the corresponding
$$upper(k) = \arg\max_i H_{ik}, \quad k = 1, 2, \ldots, K-1$$
and
$$lower(k) = \arg\min_i H_{ik}, \quad k = 1, 2, \ldots, K-1;$$
where:
$$H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left( \sum_{k'=1}^{K-1} \alpha_{tk'} K(\bar{x}_t, \bar{x}_i) \right) - \left[ \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right) \right]$$
and $K(x, y)$ is a kernel function satisfying the Mercer condition;
3) Find the optimal change parameter $t^*$ and update $\alpha_{upper(k),k}$ and $\alpha_{lower(k),k}$ as follows:
$$\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*$$
$$\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*$$
$$\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k$$
4) Recompute $H_{ik}$ with $\alpha^{Iter+1}$ and select new upper(k) and lower(k);
5) If, for every $k \in \{1, 2, \ldots, K-1\}$, any index pair (i, i') satisfies $H_{ik} = H_{i'k}$, stop iterating; otherwise return to step 2) and continue until the stopping condition is satisfied.
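The pair update in step 3) adds a step to one coefficient and subtracts the same step from another in the same column, so the per-column constraint (the sum of α over samples is zero) is preserved automatically. The Python sketch below illustrates only this property on made-up numbers; the function name is hypothetical, the step `t` is simply passed in (how the optimal $t^*$ is found is not sketched here), and only the K-1 free columns are stored.

```python
def smo_pair_update(alpha, k, upper, lower, t):
    """Return a copy of alpha with alpha[upper][k] += t and alpha[lower][k] -= t."""
    new_alpha = [row[:] for row in alpha]  # copy so the input is left unchanged
    new_alpha[upper][k] += t
    new_alpha[lower][k] -= t
    return new_alpha

# Hypothetical toy multipliers: n = 3 samples, K - 1 = 2 free columns,
# each column summing to zero.
alpha = [[0.2, -0.1],
         [-0.1, 0.3],
         [-0.1, -0.2]]

updated = smo_pair_update(alpha, k=0, upper=0, lower=1, t=0.05)
column_sums = [sum(row[k] for row in updated) for k in range(2)]
```

Because only two entries change per iteration, no matrix inversion is involved in the update itself.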
Still further, in step D, the speaker is identified by:
$$\arg\max_{k \in \{1, 2, \cdots, K\}} \; p(c_i = k \mid \bar{x}; \beta)$$
That is, for a new speech input vector $\bar{x}$, the speaker $k$ with the highest posterior probability is taken as the recognition result, where:
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K.$$
Further, in step A, the pre-processing comprises: sampling and quantization, center clipping, high-frequency boosting (pre-emphasis), and windowing and framing.
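Two of these pre-processing steps, high-frequency boosting and windowed framing, can be sketched in a few lines of Python. This is a minimal sketch under assumptions that the text does not state: a first-order pre-emphasis filter with coefficient 0.97, a Hamming window, and 50% frame overlap. The 8000 Hz sampling rate and 30 ms frame length are the values reported in the experiment; the function names and the synthetic test tone are hypothetical.

```python
import math

def pre_emphasis(signal, coeff=0.97):
    """High-frequency boost y[n] = x[n] - coeff * x[n-1]; coeff is an assumed value."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len, hop):
    """Split the signal into frames and apply a Hamming window (assumed) to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

sample_rate = 8000                      # Hz, as in the experiment
frame_len = sample_rate * 30 // 1000    # 30 ms -> 240 samples
hop = frame_len // 2                    # 50% overlap is an assumption

# One second of a synthetic 440 Hz tone stands in for recorded speech.
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = frame_and_window(pre_emphasis(signal), frame_len, hop)
```

Each resulting frame would then be passed to an MFCC extractor, which is not sketched here.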
The technical concept of the present invention is as follows. Kernel logistic regression is an effective discriminative classification model, used mainly to generate posterior probabilities for classification decisions, and it has been applied successfully to problems such as gene pathology selection, credit card risk classification and isolated word recognition. Kernel logistic regression produces posterior probabilities as its natural output and extends well to multi-class classification, so it applies very naturally to a multi-class problem such as speaker identification. Existing techniques that apply kernel logistic regression to speaker identification make only simple use of the binary model; although their recognition rate is better than that of classical algorithms, model construction is complicated and the advantage of the multi-class extension of logistic regression is not exploited.
In the multivariate kernel logistic regression speaker identification method, a single multivariate kernel logistic regression model is built for a number of speakers; when new, unknown speech feature parameters are input, the model outputs the posterior probability of each speaker, and the speaker with the highest probability is the identification result. The traditional logistic regression model is first extended to the multi-class case, and the kernel trick is then used to convert the original linear model into a nonlinear model suited to speaker identification. In the model training stage, each speaker's training speech data is pre-processed and feature parameters are extracted as the model's input feature vectors; the model parameters are updated iteratively by the fast sequential minimal optimization algorithm. In the recognition stage, the utterance of the speaker to be identified undergoes the same pre-processing and the same feature parameters are extracted; the trained multivariate kernel logistic regression model then outputs the posterior probability of each speaker, which gives the identification result.
The technical solution adopted by the present invention can be further refined. The training algorithm of the multivariate kernel logistic regression model is sequential minimal optimization: the cost functional of the original multivariate kernel logistic regression model is first converted to its dual form and the optimality condition is derived; in each iteration only two parameters are updated, which avoids the matrix inversion required when many parameters are updated simultaneously and makes model training faster.
The beneficial effects of the present invention are: 1. A multivariate kernel logistic regression model is used as the speaker identification model. Its recognition rate is higher than that of traditional generative models (such as the Gaussian mixture model) and similar to that of other discriminative models (such as the support vector machine). However, the support vector machine is a binary classifier and can perform multi-class classification only by building multiple models in a "one-versus-rest" or "one-versus-one" manner and voting, whereas the multivariate kernel logistic regression model performs multi-class classification directly, so model construction is more intuitive and faster. 2. The training process of the multivariate kernel logistic regression model uses the sequential minimal optimization algorithm, which speeds up training and suits problems with large training sets such as speaker identification.
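To make the model-count argument concrete, the short Python sketch below counts the binary classifiers that the two standard multi-class schemes need, compared with the single multi-class model used here. The value K = 20 is the number of speakers in the experiment reported later; the function names are hypothetical.

```python
def one_vs_one_models(num_classes):
    """One binary classifier per unordered pair of classes."""
    return num_classes * (num_classes - 1) // 2

def one_vs_rest_models(num_classes):
    """One binary classifier per class."""
    return num_classes

K = 20  # number of speakers in the reported experiment
ovo = one_vs_one_models(K)
ovr = one_vs_rest_models(K)
single_multiclass_models = 1  # one multivariate kernel logistic regression model
```

For 20 speakers the one-versus-one scheme needs 190 binary classifiers, which grows quadratically with the number of speakers.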
Embodiment
The present invention is further described below.
A speaker identification method based on a multivariate kernel logistic regression model, comprising the following steps:
A) Speaker voice feature extraction: collect the voice signal of the speaker to be identified and pre-process it; then extract the Mel cepstral parameters. The Mel cepstral parameters are 13th-order cepstral parameters; the zeroth-order coefficient, which describes the speaker's individual characteristics only weakly, is removed, and the remaining 12-dimensional feature vector is used as the input vector for speaker identification;
B) Speaker model construction: a multivariate kernel logistic regression model is adopted as the speaker identification model,
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K$$
where $K$ is the number of speakers to be distinguished, $\bar{x}$ is the 12-dimensional feature vector, $\beta = \{\beta_1, \beta_2, \ldots, \beta_K\}^T$ is the overall model parameter; $\beta_k$ is the model parameter corresponding to the k-th speaker, $\beta_{k0}$ is the constant model parameter corresponding to the k-th speaker, and $c_i$ is the speaker label corresponding to the i-th speech feature vector.
C) Speaker identification model training: the feature vectors extracted in step A are used as input training samples, and the model is trained iteratively with the sequential minimal optimization algorithm until the model parameters are optimal;
D) Speaker identification: extract the feature vectors of the voice signal of the speaker to be identified and feed them to the trained speaker identification model; the multivariate kernel logistic regression model gives the posterior probability of each speaker, and the speaker with the highest probability is the recognition result.
The specific framework of this embodiment is as follows:
Part 1: Feature extraction
Feature extraction uses the prior art. First, a number of voice signals are collected from each speaker at different times and pre-processed; pre-processing comprises sampling and quantization, center clipping, pre-emphasis, removal of low-energy segments, and windowing and framing. Features are then extracted from the pre-processed voice signal. The present invention uses Mel frequency cepstral coefficients (MFCC): 13th-order Mel cepstral parameters are extracted from each frame of the voice signal, the zeroth-order parameter, which contributes least to describing speaker characteristics, is removed, and each frame is finally converted into a 12-dimensional Mel cepstral feature vector.
Part 2: Multivariate kernel logistic regression speaker identification model
In a speaker identification application, let the number of speakers to be identified be $K$. After the feature parameter extraction of Part 1, the given training sample set is $\{\bar{x}_1, c_1\}, \{\bar{x}_2, c_2\}, \ldots, \{\bar{x}_n, c_n\}$, where the input $\bar{x}_i$ is a p-dimensional speaker feature vector, i.e. $\bar{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, with p = 12 in the present invention; the output sample label $c_i$ is one element of the finite set $\{1, 2, \ldots, K\}$; and $\beta$ denotes the model parameters.
The classical logistic regression model is a binary classifier, i.e. K = 2. The linear discriminant function is defined as:
$$g(\bar{x}_i) = \operatorname{logit}\{p(\bar{x}_i, \beta)\} = \log\frac{p(c_i = 1)}{p(c_i = 0)} = \beta^T \bar{x}_i + \beta_0$$
The posterior probability that a sample belongs to class 1 is then:
$$p(c_i = 1 \mid \bar{x}) = \frac{\exp(g(\bar{x}))}{1 + \exp(g(\bar{x}))}$$
The logistic regression problem is the problem of optimizing the parameter $\beta$ of the linear function $g(\bar{x}_i) = \beta^T \bar{x}_i$. Assuming that the sample labels $c_i \in \{1, 0\}$ follow a Bernoulli distribution given the input sample set X, the sample likelihood is:
$$l(\beta) = \prod_{i=1}^{n} p(c_i = 1 \mid \bar{x}_i)^{c_i} \left(1 - p(c_i = 1 \mid \bar{x}_i)\right)^{1 - c_i}$$
Taking the negative logarithm gives the original cost functional to be minimized for the linear logistic regression model:
$$L(\beta) = -\sum_{i=1}^{n} \left[ c_i (\beta^T \bar{x}_i + \beta_0) - \log\left(1 + \exp(\beta^T \bar{x}_i + \beta_0)\right) \right]$$
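The negative log-likelihood of the binary linear model can be evaluated directly from its definition. The Python sketch below does this on a made-up two-sample data set; the function name and the toy data are hypothetical, and no safeguards against overflow for very large scores are included.

```python
import math

def neg_log_likelihood(X, c, beta, beta0):
    """L(beta) = -sum_i [ c_i * z_i - log(1 + exp(z_i)) ], z_i = beta^T x_i + beta0."""
    total = 0.0
    for x_i, c_i in zip(X, c):
        z = sum(b * v for b, v in zip(beta, x_i)) + beta0
        total -= c_i * z - math.log(1.0 + math.exp(z))
    return total

# Hypothetical toy data: two samples, one per class.
X = [[0.0, 1.0], [1.0, 0.0]]
c = [1, 0]

loss_at_zero = neg_log_likelihood(X, c, [0.0, 0.0], 0.0)    # both posteriors are 0.5
loss_separating = neg_log_likelihood(X, c, [-1.0, 1.0], 0.0)  # scores +1 and -1
```

With all parameters at zero every posterior is 0.5, so the loss equals n·log 2; parameters that score each sample on the correct side give a smaller value.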
Because the input feature parameters in speaker identification are not linearly separable, the linear regression model must be extended to a nonlinear regression model, which gives kernel logistic regression. A nonlinear mapping $\Phi: \mathbb{R}^p \to F$ maps the original input space into a high-dimensional feature space. In the space $F$, $\beta$ can be expressed as:
$$\beta = \sum_{i=1}^{n} \alpha_i \Phi(\bar{x}_i)$$
In the high-dimensional space, $g'(\bar{x}_i) = \beta^T \Phi(\bar{x}_i) + \beta_0$ is constructed, that is:
$$g'(\bar{x}) = \beta^T \Phi(\bar{x}) + \beta_0 = \left( \sum_{i=1}^{n} \alpha_i \Phi(\bar{x}_i) \right)^T \Phi(\bar{x}) + \beta_0 = \sum_{i=1}^{n} \alpha_i K(\bar{x}_i, \bar{x}) + \beta_0$$
With respect to the original input space, $g'(\bar{x})$ thus becomes a nonlinear function, where $K(x, y)$ is a kernel function satisfying the Mercer condition. The present invention uses the most widely used kernel, the radial basis function (RBF) kernel:
$$K(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma}\right)$$
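The radial basis function kernel and the Gram matrix built from it can be sketched in Python as follows. The default σ = 1.5 is the value reported in the experiment; the function names and the toy points are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma=1.5):
    """K(x, y) = exp(-||x - y||^2 / sigma), with sigma dividing the squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma)

def gram_matrix(X, sigma=1.5):
    """Matrix of kernel values K(x_i, x_j) over all pairs of training vectors."""
    return [[rbf_kernel(xi, xj, sigma) for xj in X] for xi in X]

# Hypothetical toy points in two dimensions.
X = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
G = gram_matrix(X)
```

The matrix is symmetric with ones on the diagonal, and entries shrink toward zero as points move apart.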
Using the kernel trick, the kernelized posterior probability is
$$p(c_i = 1 \mid \bar{x}) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{n} \alpha_i K(\bar{x}_i, \bar{x}) - \beta_0\right)}$$
and the cost functional of kernel logistic regression is:
$$\min L(\alpha) = -\sum_{i=1}^{n} c_i \left[ \sum_{j=1}^{n} \alpha_j K(\bar{x}_i, \bar{x}_j) + \beta_0 \right] + \sum_{i=1}^{n} \log\left(1 + \exp\left[ \sum_{j=1}^{n} \alpha_j K(\bar{x}_i, \bar{x}_j) + \beta_0 \right]\right)$$
In a practical speaker identification system, generally K > 2. If the binary kernel logistic model were used directly for identification, multiple classifiers would have to be built by the "one-versus-rest" or "one-versus-one" method, which makes model construction more cumbersome. In fact, kernel logistic regression extends very naturally to multi-class classification:
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K$$
where $\beta = [\beta_1^T\ \beta_2^T\ \cdots\ \beta_K^T]$, $\beta \in \mathbb{R}^{p \times K}$, are the parameters to be optimized, i.e. the model parameters that the speaker identification system needs to optimize. The optimal parameters are again derived by minimizing the negative log-likelihood function:
$$\min_{\beta} l(\beta) = -\log\left( \prod_{i=1}^{n} p(c_i = k \mid \bar{x}_i; \beta) \right) = \sum_{k=1}^{K} \sum_{c_i = k} \left[ -\beta_k^T \bar{x}_i - \beta_{k0} + \log\left( \sum_{j=1}^{K} \exp(\beta_j^T \bar{x}_i + \beta_{j0}) \right) \right]$$
To strengthen the generalization ability of the model, an $L_2$ regularization term is added to the functional to be optimized. The final multivariate logistic regression cost functional is:
$$\min H = \sum_{i=1}^{n} \left[ -\bar{c}_i^T \beta_k^T \bar{x}_i - \beta_{k0} + \log\left( \sum_{j=1}^{K} \exp(\beta_j^T \bar{x}_i + \beta_{j0}) \right) \right] + \frac{\lambda}{2} \sum_{k=1}^{K} \|\beta_k\|^2$$
where $\bar{c}_i$ is a K-dimensional vector: when $c_i = k$, $\bar{c}_i = (0, 0, \ldots, 1, \ldots, 0)$, with the 1 in the k-th dimension. Likewise, applying the kernel trick gives the corresponding multivariate kernel logistic regression cost functional:
$$\min H' = \sum_{i=1}^{n} \left[ -\sum_{j=1}^{n} \bar{c}_i^T \alpha_{jk} K(\bar{x}_i, \bar{x}_j) - \beta_{k0} + \log\left( \sum_{j=1}^{K} \exp\left( \sum_{m=1}^{n} \alpha_{mj} K(\bar{x}_m, \bar{x}_i) + \beta_{j0} \right) \right) \right] + \frac{\lambda}{2} \sum_{k=1}^{K} \sum_{i, i'} \alpha_{ik} \alpha_{i'k} K(\bar{x}_i, \bar{x}_{i'})$$
Part 3: Model training algorithm
There are many training algorithms for the kernel logistic regression model, including iteratively reweighted least squares (IRRLS), the Newton-Raphson method and the trust region Newton method (TRNM). In applications such as speaker identification, where the number of training samples is large and there are many target classes, every iteration of all these methods involves a matrix inversion of considerable computational cost. The present invention converts the cost functional of the original kernel logistic regression model to its dual form and proposes a sequential minimal optimization training algorithm that optimizes only two dual coefficients in each iteration, avoiding time-consuming matrix operations.
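1) Dualization of the cost functional
The original cost functional is equivalent to:
$$\min H' = C \sum_{i=1}^{n} \sum_{k=1}^{K} g(\xi_{ik}) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2$$
where $C = 1/\lambda$ is the regularization constant, $\xi_{ik} = \beta_{k0} + \bar{\beta}_k^T \bar{x}_i$, and $g(\xi_{ik}) = -c_{ik} \xi_{ik} + \log\left(e^{\xi_{i1}} + e^{\xi_{i2}} + \cdots + e^{\xi_{iK}}\right)$. Converting to Lagrangian form gives:
[Equation image G2009101525913D00086 (first part of the Lagrangian) is not reproduced in this text]
$$+ \sum_{i=1}^{n} \sum_{k=1}^{K} \alpha_{ik} \left( \xi_{ik} - \beta_{k0} - \bar{\beta}_k^T \bar{x}_i \right) + \alpha_0 \sum_{k=1}^{K} \beta_{k0}$$
where $\alpha_{ik}$ and $\alpha_0$ are Lagrange multipliers. The KKT conditions are then:
[Equation images G2009101525913D00088, G2009101525913D00089 and G2009101525913D00091 (the three KKT conditions) are not reproduced in this text]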
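From the three conditions above it follows that
$$\bar{\beta}_k = \sum_{i=1}^{n} \alpha_{ik} \bar{x}_i,\ \forall k; \qquad \sum_{k=1}^{K} \alpha_{ik} = 0; \qquad \alpha_0 = 0; \qquad \sum_{i=1}^{n} \alpha_{ik} = 0,\ \forall k$$
and
$$\xi_{ik} = \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) - \frac{1}{K} \sum_{k'=1}^{K} \log\left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right), \quad \forall i, k$$
$$g'(\xi_{ik}) = -\frac{\alpha_{ik}}{C}$$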
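The last relation has a useful reading, sketched here under the assumption that $g'$ denotes the partial derivative of $g$ with respect to $\xi_{ik}$. Differentiating the definition of $g$ suggests that $c_{ik} - \alpha_{ik}/C$ is the softmax posterior of speaker $k$ for sample $i$, which would explain why this quantity appears inside the logarithms:

```latex
\begin{aligned}
g'(\xi_{ik})
  &= \frac{\partial}{\partial \xi_{ik}}
     \Big[ -c_{ik}\,\xi_{ik} + \log \sum_{k'=1}^{K} e^{\xi_{ik'}} \Big]
   = -c_{ik} + \frac{e^{\xi_{ik}}}{\sum_{k'=1}^{K} e^{\xi_{ik'}}} \\[4pt]
g'(\xi_{ik}) = -\frac{\alpha_{ik}}{C}
  \;&\Longrightarrow\;
  c_{ik} - \frac{\alpha_{ik}}{C}
   = \frac{e^{\xi_{ik}}}{\sum_{k'=1}^{K} e^{\xi_{ik'}}} \in (0, 1)
\end{aligned}
```

Summing the right-hand side over $k$ gives 1, and $\sum_k c_{ik} = 1$ for a one-hot target, so $\sum_k \alpha_{ik} = 0$, in agreement with the constraint derived from the KKT conditions.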
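Let $G(\delta) = \delta \xi_{ik} - g(\xi_{ik})$, where $\delta = -\frac{\alpha_{ik}}{C}$. Differentiating $G$ gives:
$$\frac{\partial G}{\partial \delta} = \xi_{ik} + \delta \frac{d\xi_{ik}}{d\delta} - g'(\xi_{ik}) \frac{d\xi_{ik}}{d\delta} = \xi_{ik}$$
so $G$ can be obtained by integrating $\xi_{ik}$:
$$G\left(-\frac{\alpha_{ik}}{C}\right) = \frac{K-1}{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \text{const.}$$
where const is a constant. $G$ is part of the cost functional; applying Wolfe duality theory and simplifying, the dual form of the cost functional is:
$$\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} G\left(-\frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2$$
$$\text{s.t.} \quad \sum_{k=1}^{K} \alpha_{ik} = 0,\ \forall i; \qquad \sum_{i=1}^{n} \alpha_{ik} = 0,\ \forall k$$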
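A small numerical check can make the dual objective concrete. The Python sketch below evaluates the simplified form $C \sum_i \sum_k u_{ik} \log u_{ik} + \frac{1}{2} \sum_k \|\bar{\beta}_k\|^2$ with $u_{ik} = c_{ik} - \alpha_{ik}/C$, and computes the squared norm through a kernel Gram matrix as $\sum_{i,i'} \alpha_{ik} \alpha_{i'k} K(\bar{x}_i, \bar{x}_{i'})$. The function name, the zero-based labels and all toy values are assumptions made for illustration only.

```python
import math

def dual_objective(alpha, labels, gram, C):
    """Evaluate C * sum u*log(u) + 0.5 * sum_k alpha_k^T G alpha_k, u = c_ik - alpha_ik / C."""
    n, K = len(alpha), len(alpha[0])
    entropy_term = 0.0
    for i in range(n):
        for k in range(K):
            c_ik = 1.0 if labels[i] == k else 0.0  # one-hot target, zero-based labels
            u = c_ik - alpha[i][k] / C             # must lie strictly between 0 and 1
            entropy_term += u * math.log(u)
    norm_term = 0.0
    for k in range(K):
        for i in range(n):
            for j in range(n):
                norm_term += alpha[i][k] * alpha[j][k] * gram[i][j]
    return C * entropy_term + 0.5 * norm_term

# Hypothetical toy problem: 2 samples, 2 classes, rows and columns of alpha sum to zero.
alpha = [[0.5, -0.5], [-0.5, 0.5]]
labels = [0, 1]
gram = [[1.0, 0.5], [0.5, 1.0]]
value = dual_objective(alpha, labels, gram, C=1.0)
```

In this toy case every $u_{ik}$ equals 0.5, so the first term is $2 \log 0.5$ and the kernel term contributes 0.25.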
2) Optimality condition
The formula above contains two constraints. To minimize it by sequential minimal optimization, one of the constraints is first incorporated into the cost functional:
$$\min \tilde{D} = C \sum_{i=1}^{n} \sum_{k=1}^{K} G\left(-\frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K-1} \|\bar{\beta}_k\|^2 + \frac{1}{2} \left\| -\sum_{i=1}^{n} \left( \sum_{k'=1}^{K-1} \alpha_{ik'} \right) \bar{x}_i \right\|^2$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_{ik} = 0,\ \forall k$$
The Lagrangian form of the dual cost functional is then:
[Equation image G2009101525913D00101 is not reproduced in this text]
The optimality condition of sequential minimal optimization is the condition under which the parameters $\alpha_{ik}$ in the dual function above stop changing. Differentiating
[Equation image G2009101525913D00102 is not reproduced in this text]
with respect to $\alpha_{ik}$ gives:
[Equation image G2009101525913D00103 is not reproduced in this text]
where $\alpha_{ik}$ satisfies:
$$\begin{cases} 0 < \alpha_{ik} < C, & c_{ik} = 1 \\ -C < \alpha_{ik} < 0, & c_{ik} = 0 \end{cases} \quad \text{and} \quad 0 < \sum_{i=1}^{n} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) < 1$$
Let:
$$H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left( \sum_{k'=1}^{K-1} \alpha_{tk'} K(\bar{x}_t, \bar{x}_i) \right) - \left[ \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right) \right]$$
$$upper(k) = \arg\max_i H_{ik}, \quad k = 1, 2, \ldots, K-1$$
$$lower(k) = \arg\min_i H_{ik}, \quad k = 1, 2, \ldots, K-1$$
Then the optimality condition for dual training of the multivariate kernel logistic regression model is:
$$H_{upper(k),k} = H_{lower(k),k} = \beta_k, \quad k = 1, 2, \ldots, K-1$$
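Given a table of $H_{ik}$ values, selecting upper(k) and lower(k) and checking the optimality condition reduces to a per-column argmax, argmin and comparison. The Python sketch below assumes the $H$ values have already been computed and stores only the K-1 free columns; the tolerance `tol` replaces the exact equality of the text and is an assumption, as are the function names and toy values.

```python
def select_pair(H, k):
    """Return (upper, lower): row indices of the largest and smallest H[i][k]."""
    column = [row[k] for row in H]
    upper = max(range(len(column)), key=column.__getitem__)
    lower = min(range(len(column)), key=column.__getitem__)
    return upper, lower

def converged(H, tol=1e-6):
    """True when, in every column, all H[i][k] agree to within tol (assumed tolerance)."""
    for k in range(len(H[0])):
        upper, lower = select_pair(H, k)
        if H[upper][k] - H[lower][k] > tol:
            return False
    return True

# Hypothetical H values: 3 samples, K - 1 = 2 columns.
H = [[0.3, 0.1],
     [0.7, 0.1],
     [0.2, 0.1]]
pair_for_first_column = select_pair(H, 0)
done = converged(H)
```

Column 0 still has a gap between its largest and smallest entries, so training would continue with the pair found there.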
3) Sequential minimal optimization training
Based on the dual cost functional and its optimality condition derived above, the basic operations of the sequential minimal optimization training method for the multivariate kernel logistic model are the correct initialization of the α parameters and the update of $\alpha_{upper(k)}$ and $\alpha_{lower(k)}$ in each iteration. The algorithm proceeds as follows:
(1) Give an initial α vector that satisfies the constraints, and set the iteration counter Iter = 1;
(2) If there exists a pair of distinct indices (i, i') such that $H_{i,k} \neq H_{i',k}$, select the corresponding upper(k) and lower(k);
(3) Update $\alpha_{upper(k),k}$ and $\alpha_{lower(k),k}$ with the optimal change parameter $t^*$ as follows:
$$\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*$$
$$\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*$$
$$\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k$$
(4) Recompute $H_{ik}$ with $\alpha^{Iter+1}$ and select new upper(k) and lower(k);
(5) If, for every $k \in \{1, 2, \ldots, K-1\}$, any index pair (i, i') satisfies $H_{ik} = H_{i'k}$, stop iterating; otherwise return to step (2) and continue until the stopping condition is satisfied.
Part 4: Speaker identification
After the multivariate kernel logistic regression speaker model has been built, the classification result for a new input vector $\bar{x}$ is:
$$\arg\max_{k \in \{1, 2, \cdots, K\}} \; p(c_i = k \mid \bar{x}; \beta)$$
That is, the speaker $k$ with the highest posterior probability is taken as the recognition result, where:
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K$$
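In the kernelized model the score of speaker $k$ is computed from the training vectors rather than from an explicit weight vector, i.e. $\sum_i \alpha_{ik} K(\bar{x}_i, \bar{x}) + \beta_{k0}$. The Python sketch below shows the decision rule under that assumption, with an RBF kernel (σ = 1.5 as in the experiment) and made-up coefficients for two speakers; the function names and all numerical values are hypothetical, and the coefficients are not the result of any training run.

```python
import math

def rbf_kernel(x, y, sigma=1.5):
    """K(x, y) = exp(-||x - y||^2 / sigma)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma)

def recognize(x, train_X, alpha, biases, sigma=1.5):
    """Return (best speaker index, posteriors) for a new feature vector x."""
    scores = [
        sum(alpha[i][k] * rbf_kernel(train_X[i], x, sigma)
            for i in range(len(train_X))) + biases[k]
        for k in range(len(biases))
    ]
    m = max(scores)                       # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical toy model: one training vector per speaker.
train_X = [[0.0, 0.0], [3.0, 3.0]]
alpha = [[1.0, -1.0], [-1.0, 1.0]]
biases = [0.0, 0.0]

near_first, probs_first = recognize([0.1, 0.0], train_X, alpha, biases)
near_second, probs_second = recognize([3.0, 2.9], train_X, alpha, biases)
```

A test vector close to a speaker's training vector receives the highest posterior for that speaker, which is the behaviour the argmax rule relies on.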
Effect test: the experiment uses a self-recorded corpus with 20 speakers in total, 12 male and 8 female. The data were acquired by mono A/D conversion at a sampling frequency of 8000 Hz with 16-bit quantization. Each speaker's voice signal was compiled from recordings made at different times. For each speaker, speech segments from different times with a total length of 15 s were mixed to form the training signal, and 20 speech segments of 1.5 s each from different times were used as test signals, giving 20 training utterances and 400 test utterances. The voice signal first undergoes pre-processing such as high-frequency boosting and center clipping; voice activity detection (VAD) then extracts the effective speech segments and removes the redundant silent segments; the signal is divided into frames of 30 ms, and 12-dimensional MFCC (Mel frequency cepstral coefficient) feature parameters are extracted as the classification parameters.
The multivariate kernel logistic regression method is compared with the Gaussian mixture model method and the support vector machine method in terms of speaker identification rate. The Gaussian mixture model uses 100 mixture components; the support vector machine, being a binary classifier, builds multiple "one-versus-one" classification models and identifies by voting. The multivariate kernel logistic regression method and the support vector machine method use the same radial basis function kernel with σ = 1.5. The resulting recognition rates are: multivariate kernel logistic regression, 97.5%; support vector machine, 97%; Gaussian mixture model, 96.5%. The speaker identification rate of the method of the invention is thus better than that of the classical speaker identification methods.
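To put the reported rates in perspective, they can be converted into counts of correctly identified test utterances. The Python sketch below assumes that each rate was computed over all 400 test utterances, which the text implies but does not state explicitly; the dictionary keys are illustrative labels.

```python
# Reported recognition rates (percent) from the experiment.
rates_percent = {
    "multivariate kernel logistic regression": 97.5,
    "support vector machine": 97.0,
    "Gaussian mixture model": 96.5,
}

num_test_utterances = 20 * 20  # 20 speakers x 20 test segments each

# Assumption: every rate is measured over the full set of test utterances.
correct_counts = {
    name: round(rate / 100 * num_test_utterances)
    for name, rate in rates_percent.items()
}
```

Under this assumption the three methods differ by only two utterances each, so the ranking rests on a small margin in this experiment.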

Claims (4)

1. A speaker identification method based on a multivariate kernel logistic regression model, characterized in that the speaker identification method comprises the following steps:
A) Speaker voice feature extraction: collect the voice signal of the speaker to be identified and pre-process it; then extract the Mel cepstral parameters, wherein the Mel cepstral parameters are 13th-order cepstral parameters, the zeroth-order coefficient, which describes the speaker's individual characteristics only weakly, is removed, and the remaining 12-dimensional feature vector is used as the input vector for speaker identification;
B) Speaker model construction: a multivariate kernel logistic regression model is adopted as the speaker identification model,
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K$$
where $K$ is the number of speakers to be distinguished, $\bar{x}$ is the 12-dimensional input feature vector, $\beta = \{\beta_1, \beta_2, \ldots, \beta_K\}^T$ is the overall model parameter; $\beta_k$ is the model parameter corresponding to the k-th speaker, $\beta_{k0}$ is the constant model parameter corresponding to the k-th speaker, and $c_i$ is the speaker label corresponding to the i-th speech feature vector;
C) Speaker identification model training: the feature vectors extracted in step A are used as input training samples, and the model is trained iteratively with the sequential minimal optimization algorithm until the model parameters are optimal;
D) Speaker identification: extract the feature vectors of the voice signal of the speaker to be identified and feed them to the trained speaker identification model; the multivariate kernel logistic regression model gives the posterior probability of each speaker, and the speaker with the highest probability is the recognition result.
2. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 1, characterized in that in step C the cost functional of the sequential minimal optimization algorithm is:
$$\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\beta_k\|^2$$
$$\text{s.t.} \quad \sum_{k=1}^{K} \alpha_{ik} = 0,\ \forall i; \qquad \sum_{i=1}^{n} \alpha_{ik} = 0,\ \forall k$$
where $\beta_k$ is the model parameter corresponding to the k-th speaker, $C$ is a constant penalty factor, $\alpha_{ik}$ are the coefficients of the expansion of $\beta$ in the high-dimensional space, $c_{ik} \in \{1, 0\}$ is the corresponding entry of the K-dimensional vector $\bar{c}_i$, and $\bar{c}_i = \{0, 0, \ldots, 1_k, \ldots, 0_K\}$, with the 1 in the k-th dimension, denotes the target vector of the k-th speaker;
The training steps are as follows:
1) Give an initial α vector that satisfies the constraints, and set the iteration counter Iter = 1;
2) If there exists a pair of distinct indices (i, i') such that $H_{i,k} \neq H_{i',k}$, select the corresponding $upper(k) = \arg\max_i H_{ik}$, $k = 1, 2, \ldots, K-1$, and $lower(k) = \arg\min_i H_{ik}$, $k = 1, 2, \ldots, K-1$;
where:
$$H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left( \sum_{k'=1}^{K-1} \alpha_{tk'} K(\bar{x}_t, \bar{x}_i) \right) - \left[ \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right) \right]$$
and $K(x, y)$ is a kernel function satisfying the Mercer condition;
3) Find the optimal change parameter $t^*$ and update $\alpha_{upper(k),k}$ and $\alpha_{lower(k),k}$ as follows:
$$\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*$$
$$\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*$$
$$\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k$$
4) Recompute $H_{ik}$ with $\alpha^{Iter+1}$ and select new upper(k) and lower(k);
5) If, for every $k \in \{1, 2, \ldots, K-1\}$, any index pair (i, i') satisfies $H_{ik} = H_{i'k}$, stop iterating; otherwise return to step 2) and continue until the stopping condition is satisfied.
3. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 1 or 2, characterized in that in step D the speaker is identified by:
$$\arg\max_{k \in \{1, 2, \ldots, K\}} \; p(c_i = k \mid \bar{x}; \beta)$$
That is, for a new speech input vector $\bar{x}$, the speaker $k$ with the highest posterior probability is taken as the recognition result, where:
$$p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K.$$
4. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 3, characterized in that in step A the pre-processing comprises: sampling and quantization, center clipping, high-frequency boosting (pre-emphasis), and windowing and framing.
CN2009101525913A 2009-09-17 2009-09-17 Method for recognizing speaker based on multivariate core logistic regression model Expired - Fee Related CN101650945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101525913A CN101650945B (en) 2009-09-17 2009-09-17 Method for recognizing speaker based on multivariate core logistic regression model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101525913A CN101650945B (en) 2009-09-17 2009-09-17 Method for recognizing speaker based on multivariate core logistic regression model

Publications (2)

Publication Number Publication Date
CN101650945A true CN101650945A (en) 2010-02-17
CN101650945B CN101650945B (en) 2011-11-23

Family

ID=41673166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101525913A Expired - Fee Related CN101650945B (en) 2009-09-17 2009-09-17 Method for recognizing speaker based on multivariate core logistic regression model

Country Status (1)

Country Link
CN (1) CN101650945B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543073A (en) * 2010-12-10 2012-07-04 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN102543073B (en) * 2010-12-10 2014-05-14 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN102789594A (en) * 2012-06-28 2012-11-21 南京邮电大学 Voice generation method based on DIVA neural network model
CN102789594B (en) * 2012-06-28 2014-08-13 南京邮电大学 Voice generation method based on DIVA neural network model
CN103020046A (en) * 2012-12-24 2013-04-03 哈尔滨工业大学 Name transliteration method on the basis of classification of name origin
WO2014101629A1 (en) * 2012-12-24 2014-07-03 哈尔滨工业大学 Name transliteration method based on classification of name origins
CN103020046B (en) * 2012-12-24 2016-04-20 哈尔滨工业大学 Based on the name transliteration method of name origin classification
CN105787770A (en) * 2016-04-27 2016-07-20 上海遥薇(集团)有限公司 Non-negative matrix factorization (NMF) algorithm-based big data commodity and service recommending method and system
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
CN110827986A (en) * 2019-11-11 2020-02-21 科大讯飞股份有限公司 Method, device and equipment for screening developmental reading disorder and storage medium

Also Published As

Publication number Publication date
CN101650945B (en) 2011-11-23

Similar Documents

Publication Publication Date Title
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
Carlin et al. Rapid evaluation of speech representations for spoken term discovery
CN101650945B (en) Method for recognizing speaker based on multivariate core logistic regression model
CN110289003A (en) A kind of method of Application on Voiceprint Recognition, the method for model training and server
CN102568476B (en) Voice conversion method based on self-organizing feature map network cluster and radial basis network
CN102982803A (en) Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN103456302B (en) A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight
CN111696522B (en) Tibetan language voice recognition method based on HMM and DNN
Pham et al. Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
van der Westhuizen et al. Feature learning for efficient ASR-free keyword spotting in low-resource languages
CN106531192A (en) Speech emotion recognition method and system based on redundancy features and multi-dictionary representation
Sen et al. A convolutional neural network based approach to recognize bangla spoken digits from speech signal
Tripathi et al. Improvement of phone recognition accuracy using speech mode classification
Ayache et al. Speech command recognition using deep learning
Jiang et al. Task-aware deep bottleneck features for spoken language identification.
CN1741131B (en) Method and apparatus for identifying non-particular person isolating word voice
Zhang et al. Emotion recognition in speech using multi-classification SVM
Barman et al. State of the art review of speech recognition using genetic algorithm
Zailan et al. Comparative analysis of LPC and MFCC for male speaker recognition in text-independent context
Williams Learning disentangled speech representations
Ma et al. Language identification with deep bottleneck features
Rao et al. Robust features for automatic text-independent speaker recognition using Gaussian mixture model
Majidnezhad A HTK-based method for detecting vocal fold pathology
Jalalvand et al. A classifier combination approach for Farsi accents recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111123

Termination date: 20210917