CN101650945A

CN101650945A - Method for recognizing speaker based on multivariate kernel logistic regression model

Info

Publication number: CN101650945A
Application number: CN200910152591A
Authority: CN
Inventors: 王万良; 郑建炜; 郑泽萍; 韩姗姗; 蒋一波; 王震宇; 王磊; 陈胜勇
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2009-09-17
Filing date: 2009-09-17
Publication date: 2010-02-17
Anticipated expiration: 2029-09-17
Also published as: CN101650945B

Abstract

The invention discloses a speaker identification method based on a multivariate kernel logistic regression model, comprising the following steps: (A) speaker voice feature extraction: collecting the voice signal of the speaker to be identified, pre-processing it, and then extracting Mel cepstrum parameters; (B) speaker model construction: using a multivariate kernel logistic regression model as the speaker identification model; (C) speaker identification model training: using the feature vectors extracted in step A as input training samples and training iteratively with a sequential minimal optimization algorithm to optimize the model parameters; (D) speaker identification: extracting the feature vectors of the voice signal of the speaker to be identified and inputting them to the trained speaker identification model, which gives the posterior probability of each speaker, the speaker with the highest probability being the identification result. The invention offers a high recognition rate, simple model construction, and good speed.

Description

Speaker identification method based on a multivariate kernel logistic regression model

Technical field

The present invention relates to the fields of signal processing, machine learning, and pattern recognition, and in particular to a speaker identification method.

Background technology

Speaker identification means analyzing the voice signals of speakers in a finite set and extracting features from them in order to determine automatically whether a speaker belongs to the specified speaker set and, if so, to confirm the speaker's specific identity. The basic principle of speaker identification is to build, for each speaker, a classification model that describes that speaker's personal characteristics. Constructing a good model is therefore one of the key technologies of speaker identification.

Traditional speaker identification models include generative models such as the Gaussian mixture model (GMM) and the hidden Markov model (HMM). Although these models can achieve good recognition performance, they need a large number of training samples to optimize the model parameters in the training stage, and they also need a large amount of speech data to characterize the person to be identified in the recognition stage.

A patent search shows that many patents on speaker identification already exist in China and abroad, for example: a speaker identification method based on a support vector machine model with an embedded GMM kernel (200510061953.X); a speaker identification method that uses the fundamental frequency envelope to eliminate emotional speech (200710157134.4); a speaker identification method based on conversion between neutral and emotional voiceprint models (200710157133.X); a speaker identification method based on a hybrid support vector machine (200510061954.4); an emotional speaker recognition method based on spectrum translation (200810162450.5); a speaker identification method based on a mixed t model (200810162449.2); and a speaker identification method based on MFCC linear emotion compensation (200510061360.3).

Summary of the invention

To overcome the low recognition rate, complex model construction, and slow speed of existing speaker identification methods, the present invention provides a speaker identification method based on a multivariate kernel logistic regression model that has a high recognition rate, simple model construction, and good speed.

The technical solution adopted by the present invention to solve this technical problem is:

A speaker identification method based on a multivariate kernel logistic regression model comprises the following steps:

A) Speaker voice feature extraction: collect the voice signal of the speaker to be identified and pre-process it; then extract the Mel cepstrum parameters. The Mel cepstrum parameters are 13th-order cepstrum parameters; the zeroth-order coefficient, which describes the speaker's personal characteristics only weakly, is removed, and the remaining 12-dimensional feature vector serves as the input vector for speaker identification;

B) Speaker model construction: adopt the multivariate kernel logistic regression model as the speaker identification model,

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K

where K is the number of speakers to be distinguished, \bar{x} is the 12-dimensional feature vector, β = [β_1^T β_2^T … β_K^T], β ∈ R^{12×K}, is the overall model parameter; β_k is the model parameter corresponding to the k-th speaker, β_{k0} is the constant model parameter corresponding to the k-th speaker, and c_i is the speaker target (label) corresponding to the i-th speech feature vector;

C) Speaker identification model training: the feature vectors extracted in step A are used as input training samples, and the model is trained iteratively with the sequential minimal optimization algorithm until the model parameters are optimal;

D) Speaker identification: extract the feature vectors of the voice signal of the speaker to be identified and input them to the trained speaker identification model; the multivariate kernel logistic regression model gives the posterior probability of each speaker, and the speaker with the highest probability is the identification result.

Further, in step C, the cost functional of the sequential minimal optimization algorithm is:

\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2

s.t. \sum_{k=1}^{K} \alpha_{ik} = 0, \ \forall i; \quad \sum_{i=1}^{n} \alpha_{ik} = 0, \ \forall k

where β_k is the model parameter corresponding to the k-th speaker, C is the constant penalty factor, α_{ik} are the coefficients of the expansion of β in the high-dimensional space, and c_{ik} ∈ {1, 0} is the corresponding entry of the vector \bar{c}_k, where

\bar{c}_k = \{0, 0, \ldots, \underset{k}{1}, \ldots, \underset{K}{0}\}

represents the target vector of the k-th speaker;

The training steps are as follows:

1) Give an initial α vector that satisfies the constraints, and set the iteration counter Iter = 1;

2) If there exist distinct index pairs (i, i') such that H_{i,k} ≠ H_{i',k}, select the corresponding

upper(k) = \underset{i}{\arg\max} \, H_{ik}, \quad k = 1, 2, \ldots, K-1

and

lower(k) = \underset{i}{\arg\min} \, H_{ik}, \quad k = 1, 2, \ldots, K-1;

where:

H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left(\sum_{k'=1}^{K-1} \alpha_{tk'}\right) K(\bar{x}_t, \bar{x}_i) - \left[\log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right)\right]

and K(x, y) is a kernel function satisfying the Mercer condition.

3) Find the optimal step t^* and update α_{upper(k),k} and α_{lower(k),k} as follows:

\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*

\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*

\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k

4) Recompute H_{ik} with α^{Iter+1} and select new upper(k) and lower(k);

5) If, for every k ∈ {1, 2, …, K-1}, every index pair (i, i') satisfies H_{ik} = H_{i'k}, stop iterating; otherwise return to step 2) and continue until the stop condition is satisfied.

Further, in step D, the speaker identification rule is:

\underset{k \in \{1, 2, \ldots, K\}}{\arg\max} \; p(c_i = k \mid \bar{x}; \beta)

That is, for a new speech input vector \bar{x}, the k-th speaker with the highest posterior probability is taken as the identification result, where:

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K.

Further, in step A, the pre-processing comprises: sample quantization, center clipping, pre-emphasis (high-frequency boost), and windowing and framing.

The technical concept of the present invention is as follows. Kernel logistic regression is an effective discriminative classification model, used mainly to generate posterior probabilities in discriminative classification, and it has been applied successfully to gene pathology selection, credit card risk classification, isolated word recognition, and similar tasks. Kernel logistic regression has a natural posterior probability output and extends well to multi-class classification, so it applies very naturally to multi-class tasks such as speaker identification. Existing techniques that apply kernel logistic regression to speaker identification simply use the binary model; although their recognition rate is better than that of classical algorithms, the model construction is complex and the advantage of the multi-class extension of logistic regression is not exploited.

The multivariate kernel logistic regression speaker identification method builds a single multivariate kernel logistic regression model for several speakers. When the speech feature parameters of a new, unknown utterance are input, the model outputs the posterior probability of each speaker, and the speaker with the highest probability is the identification result. The traditional logistic regression model is first extended to the multi-class case, and the kernel trick is then used to convert the original linear model into a nonlinear model suited to speaker identification. In the model training stage, the training speech data of each speaker are pre-processed and feature parameters are extracted as the input feature vectors of the model; the model parameters are updated iteratively by the fast sequential minimal optimization algorithm. In the recognition stage, the utterance of the speaker to be identified undergoes the same pre-processing and the same feature parameters are extracted; the trained multivariate kernel logistic regression model outputs the posterior probability of each speaker, from which the identification result is obtained.

The technical solution of the present invention can be refined further. The training algorithm of the multivariate kernel logistic regression model is sequential minimal optimization: the cost functional of the original multivariate kernel logistic regression model is first converted to its dual form and the optimality condition is derived; in each iteration only two parameters are updated, which avoids the matrix inversion needed when many parameters are updated simultaneously and makes model training faster.

The beneficial effects of the present invention are: 1. With the multivariate kernel logistic regression model as the speaker identification model, the recognition rate is higher than that of traditional generative models (such as the Gaussian mixture model) and similar to that of other discriminative models (such as the support vector machine). The support vector machine, however, is a binary classifier and can perform multi-class classification only by building multiple models in a "one-versus-rest" or "one-versus-one" manner and voting, whereas the multivariate kernel logistic regression model performs multi-class classification directly, so model construction is more intuitive and faster. 2. The training process of the multivariate kernel logistic regression model uses the sequential minimal optimization algorithm, which makes training faster and suits tasks with large training sets such as speaker identification.

Embodiment

The present invention is further described below.

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K

where K is the number of speakers to be distinguished, \bar{x} is the 12-dimensional feature vector, β = {β_1, β_2, …, β_K}^T is the overall model parameter; β_k is the model parameter corresponding to the k-th speaker, β_{k0} is the constant model parameter corresponding to the k-th speaker, and c_i is the speaker target (label) corresponding to the i-th speech feature vector.

The framework of the present embodiment is as follows:

Part 1: Feature extraction

Feature extraction uses the prior art. First, several voice signals of each speaker are collected at different times and pre-processed; the pre-processing comprises sample quantization, center clipping, pre-emphasis, low-segment removal, and windowing and framing. Features are then extracted from the pre-processed voice signal. The present invention uses Mel frequency cepstral coefficients (MFCC): the 13th-order Mel cepstrum parameters of every frame of the voice signal are extracted, the 0th-order parameter, which contributes little to describing the speaker, is removed, and each frame is finally converted into a 12-dimensional Mel cepstrum feature vector.
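The pre-processing chain described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the pre-emphasis coefficient 0.97, the Hamming window, and the 15 ms frame shift are not given in the text, and the helper names (`pre_emphasize`, `frame_and_window`, `drop_c0`) are hypothetical. The 240-sample frame corresponds to the 30 ms frames at 8000 Hz mentioned in the experiment; the MFCC computation itself is replaced by random placeholder values.

```python
import numpy as np

def pre_emphasize(signal, coeff=0.97):
    """High-frequency boost: y[n] = x[n] - coeff * x[n-1] (coeff is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_and_window(signal, frame_len=240, hop=120):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def drop_c0(mfcc_13):
    """Discard the 0th cepstral coefficient, leaving 12 coefficients per frame."""
    return np.asarray(mfcc_13)[:, 1:]

rng = np.random.default_rng(0)
speech = rng.standard_normal(8000)            # 1 s of stand-in audio at 8000 Hz
frames = frame_and_window(pre_emphasize(speech))
# Placeholder for a real 13th-order MFCC routine applied to each frame.
fake_mfcc = rng.standard_normal((frames.shape[0], 13))
features = drop_c0(fake_mfcc)                 # one 12-dimensional vector per frame
```

Each row of `features` plays the role of one input vector of the model; center clipping and silence removal are omitted from the sketch.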

Part 2: Multivariate kernel logistic regression speaker identification model

In a speaker identification application, let K be the number of speakers to be identified. After the feature parameter extraction of Part 1, the training sample set is {x_1, c_1}, {x_2, c_2}, …, {x_n, c_n}, where the input x_i is a p-dimensional speaker feature vector, i.e. x_i = (x_{i1}, x_{i2}, …, x_{ip})^T, with p = 12 in the present invention; the output sample label c_i is one element of the finite set {1, 2, …, K}; and β denotes the model parameters.

The classical logistic regression model is a binary classifier, i.e. K = 2, and the linear discriminant function is defined as:

g(\bar{x}_i) = \operatorname{logit}\{p(\bar{x}_i, \beta)\} = \log \frac{p(c_i = 1)}{p(c_i = 0)} = \beta^T \bar{x}_i + \beta_0

The posterior probability that a sample belongs to class 1 then follows:

p(c_i = 1 \mid \bar{x}) = \frac{\exp(g(\bar{x}))}{1 + \exp(g(\bar{x}))}

The logistic regression problem is the problem of optimizing the parameter β of the linear function g(x_i) = β^T x_i + β_0. Assume that the sample target label c_i ∈ {1, 0} follows a Bernoulli distribution given the input sample set X; the sample likelihood is then:

l(\beta) = \prod_{i=1}^{n} p(c_i = 1 \mid \bar{x}_i)^{c_i} \left(1 - p(c_i = 1 \mid \bar{x}_i)\right)^{1 - c_i}

Taking the negative logarithm gives the original cost functional to be minimized for the linear logistic regression model:

L(\beta) = -\sum_{i=1}^{n} \left[c_i (\beta^T \bar{x}_i + \beta_0) - \log\left(1 + \exp(\beta^T \bar{x}_i + \beta_0)\right)\right]
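The binary posterior and the negative log-likelihood above can be checked numerically with a short sketch. The data values and the function names below are illustrative only and do not come from the text; `np.logaddexp(0, g)` is used as a stable way to evaluate log(1 + exp(g)).

```python
import numpy as np

def posterior_class1(X, beta, beta0):
    """p(c = 1 | x) = exp(g) / (1 + exp(g)) with g = beta^T x + beta0."""
    g = X @ beta + beta0
    return 1.0 / (1.0 + np.exp(-g))

def neg_log_likelihood(X, c, beta, beta0):
    """L(beta) = -sum_i [c_i * g_i - log(1 + exp(g_i))]."""
    g = X @ beta + beta0
    return -np.sum(c * g - np.logaddexp(0.0, g))

# Toy data (illustrative values only).
X = np.array([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.8]])
c = np.array([0, 1, 1])
beta = np.array([0.4, -0.2])
beta0 = 0.1

p = posterior_class1(X, beta, beta0)
# The same cost written directly from the Bernoulli likelihood.
direct = -np.sum(c * np.log(p) + (1 - c) * np.log(1 - p))
cost = neg_log_likelihood(X, c, beta, beta0)
```

With all parameters set to zero every posterior equals 0.5 and the cost reduces to n log 2, which is a convenient sanity check.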

Because the input feature parameters in speaker identification are not linearly separable, the linear regression model must be extended to a nonlinear regression model, which yields kernel logistic regression. A nonlinear mapping Φ: R^p → F maps the original input space to a high-dimensional feature space. In the space F, β can be expressed as:

\beta = \sum_{i=1}^{n} \alpha_i \Phi(\bar{x}_i)

In the high-dimensional space, construct g'(x_i) = β^T Φ(x_i) + β_0, that is:

g'(\bar{x}) = \beta^T \Phi(\bar{x}) + \beta_0 = \left(\sum_{i=1}^{n} \alpha_i \Phi(\bar{x}_i)\right)^T \Phi(\bar{x}) + \beta_0 = \sum_{i=1}^{n} \alpha_i K(\bar{x}_i, \bar{x}) + \beta_0

With respect to the original input space, g'(x) thus becomes a nonlinear function, where K(x, y) is a kernel function satisfying the Mercer condition. The present invention uses the most widely used kernel, the radial basis function kernel:

K(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma}\right)
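As a minimal sketch, the radial basis kernel above can be evaluated for all pairs of feature vectors at once. The function name `rbf_kernel` is illustrative; the default σ = 1.5 is the value reported in the experiment, and the squared distance is divided by σ directly, as in the formula.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.5):
    """Gram matrix with entries K(a, b) = exp(-||a - b||^2 / sigma) for rows of A and B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Clip tiny negative values caused by rounding before exponentiating.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma)

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((5, 12))     # five stand-in 12-dimensional feature vectors
gram = rbf_kernel(X_demo, X_demo)
```

The resulting matrix is symmetric with ones on the diagonal, and, because the kernel satisfies the Mercer condition, it is positive semidefinite up to rounding.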

Using the kernel trick, the kernelized posterior probability is

p(c_i = 1 \mid \bar{x}) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{n} \alpha_i K(\bar{x}_i, \bar{x}) - \beta_0\right)},

and the cost functional of kernel logistic regression is:

\min L(\alpha) = -\sum_{i=1}^{n} \left[c_i \left(\sum_{j=1}^{n} \alpha_j K(\bar{x}_i, \bar{x}_j) + \beta_0\right) - \log\left(1 + \exp\left(\sum_{j=1}^{n} \alpha_j K(\bar{x}_i, \bar{x}_j) + \beta_0\right)\right)\right]

In a practical speaker identification system, K > 2 in general. If the binary kernel logistic model is applied directly, the "one-versus-rest" or "one-versus-one" method must be used to build multiple classifiers, which makes model construction more cumbersome. In fact, kernel logistic regression extends very naturally to multi-class classification:

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K
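The multi-class posterior above is a softmax over the K linear scores. A small sketch, with illustrative parameter values and the hypothetical name `softmax_posterior`, shows how it can be evaluated; subtracting the largest score before exponentiating leaves the ratio unchanged and avoids overflow.

```python
import numpy as np

def softmax_posterior(x, B, b0):
    """Posterior p(c = k | x) for k = 1..K, with B of shape (p, K) and b0 of shape (K,)."""
    scores = B.T @ x + b0                 # beta_k^T x + beta_k0 for every k
    scores = scores - scores.max()        # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(2)
p_dim, n_speakers = 12, 20                # 12-dimensional features, 20 speakers
B = rng.standard_normal((p_dim, n_speakers))
b0 = rng.standard_normal(n_speakers)
x = rng.standard_normal(p_dim)
post = softmax_posterior(x, B, b0)
```

The K outputs are nonnegative and sum to one, so they can be read directly as the posterior probabilities of the K speakers.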

where β = [β_1^T β_2^T … β_K^T], β ∈ R^{p×K}, is the parameter to be optimized, i.e. the model parameter that the speaker identification system must optimize. The optimal parameters are again obtained by minimizing the negative log-likelihood function:

\min_{\beta} l(\beta) = -\log\left(\prod_{i=1}^{n} p(c_i = k \mid \bar{x}_i; \beta)\right)

= \sum_{k=1}^{K} \sum_{c_i = k} \left[-\beta_k^T \bar{x}_i - \beta_{k0} + \log\left(\sum_{j=1}^{K} \exp(\beta_j^T \bar{x}_i + \beta_{j0})\right)\right]

To strengthen the generalization ability of the model, an L_2 regularization term is added to the functional being optimized. The final cost functional of the multivariate logistic regression model is:

\min H = \sum_{i=1}^{n} \left[-\sum_{k=1}^{K} c_{ik} (\beta_k^T \bar{x}_i + \beta_{k0}) + \log\left(\sum_{j=1}^{K} \exp(\beta_j^T \bar{x}_i + \beta_{j0})\right)\right] + \frac{\lambda}{2} \sum_{k=1}^{K} \|\beta_k\|^2

where \bar{c}_i is a K-dimensional vector with entries c_{ik}: when c_i = k, \bar{c}_i = (0, 0, …, 1, …, 0), with the 1 in the k-th dimension. Applying the kernel trick in the same way gives the cost functional of the corresponding multivariate kernel logistic regression model:

\min H' = \sum_{i=1}^{n} \left[-\sum_{k=1}^{K} c_{ik} \left(\sum_{j=1}^{n} \alpha_{jk} K(\bar{x}_i, \bar{x}_j) + \beta_{k0}\right) + \log\left(\sum_{j=1}^{K} \exp\left(\sum_{m=1}^{n} \alpha_{mj} K(\bar{x}_m, \bar{x}_i) + \beta_{j0}\right)\right)\right]

+ \frac{\lambda}{2} \sum_{k=1}^{K} \sum_{i, i'} \alpha_{ik} \alpha_{i'k} K(\bar{x}_i, \bar{x}_{i'})
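A regularized kernel multi-class cost of this kind can be evaluated from a Gram matrix as in the sketch below. It assumes the usual form (multinomial negative log-likelihood of the kernel scores plus (λ/2) Σ_k α_k^T K α_k); the function name, the toy labels, and the identity Gram matrix are illustrative and not taken from the text.

```python
import numpy as np

def kernel_mlr_cost(gram, onehot, A, b0, lam):
    """
    gram:   (n, n) kernel matrix K(x_i, x_j)
    onehot: (n, K) target vectors with entries c_ik
    A:      (n, K) expansion coefficients alpha_jk
    b0:     (K,)   constant terms beta_k0
    lam:    regularization weight lambda
    """
    scores = gram @ A + b0                         # scores[i, k] = sum_j alpha_jk K(x_i, x_j) + beta_k0
    m = scores.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    data_term = np.sum(log_norm - np.sum(onehot * scores, axis=1))
    reg_term = 0.5 * lam * np.sum(A * (gram @ A))  # (lambda / 2) * sum_k alpha_k^T K alpha_k
    return data_term + reg_term

labels = np.array([0, 1, 1, 0])                    # toy labels for n = 4 samples, K = 2 classes
onehot = np.eye(2)[labels]
gram = np.eye(4)                                   # stand-in Gram matrix
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2))
cost_zero = kernel_mlr_cost(gram, onehot, np.zeros((4, 2)), np.zeros(2), lam=0.1)
cost_rand = kernel_mlr_cost(gram, onehot, A, np.zeros(2), lam=0.1)
```

With all coefficients at zero every class is equally likely, so the cost equals n log K; this gives a quick check that the data term is wired correctly.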

Part 3: Model training algorithm

Many training algorithms exist for the kernel logistic regression model, including iteratively reweighted least squares (IRRLS), the Newton-Raphson method, and the trust region Newton method (TRNM). In applications such as speaker identification, with many training samples and many target classes, every iteration of all these methods involves a matrix inversion of considerable computational cost. The present invention converts the cost functional of the original kernel logistic regression model to its dual form and proposes a sequential minimal optimization training algorithm that optimizes only two dual coefficients in each iteration, avoiding the time-consuming matrix operations.

1) Dualization of the cost functional

The original cost functional is equivalent to:

\min H' = C \sum_{i=1}^{n} \sum_{k=1}^{K} g(\xi_{ik}) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2,

where C = 1/λ is the regularization constant, and

\xi_{ik} = \beta_{k0} + \bar{\beta}_k^T \bar{x}_i, \quad g(\xi_{ik}) = -c_{ik} \xi_{ik} + \log\left(e^{\xi_{i1}} + e^{\xi_{i2}} + \cdots + e^{\xi_{iK}}\right).

This is converted into the Lagrangian form:

L = C \sum_{i=1}^{n} \sum_{k=1}^{K} g(\xi_{ik}) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2 + \sum_{i=1}^{n} \sum_{k=1}^{K} \alpha_{ik} \left(\xi_{ik} - \beta_{k0} - \bar{\beta}_k^T \bar{x}_i\right) + \alpha_0 \sum_{k=1}^{K} \beta_{k0}

where α_{ik} and α_0 are Lagrange multipliers. From the three KKT conditions it follows that

\bar{\beta}_k = \sum_{i=1}^{n} \alpha_{ik} \bar{x}_i, \ \forall k,

\sum_{k=1}^{K} \alpha_{ik} = 0,

\alpha_0 = 0, \quad \sum_{i=1}^{n} \alpha_{ik} = 0, \ \forall k,

and

\xi_{ik} = \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) - \frac{1}{K} \sum_{k'=1}^{K} \log\left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right), \ \forall i, k

g'(\xi_{ik}) = -\frac{\alpha_{ik}}{C}

Let G(δ) = δ ξ_{ik} - g(ξ_{ik}), where

\delta = -\frac{\alpha_{ik}}{C}.

Differentiating G and using g'(ξ_{ik}) = δ gives:

\frac{\partial G}{\partial \delta} = \xi_{ik} + \delta \frac{d\xi_{ik}}{d\delta} - g'(\xi_{ik}) \frac{d\xi_{ik}}{d\delta} = \xi_{ik}

G can therefore be obtained by integrating ξ_{ik}:

G\left(-\frac{\alpha_{ik}}{C}\right) = \frac{K-1}{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + const.

where const is a constant. G is one part of the cost functional; applying Wolfe duality theory and simplifying gives the dual form of the cost functional:

\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} G\left(-\frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\bar{\beta}_k\|^2

s.t. \sum_{k=1}^{K} \alpha_{ik} = 0, \ \forall i; \quad \sum_{i=1}^{n} \alpha_{ik} = 0, \ \forall k
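The dual objective and its two equality constraints can be explored numerically with the sketch below. Several points are assumptions rather than statements of the text: G is taken as the (K-1)/K-scaled entropy-like term with the constant dropped, the norm of β_k is written in kernel form as α_k^T K α_k, and the starting point is a simple construction that is feasible only when every speaker has the same number of samples. The function names are hypothetical.

```python
import numpy as np

def balanced_feasible_alpha(labels, n_classes, C, frac=0.5):
    """Assumed initialization: +frac*C on the true class, -frac*C/(K-1) elsewhere.
    Rows always sum to zero; columns sum to zero when classes are balanced."""
    onehot = np.eye(n_classes)[labels]
    a = frac * C
    alpha = np.where(onehot == 1, a, -a / (n_classes - 1))
    return onehot, alpha

def dual_objective(alpha, onehot, gram, C):
    """D = C * sum G(-alpha/C) + 1/2 * sum_k alpha_k^T K alpha_k (constant of G dropped)."""
    n_classes = onehot.shape[1]
    u = onehot - alpha / C                         # must stay strictly positive
    G = (n_classes - 1) / n_classes * u * np.log(u)
    quad = 0.5 * np.sum(alpha * (gram @ alpha))
    return C * G.sum() + quad

labels = np.array([0, 0, 1, 1, 2, 2])              # two samples for each of three speakers
onehot, alpha = balanced_feasible_alpha(labels, n_classes=3, C=1.0)
gram = np.eye(6)                                   # stand-in Gram matrix
D = dual_objective(alpha, onehot, gram, C=1.0)
```

For this toy case every entry of c_ik - α_ik/C is positive, so all logarithms are defined and both constraints hold exactly.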

2) Optimality condition

The dual above contains two constraints. To minimize it with sequential minimal optimization, one of the constraints is first integrated into the cost functional:

\min \tilde{D} = C \sum_{i=1}^{n} \sum_{k=1}^{K} G\left(-\frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K-1} \|\bar{\beta}_k\|^2 + \frac{1}{2} \left\| -\sum_{i=1}^{n} \left(\sum_{k'=1}^{K-1} \alpha_{ik'}\right) \bar{x}_i \right\|^2

s.t. \sum_{i=1}^{n} \alpha_{ik} = 0, \ \forall k

The Lagrangian form of the dual cost functional is then formed. The optimality condition of sequential minimal optimization is the condition under which the parameters α_{ik} of this dual function stop changing; it is obtained by differentiating the Lagrangian with respect to α_{ik}, where α_{ik} satisfies:

\begin{cases} 0 < \alpha_{ik} < C, & c_{ik} = 1 \\ -C < \alpha_{ik} < 0, & c_{ik} = 0 \end{cases}

and

0 < \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right) < 1

Let:

H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left(\sum_{k'=1}^{K-1} \alpha_{tk'}\right) K(\bar{x}_t, \bar{x}_i) - \left[\log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right)\right]

upper(k) = \underset{i}{\arg\max} \, H_{ik}, \quad k = 1, 2, \ldots, K-1

lower(k) = \underset{i}{\arg\min} \, H_{ik}, \quad k = 1, 2, \ldots, K-1

The optimality condition for dual training of the multivariate kernel logistic regression model is then:

H_{upper(k),k} = H_{lower(k),k} = \beta_k, \quad k = 1, 2, \ldots, K-1
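The quantity H_ik and the index selection can be sketched as follows. The code transcribes the expression for H_ik as it is written in the text, including the sign of the bracketed logarithm term; the function names, the toy labels, the value C = 1, and the starting α are illustrative assumptions.

```python
import numpy as np

def compute_H(alpha, onehot, gram, C):
    """H[i, k] for k = 1..K-1 (columns 0..K-2), following the expression in the text."""
    a = alpha[:, :-1]                              # coefficients of the first K-1 classes
    c = onehot[:, :-1]
    # sum_t alpha_tk K(x_t, x_i)  +  sum_t (sum_k' alpha_tk') K(x_t, x_i)
    kernel_part = gram @ a + (gram @ a.sum(axis=1))[:, None]
    u = c - a / C
    log_part = np.log(u) + np.log(1.0 - u.sum(axis=1))[:, None]
    return kernel_part - log_part

def select_pairs(H):
    """upper(k) = argmax_i H_ik and lower(k) = argmin_i H_ik for every column k."""
    return H.argmax(axis=0), H.argmin(axis=0)

# Toy setup: three speakers, two samples each, feasible starting alpha (assumed).
labels = np.array([0, 0, 1, 1, 2, 2])
onehot = np.eye(3)[labels]
C = 1.0
alpha = np.where(onehot == 1, 0.5 * C, -0.25 * C)
X = np.random.default_rng(4).standard_normal((6, 12))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
gram = np.exp(-sq / 1.5)                           # radial basis kernel with sigma = 1.5

H = compute_H(alpha, onehot, gram, C)
upper, lower = select_pairs(H)
gap = H[upper, np.arange(H.shape[1])] - H[lower, np.arange(H.shape[1])]
```

The per-class difference `gap` is zero exactly when all H_ik in a column are equal, which is the stop condition stated above; in floating point a small tolerance would be used instead of exact equality.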

3) Sequential minimal optimization training

Based on the dual cost functional and its optimality condition derived above, the basic operations of the sequential minimal optimization training method for the multivariate kernel logistic model are the correct initialization of the α parameters and the update of α_{upper(k)} and α_{lower(k)} in each iteration. The algorithm proceeds as follows:

(1) Give an initial α vector that satisfies the constraints, and set the iteration counter Iter = 1;

(2) If there exist distinct index pairs (i, i') such that H_{i,k} ≠ H_{i',k}, select the corresponding upper(k) and lower(k);

(3) Update α_{upper(k),k} and α_{lower(k),k} as follows:

\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*

\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*

\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k

(4) Recompute H_{ik} with α^{Iter+1} and select new upper(k) and lower(k);

(5) If, for every k ∈ {1, 2, …, K-1}, every index pair (i, i') satisfies H_{ik} = H_{i'k}, stop iterating; otherwise return to step (2) and continue until the stop condition is satisfied.
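The pair update in step (3) can be illustrated with a short sketch showing that moving an amount t between two samples of one class keeps both equality constraints satisfied. The step size here is an arbitrary illustrative value, since the rule for choosing t^* is not reproduced in the text; treating the K-th column as determined by the row-sum constraint is likewise an assumption based on that constraint having been folded into the cost functional. The function name is hypothetical.

```python
import numpy as np

def smo_pair_update(alpha, i_up, i_low, k, t):
    """Add t to alpha[i_up, k], subtract t from alpha[i_low, k] (k < K-1 in 0-based
    indexing), then rebuild the last column so that every row still sums to zero."""
    new = alpha.copy()
    new[i_up, k] += t
    new[i_low, k] -= t
    new[:, -1] = -new[:, :-1].sum(axis=1)
    return new

# Feasible toy starting point: three speakers, two samples each (assumed values).
labels = np.array([0, 0, 1, 1, 2, 2])
onehot = np.eye(3)[labels]
alpha = np.where(onehot == 1, 0.5, -0.25)

# Illustrative step; a real implementation would search for t* and keep each
# alpha_ik inside the interval required by its c_ik.
alpha_next = smo_pair_update(alpha, i_up=0, i_low=2, k=0, t=0.05)
changed = np.argwhere(~np.isclose(alpha, alpha_next))
```

Only the two selected coefficients of class k change directly; the matching entries of the last column change as a consequence of the row-sum constraint, and all other coefficients stay as they were.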

Part 4: Speaker identification

After the multivariate kernel logistic regression speaker model has been built, the classification result for a new input vector \bar{x} is:

\underset{k \in \{1, 2, \ldots, K\}}{\arg\max} \; p(c_i = k \mid \bar{x}; \beta)

That is, the k-th speaker with the highest posterior probability is taken as the identification result, where:

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K
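The decision rule can be sketched in kernel form, where the score of speaker k is assumed to be Σ_i α_ik K(x_i, x) + β_k0, i.e. the linear score with β_k expanded over the training vectors. The coefficient values, the two-point training set, and the function names are illustrative only; speaker indices in the code are 0-based.

```python
import numpy as np

def rbf_to_train(X_train, x, sigma=1.5):
    """Kernel values K(x_i, x) between every training vector and the new vector."""
    return np.exp(-np.sum((X_train - x) ** 2, axis=1) / sigma)

def recognize(x, X_train, alpha, b0, sigma=1.5):
    """Return the index of the most probable speaker and the full posterior vector."""
    scores = rbf_to_train(X_train, x, sigma) @ alpha + b0
    scores = scores - scores.max()                 # stabilize the exponentials
    post = np.exp(scores) / np.exp(scores).sum()
    return int(post.argmax()), post

# Two training vectors, one per speaker, with hand-picked coefficients (assumed).
X_train = np.array([[0.0] * 12, [2.0] * 12])
alpha = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
b0 = np.zeros(2)

speaker_a, post_a = recognize(X_train[0] + 0.1, X_train, alpha, b0)
speaker_b, post_b = recognize(X_train[1] - 0.1, X_train, alpha, b0)
```

A vector close to the first training point is assigned to the first speaker and a vector close to the second to the second speaker, with the posterior vector available for inspection in both cases.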

Performance evaluation: the experiment uses a self-recorded corpus of 20 speakers, 12 male and 8 female. The data were acquired by mono A/D conversion at a sampling frequency of 8000 Hz with 16-bit quantization. Each speaker's voice signal is assembled from recordings made at different times. For each speaker, voice segments from different times with a total length of 15 s are mixed and used as the training signal, and 20 voice segments of 1.5 s each from different times are used as test signals, giving 20 training utterances and 400 test utterances. The voice signal first undergoes pre-processing such as pre-emphasis and center clipping; voice activity detection (VAD) then extracts the effective voice segments and removes redundant silent segments; finally the signal is divided into frames of 30 ms and 12-dimensional MFCC (Mel frequency cepstral coefficient) feature parameters are extracted as the classification parameters.

The recognition rate of the multivariate kernel logistic regression method is compared with that of the Gaussian mixture model method and the support vector machine method. The Gaussian mixture model uses 100 mixture components. The support vector machine is a binary classifier, so multiple "one-versus-one" classification models are built and identification is by voting. The multivariate kernel logistic regression method and the support vector machine method use the same radial basis function kernel with σ = 1.5. The recognition rates are: multivariate kernel logistic regression, 97.5%; support vector machine, 97%; Gaussian mixture model, 96.5%. The recognition rate of the method of the invention is thus better than that of the classical speaker identification methods.
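As a rough reading of these figures, the reported rates can be converted to counts over the 400 test utterances, and the number of binary models a one-versus-one scheme needs for 20 speakers can be computed. The sketch assumes that every test utterance counts equally toward the recognition rate; the variable names are illustrative.

```python
n_speakers = 20
n_test = n_speakers * 20                  # 20 test segments per speaker

rates = {
    "multivariate kernel logistic regression": 0.975,
    "support vector machine": 0.97,
    "Gaussian mixture model": 0.965,
}

# Number of correctly identified test utterances implied by each rate.
correct = {name: round(rate * n_test) for name, rate in rates.items()}

# One-versus-one voting needs one binary classifier per pair of speakers.
pairwise_models = n_speakers * (n_speakers - 1) // 2

for name, count in correct.items():
    print(f"{name}: {count} of {n_test} correct")
print(f"one-versus-one classifiers for {n_speakers} speakers: {pairwise_models}")
```

Under that assumption the three methods differ by only two test utterances each, while the one-versus-one support vector machine requires 190 binary classifiers compared with a single multi-class model.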

Claims

1. A speaker identification method based on a multivariate kernel logistic regression model, characterized in that the speaker identification method comprises the following steps:

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K

where K is the number of speakers to be distinguished, \bar{x} is the 12-dimensional input feature vector, β = {β_1, β_2, …, β_K}^T is the overall model parameter; β_k is the model parameter corresponding to the k-th speaker, β_{k0} is the constant model parameter corresponding to the k-th speaker, and c_i is the speaker target (label) corresponding to the i-th speech feature vector;

2. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 1, characterized in that, in step C, the cost functional of the sequential minimal optimization algorithm is:

\min D = C \sum_{i=1}^{n} \sum_{k=1}^{K} \left(c_{ik} - \frac{\alpha_{ik}}{C}\right) \log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \frac{1}{2} \sum_{k=1}^{K} \|\beta_k\|^2

s.t. \sum_{k=1}^{K} \alpha_{ik} = 0, \ \forall i;

\sum_{i=1}^{n} \alpha_{ik} = 0, \ \forall k

where β_k is the model parameter corresponding to the k-th speaker, C is the constant penalty factor, α_{ik} are the coefficients of the expansion of β in the high-dimensional space, and c_{ik} ∈ {1, 0} is the corresponding entry of the K-dimensional vector \bar{c}_i,

\bar{c}_i = \{0, 0, \ldots, \underset{k}{1}, \ldots, \underset{K}{0}\},

in which the 1 is in the k-th dimension, representing the target vector of the k-th speaker;

The training steps are as follows:

2) If there exist distinct index pairs (i, i') such that H_{i,k} ≠ H_{i',k}, select the corresponding

upper(k) = \underset{i}{\arg\max} \, H_{ik}, \quad k = 1, 2, \ldots, K-1

and

lower(k) = \underset{i}{\arg\min} \, H_{ik}, \quad k = 1, 2, \ldots, K-1;

where:

H_{ik} = \sum_{t=1}^{n} \alpha_{tk} K(\bar{x}_t, \bar{x}_i) + \sum_{t=1}^{n} \left(\sum_{k'=1}^{K-1} \alpha_{tk'}\right) K(\bar{x}_t, \bar{x}_i) - \left[\log\left(c_{ik} - \frac{\alpha_{ik}}{C}\right) + \log\left(1 - \sum_{k'=1}^{K-1} \left(c_{ik'} - \frac{\alpha_{ik'}}{C}\right)\right)\right]

and K(x, y) is a kernel function satisfying the Mercer condition.

\alpha_{upper(k),k}^{Iter+1} = \alpha_{upper(k),k}^{Iter} + t^*

\alpha_{lower(k),k}^{Iter+1} = \alpha_{lower(k),k}^{Iter} - t^*

\alpha_{i,k}^{Iter+1} = \alpha_{i,k}^{Iter}, \quad \text{for other } i, k

4) Recompute H_{ik} with α^{Iter+1} and select new upper(k) and lower(k);

3. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 1 or 2, characterized in that, in step D, the speaker identification rule is:

\underset{k \in \{1, 2, \ldots, K\}}{\arg\max} \; p(c_i = k \mid \bar{x}; \beta)

p(c_i = k \mid \bar{x}; \beta) = \frac{\exp(\beta_k^T \bar{x} + \beta_{k0})}{\sum_{j=1}^{K} \exp(\beta_j^T \bar{x} + \beta_{j0})}, \quad k = 1, 2, \ldots, K.

4. The speaker identification method based on a multivariate kernel logistic regression model as claimed in claim 3, characterized in that, in step A, the pre-processing comprises: sample quantization, center clipping, pre-emphasis (high-frequency boost), and windowing and framing.