CN101650944A

CN101650944A - Method for distinguishing speakers based on protective kernel Fisher distinguishing method

Info

Publication number: CN101650944A
Application number: CN200910152590A
Authority: CN
Inventors: 王万良; 郑建炜; 王震宇; 韩姗姗; 蒋一波; 郑泽萍; 王磊; 陈胜勇
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2009-09-17
Filing date: 2009-09-17
Publication date: 2010-02-17

Abstract

The invention relates to a method for distinguishing speakers based on a protective kernel Fisher distinguishing method. The method comprises steps as follows: (1) pretreating voice signals; (2) extracting characteristic parameters: after framing and end point detection of voice signals, extracting Mel frequency cepstrum coefficients as characteristic vectors of speakers; (3) creating a speaker distinguishing model; (4) calculating model optimal projection vector: by using optimal solution of LWFD method, calculating to obtain an optimal projection vector group; (5) distinguishing speakers: projecting original data xi to yi belonging to R<r>( r is more than or equal to 1 and less than or equal to d) according to optimal projection classification vector phi, wherein r is cut dimensionality;the optimal projection classification dimensionality of original c type data space is c-1, then solving a central value of data of each type after injection and normalizing; after projecting data tobe classified to a sub space and normalizing, calculating Euclidean distance from the normalized protecting data to the central point of each type of data in the sub space, and judging the nearest tobe a distinguishing result. The invention has high distinguishing rate, simple model construction and favorable rapidity.

Description

Based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method

Technical field

The present invention relates to signal Processing, machine learning and area of pattern recognition, especially a kind of speaker identification's implementation method.

Background technology

(Speaker Recognition SR) claims words person identification again to Speaker Identification, is meant by the analyzing and processing of speaker's voice signal also being confirmed automatically speaker's technology.The speaker identification who the present invention relates to is an important branch of Speaker Identification.It is from that the speaker identification system must recognize voice to be identified for which of individual philtrum to be investigated, also will make the differentiation of refusal to the voice beyond this people sometimes.It is not the process of a pattern match that the speaker debates, and in this process, computing machine at first will be set up speech model according to speaker's characteristic voice; Promptly the voice signal of input is analyzed, and extracted speaker's personal characteristics, set up the required model of speaker identification on this basis.Speaker debates other system can be divided into several sections such as the training of the selection of pre-service, characteristic parameter of voice and extraction, model of cognition and coupling.

The algorithm of comparative maturity mainly contains vector quantization (Vector Quantization at present, VQ), support vector machine (Support Vector Machine, SVM), Hidden Markov Model (HMM) (Hidden Markov Model, HMM), mixed Gauss model (Gaussian Mixture Model, GMM) etc.Wherein the VQ method is only at the relevant speaker identification's occasion of text.The prerequisite that GMM and HMM method are used is to need a large amount of training utterance data to carry out the optimization of model parameter.Though SVM can obtain recognition efficiency preferably, its range of application that has been the weak intrinsic limitation of output of non-probability and multiclass expansion.

Through new patent searching statistics, the patent of existing many Speaker Identification aspect both at home and abroad; For example, based on the method for distinguishing speek person (200510061953.X) of the supporting vector machine model of embedded GMM nuclear, utilize the method for distinguishing speek person (200710157134.4) of base frequency envelope to eliminate emotion voice, based on the method for distinguishing speek person (200710157133.X) of neutrality and affection sound-groove model conversion, based on the method for distinguishing speek person (200510061954.4) of hybrid supporting vector machine, based on the emotional speaker recognition method (200810162450.5) of frequency spectrum translation, based on the method for distinguishing speek person (200810162449.2) of mixed t model; Based on method for distinguishing speek person (200510061360.3) of MFCC linear emotion compensation etc.

Summary of the invention

Lower for the discrimination that overcomes existing speaker identification's implementation method, model construction is complicated, slow-footed deficiency, the invention provides a kind of discrimination height, model construction simple, have good rapidity based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method.

The technical solution adopted for the present invention to solve the technical problems is:

A kind of based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method, may further comprise the steps:

1., the pre-service of voice signal: voice signal is carried out pre-service;

2., characteristic parameter extraction: after voice signal is finished processing of branch frame and end-point detection, extract the Mel cepstrum parameter as the speaker characteristic vector, described Mel cepstrum parameter is 13 rank cepstrum parameters, remove and wherein speaker characteristic is described the 0th less rank parameter, every frame voice signal is converted to 12 Jan Vermeer cepstrum feature vectors;

3., speaker identification's model construction:

Set x _i∈ R ^d(i=1,2 ..., N) be d dimension sample data, y _i∈ 1,2 ..., and c} is corresponding class label, and wherein N is a total sample number, and c is the classification sum, c _lBe the sample number of l class, then:

Σ_{l = 1}^{c} c_{l} = N

X is a sample matrix, that is:

X≡(x ₁|x ₂|…|x _n|)

Based on above-mentioned pacing items, set up speaker identification's model and be:

Wherein:

S_{b}^{Φ} = \frac{1}{N} Σ_{i = 1}^{c} c_{l} ({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x}) {({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x})}^{T}

Be divergence matrix in the class,

{\tilde{S}}_{w} &equiv; Σ_{i = 1}^{n} x_{i} {x_{i}}^{T} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} x_{k} {x_{m}}^{T}

Be divergence matrix in the class, affine matrix

A_{i, j} = \exp (- \frac{{| | x_{i} - x_{j} | |}^{2}}{σ^{2}}),

Wherein σ is the adjustable integer constant factor,

Be best projection class vector to be asked;

4., model best projection vector calculation

Adopt the optimum solution of LWFD method, promptly according to formula:

Calculate the best projection Vector Groups, suppose nullB with

Represent S respectively _bWith

Kernel, then the best of following formula differentiates that the subspace takes from

{nullB}^{&perp;} \cap null \tilde{W},

NullB wherein ^⊥Be the orthocomplement of nullB, at first with S _bProject to nullB ^⊥, obtain nullB ^⊥Behind the space, again divergence matrix projection between class scatter matrix and the class is advanced

Subspace, the vector in the subspace of gained are the optimum proper vector of differentiating;

5., speaker identification:

According to optimum projection class vector

With former data x _iBe projected as y _i∈ R ^r(1≤r≤d), wherein r is the dimension after cutting down, the projection formula that adopts transformation matrix T:

The optimal classification projected dimensions of former c class data space is c-1, ask for the central value and the standardization of data after each class projection afterwards, after will treating that grouped data projects to subspace and standardization, calculate its with the subspace in the Euclidean distance of each class data center's point, nearest person is judged to recognition result.

Further, described step 4. in, the described optimum process of asking for of differentiating proper vector is:

At first with S _bProject to nullB ^⊥, rewrite S _bExpression formula is:

S_{b} = Σ_{i = 1}^{c} (\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ})) {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}))}^{T} = Σ_{i}^{c} {\overset{&OverBar;}{φ}}^{'}_{i} {\overset{&OverBar;}{φ^{'}}}_{i}^{T} {Φ_{b} Φ}_{b}^{T}

Wherein

{\overset{&OverBar;}{φ}}^{'}_{i} = \sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}), Φ_{b} = [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] .

Matrix S _bOrder be c-1, Φ _bΦ _b ^TWith Φ _b ^TΦ _bIdentical nonzero eigenvalue is arranged, and the pairing proper subspace of filtering zero eigenvalue is S _bKernel; Use Φ _b ^TΦ _bSubstitute Φ _bΦ _b ^TAnd take to examine skill and derive;

{Φ_{b}}^{T} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\underset{j = 1, \cdot \cdot \cdot, c}{i = 1, \cdot \cdot \cdot, c}}

Wherein:

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = \frac{\sqrt{c_{i} c_{j}}}{N} ({φ_{i}}^{T} φ_{j} - {φ_{i}}^{T} φ - φ^{T} φ_{j} + φ^{T} φ)

With each utilizes kernel function to be converted to matrix in the following formula:

{Φ_{b}}^{T} Φ_{b} = \frac{1}{N} B \times (A_{LC}^{T} \times K \times A_{LC} - \frac{1}{N} (A_{LC}^{T} \times K \times 1_{LC}) -

\frac{1}{N} (1_{LC}^{T} \times K \times A_{LC}) + \frac{1}{N^{2}} (1_{LC}^{T} \times K \times 1_{LC})) \times B

Wherein

B = diag [\sqrt{c_{1}} \cdot \cdot \cdot \sqrt{c_{c}}],

1 _LCBe that an all elements is L * C matrix of 1,

A_{LC} = diag [a_{c_{1}} \cdot \cdot \cdot a_{c_{c}}]

Be a L * C block diagonal matrix, piece

Be that an all elements is

C _i* 1 column vector;

If λ _iWith e _i(i=1 ... c) be Φ _b ^TΦ _bI eigen vector, and with the eigenwert descending sort; V then _i=Φ _be _iIt is former between class scatter matrix S _bProper vector; Remove S _bKernel, promptly abandon eigenwert and be zero individual features vector, keep v _iIn before c-1 proper vector: V=[v ₁V _C-1]=Φ _bE _m=Φ _b[e ₁E _C-1], V then ^TS _bV=Λ _b, Λ _b=diag[λ ₁λ _C-1] be (c-1) * (c-1) diagonal matrix;

Obtain nullB ^⊥Behind the space, divergence matrix between class scatter matrix and the class is complied with

U = V Λ_{b}^{- 1 / 2}

The subspace is entered in projection, wherein U ^TS _bU=I,

U^{T} {\tilde{S}}_{w} U = {(E_{m} Λ_{b}^{- 1 / 2})}^{T} (Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b}) (E_{m} Λ_{b}^{- 1 / 2}),

Utilize nuclear matrix K, will

Carry out the coring conversion:

{Φ_{b}}^{T} {\tilde{S}}_{w} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} {\tilde{S}}_{w} [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\underset{j = 1, \cdot \cdot \cdot, c}{i = 1, \cdot \cdot \cdot, c}}

Wherein:

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j} = Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j}

First:

Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{1} = \frac{1}{L} B (A_{LC}^{T} KK A_{LC} - \frac{1}{L} (A_{LC}^{T} KK 1_{LC}) -

\frac{1}{L} (1_{LC}^{T} KK A_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} KK 1_{LC})) B

Second:

Σ_{l = 1}^{c} \frac{1}{c_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{2} = \frac{1}{L} B (A_{LC}^{T} KWK A_{LC} - \frac{1}{L} (A_{LC}^{T} KWK 1_{LC})

- \frac{1}{L} (1_{LC}^{T} KWK A_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} KWK 1_{LC})) B

W=diag[w in the following formula ₁W _c] be a N * N partitioned matrix, w _iBe that an element is

C _i* c _iMatrix, therefore

Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b} = J_{1} - J_{2},

It also is a c * c matrix.Then Be the simple matrix of a dimension, calculate its proper vector p for (c-1) * (c-1) _iWith eigenvalue ' _iAnd with the ascending order arrangement, m vectorial eigentransformation matrix Q=UP=U[p of getting before extracting ₁P _M-1], wherein 1≤m≤c-1 can get

Q^{T} {\tilde{S}}_{w} Q = Λ_{w},

Λ _w=diag[λ ' ₁λ ' _m] be a m * m diagonal matrix;

The optimum that keeps Fisher to differentiate in the class is differentiated proper vector:

Γ = Q Λ_{w}^{- 1 / 2},

Feature constitutes a low n-dimensional subspace n in the H space after the conversion.

Further, described step 5. in, the speaker's phonetic entry pattern z to be classified arbitrarily projects to proper subspace according to Γ, is calculated as follows:

y = Γ^{T} φ (z) = {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (Φ_{b}^{T} φ (z))

Wherein:

Φ_{b}^{T} φ (z) = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} φ (z)

Because

{\overset{&OverBar;}{φ}}^{'}_{i} φ (z) = {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - φ))}^{T} φ (z) = \sqrt{\frac{c_{i}}{N}} (\frac{1}{c_{i}} Σ_{m = 1}^{c_{i}} φ_{im}^{T} φ (z) - \frac{1}{N} Σ_{p = 1}^{c} Σ_{q = 1}^{c_{q}} φ_{pq}^{T} φ (z)),

Can get:

Φ_{b}^{T} φ (z) = \frac{1}{\sqrt{N}} B (A_{LC}^{T} γ (φ (z)) - \frac{1}{N} 1_{LC}^{T} γ (φ (z)))

Wherein

γ (φ (z)) = {[φ_{11}^{T} φ (z) | φ_{12}^{T} φ (z) | \cdot \cdot \cdot | φ_{{cc}_{c}}^{T} φ (z)]}^{T}

Be a N * 1 nuclear vector, the proper vector value is:

y = \frac{1}{\sqrt{N}} {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (B (A_{LC}^{T} - \frac{1}{N} 1_{LC}^{T})) γ (φ (z))

Calculate the Euclidean distance of each the class data center's point in y and the subspace, the person is judged to recognition result recently.

Further again, described step 1. in, described pre-service comprises sampling, removes noise, end-point detection, pre-emphasis, branch frame and windowing.

Technical conceive of the present invention is: (Fisher Discriminant Analysis is that the sample data that d ties up the input space is projected on the straight line FDA), makes on this straight line projection in zone calibration the best of sample in the Fisher discriminatory analysis.Speaker's pitch, tone color, volume presents the colourful form of expression at different times, and speech characteristic parameter often has non-linear, polymorphism, directly uses the Fisher discriminant analysis method and can't obtain desirable recognition result.

Nuclear Fisher diagnostic method (Kernel Fisher Discriminant Analysis, KFDA) be will the nuclear learning method the product that combines with Fisher judgement method of thought.The thinking of KFDA algorithm is: at first by a Nonlinear Mapping, will import data and hint obliquely in a higher-dimension nuclear space; Then, in this higher-dimension nuclear space, carry out linear Fisher judgment analysis again, thereby realize with respect to former space being non-linear judgment analysis.Though nuclear Fisher diagnostic method meets the speaker and debates other nonlinear feature, but nuclear Fisher diagnostic method is only considered the global area calibration maximal projection of grouped data, do not consider the interior polymorphic distribution characteristics of class of same speaker's speech vector, but also need an accelerated model training algorithm to support speaker identification's big data quantity situation.

The effect that the present invention is useful is: 1, affinity between the sample is incorporated divergence matrix in the class with the weights form, propose to keep the Fisher method of discrimination in the class, be applied to the speaker identification, discrimination is higher than tradition generation property model (as gauss hybrid models); To compare discrimination similar with other property distinguished models (as support vector machine), yet support vector machine is the binary classification device, can only make up a plurality of models by " one-to-many " or " one to one " mode carries out the ballot formula and classifies more, and the inventive method can directly be carried out many classification, and model construction is directly perceived more quick; 2, the optimum projection class vector of the coring Fisher discrimination model of class internal characteristic maintenance is searched in the subspace of non-kernel between class by kernel in class, makes the optimal vector computing velocity faster, meets the big training sample situation of this class of speaker identification.

Embodiment

Below the present invention is further described.

1., the pre-service of voice signal: voice signal is carried out pre-service;

3., speaker identification's model construction:

Σ_{l = 1}^{c} c_{l} = N

X is a sample matrix, that is:

X≡(x ₁|x ₂|…|x _n|)

Wherein:

S_{b}^{Φ} = \frac{1}{N} Σ_{i = 1}^{c} c_{l} ({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x}) {({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x})}^{T}

Be divergence matrix in the class,

{\tilde{S}}_{w} &equiv; Σ_{i = 1}^{n} x_{i} {x_{i}}^{T} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} x_{k} {x_{m}}^{T}

Be divergence matrix in the class, affine matrix

A_{i, j} = \exp (- \frac{{| | x_{i} - x_{j} | |}^{2}}{σ^{2}}),

Wherein σ is the adjustable integer constant factor, x _iBe the mean value of i class sample, x represents the mean value of all samples,

Be best projection class vector to be asked.

4., model best projection vector calculation

Adopt the optimum solution of LWFD method, promptly according to formula:

Calculate the best projection Vector Groups, suppose nullB with

Represent S respectively _bWith

{nullB}^{&perp;} \cap null \tilde{W},

5., speaker identification:

According to optimum projection class vector

The framework of present embodiment is as follows:

First's feature extraction

Prior art is adopted in feature extraction substantially, and the voice signal of at first gathering each speaker's different times is some, carries out pretreatment operation, comprises that sample quantization, center clipping, pre-emphasis, unvoiced segments are removed, windowing divides frame.Pretreated voice signal is carried out feature extraction, the present invention adopts Mel frequency cepstrum parameter (MFCC), extract 13 rank Mel cepstrum parameters of every frame voice signal, remove and wherein speaker characteristic is described the 0th less rank parameter, last every frame voice signal is converted to 12 Jan Vermeer cepstrum feature vectors.

Keep the Fisher discrimination model in the second portion class

Traditional core Fisher criterion is as follows:

Wherein

S_{b}^{Φ} = \frac{1}{N} Σ_{i = 1}^{c} c_{l} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}) {({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ})}^{T}

S_{w}^{Φ} = \frac{1}{N} Σ_{i = 1}^{c} Σ_{j = 1}^{c_{l}} (φ (x_{j}^{i}) - {\overset{&OverBar;}{φ}}_{i}) {(φ (x_{j}^{i}) - {\overset{&OverBar;}{φ}}_{i})}^{T}

Between class scatter matrix and the interior divergence matrix of class among the higher dimensional space H have been represented respectively; φ (x) is that input vector x is at higher dimensional space H respective projection, φ _iBe the mean value of i class sample at higher dimensional space, φ represents the mean value of all samples of higher dimensional space.According to reproducing kernel theory,

Can be expressed as following form:

Then former differentiation criterion is equivalent to expression formula:

J (α) = \frac{α^{T} K_{b} α}{α^{T} K_{w} α}

Divergence matrix K in the nuclear class _wWith nuclear between class scatter matrix K _bBe defined as follows:

K_{b} = \frac{1}{N} Σ_{l = 1}^{c} c_{l} (μ_{l} - μ_{0}) {(μ_{l} - μ_{0})}^{T}

K_{w} = \frac{1}{N} Σ_{l = 1}^{c} Σ_{j = 1}^{c_{l}} (η_{x_{j}^{l}} - μ_{l}) {(η_{x_{j}^{l}} - μ_{l})}^{T}

Wherein:

η _x＝(k(x ₁，x)，…，k(x _N，x)) ^T

μ_{l} = {(\frac{1}{c_{l}} Σ_{m = 1}^{c_{l}} k (x_{1}, x_{m}), \cdot \cdot \cdot, \frac{1}{c_{l}} Σ_{m = 1}^{c_{l}} k (x_{1}, x_{m}))}^{T}

μ_{0} = {(\frac{1}{N} Σ_{m = 1}^{N} k (x_{1}, x_{m}), \cdot \cdot \cdot, \frac{1}{N} Σ_{m = 1}^{N} k (x_{1}, x_{m}))}^{T}

Try to achieve best nuclear discriminant vector α in the following formula according to Generalized Rayleigh Quotient ₁, α ₂... α _N, the projection matrix that best discriminant vector collection constitutes among the feature space H then:

The nuclear Fisher differentiation of vector x to be identified is characterized as

Have the situation of sub-clustering and overlapping sample in the class according to speaker's speech characteristic parameter, directly use nuclear Fisher method of discrimination and can't obtain desirable recognition result.The present invention proposes to protect the Fisher method of discrimination of class internal characteristic with the divergence matrix in class of local data's Feature Fusion in the class:

The between class scatter matrix S that keeps former Fisher techniques of discriminant analysis _bConstant, the divergence matrix is adjusted as follows in the class:

S_{w} = Σ_{l = 1}^{c} Σ_{i = 1}^{n_{l}} (x_{i} - \frac{1}{n_{l}} Σ_{j = 1}^{n_{l}} x_{j}) {(x_{i} - \frac{1}{n_{l}} Σ_{j = 1}^{n_{l}} x_{j})}^{T}

= Σ_{i = 1}^{n} x_{i} {x_{i}}^{T} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} x_{k} {x_{m}}^{T}

Based on above-mentioned adjustment, data local feature in the class incorporated make it to become:

{\tilde{S}}_{w} &equiv; Σ_{i = 1}^{n} x_{i} {x_{i}}^{T} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} x_{k} {x_{m}}^{T}

Wherein

A_{i, j} = \exp (- \frac{{| | x_{i} - x_{j} | |}^{2}}{σ^{2}}),

σ is adjustable integral factor.That is to say, the distance factor of homogeneous data is incorporated divergence matrix in the class with the form of weighting, the sample in similar to being weighted, is reduced similar sample medium and long distance data to the effect to divergence matrix in the class, the influence power of outstanding neighbour's data promptly keeps the class internal characteristic.Will

Be applied to the Fisher discriminatory analysis, then the Fisher criterion formulas becomes:

The best Fisher projection vector of third part obtains

The directly optimum projection that keeps Fisher to differentiate in the compute classes, similar Fisher diagnostic method need be asked K _w ^-1K _bThe corresponding proper vector of matrix eigenvalue of maximum, when being applied to the speaker identification, the sample characteristics vector is thousands of easily, and calculated amount is quite surprising, can't use in real time, must improve training algorithm.

The optimum solution that keeps the Fisher method of discrimination in the class promptly calculate according to the Fisher criterion formulas and the best projection Vector Groups, suppose nullB and

Represent S respectively _bWith

Kernel, take from the then best subspace of differentiating

{nullB}^{&perp;} \cap null \tilde{W},

NullB wherein ^⊥Be the orthocomplement of nullB, at first with S _bProject to nullB ^⊥, rewrite S _bExpression formula is:

S_{b} = Σ_{i = 1}^{c} (\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ})) {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}))}^{T} = Σ_{i}^{c} {\overset{&OverBar;}{φ}}^{'}_{i} {\overset{&OverBar;}{φ^{'}}}_{i}^{T} = Φ_{b} {Φ_{b}}^{T}

Wherein

{\overset{&OverBar;}{φ}}^{'}_{i} = \sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}), Φ_{b} = [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] .

Matrix S _bOrder be c-1, Φ _bΦ _b ^TWith Φ _b ^TΦ _bIdentical nonzero eigenvalue is arranged, and the pairing proper subspace of zero eigenvalue is S _bKernel, need filtering.Therefore use Φ _b ^TΦ _bSubstitute Φ _bΦ _b ^TAnd take to examine skill and derive, can simplify operand.

{Φ_{b}}^{T} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\underset{j = 1, \cdot \cdot \cdot, c}{i = 1, \cdot \cdot \cdot, c}}

Wherein:

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = \frac{\sqrt{c_{i} c_{j}}}{N} ({φ_{i}}^{T} φ_{j} - {φ_{i}}^{T} φ - φ^{T} φ_{j} + φ^{T} φ)

{Φ_{b}}^{T} Φ_{b} = \frac{1}{N} B \times (A_{LC}^{T} \times K \times A_{LC} - \frac{1}{N} (A_{LC}^{T} \times K \times 1_{LC}) -

\frac{1}{N} (1_{LC}^{T} \times K \times A_{LC}) + \frac{1}{N^{2}} (1_{LC}^{T} \times K \times 1_{LC})) \times B

Wherein

B = diag [\sqrt{c_{1}} \cdot \cdot \cdot \sqrt{c_{c}}],

1 _LCBe that an all elements is L * C matrix of 1,

A_{LC} = diag [a_{c_{1}} \cdot \cdot \cdot a_{c_{c}}]

Be a L * C block diagonal matrix, piece Be that an all elements is

C _i* 1 column vector, K is the nuclear matrix of input feature value.。

If λ _iWith e _i(i=1 ... c) be Φ _b ^TΦ _bI eigen vector, and with the eigenwert descending sort.V then _i=Φ _be _iIt is former between class scatter matrix S _bProper vector.For obtaining optimum projection, must remove S _bKernel, promptly abandon eigenwert and be zero individual features vector, keep v _iIn before c-1 proper vector: V=[v ₁V _C-1]=Φ _bE _m=Φ _b[e ₁E _C-1], V then ^TS _bV=Λ _b, Λ _b=diag[λ ₁λ _C-1] be (c-1) * (c-1) diagonal matrix.

U = V Λ_{b}^{- 1 / 2}

The subspace is entered in projection, wherein U ^TS _bU=I,

U^{T} {\tilde{S}}_{w} U = {(E_{m} Λ_{b}^{- 1 / 2})}^{T} (Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b}) (E_{m} Λ_{b}^{- 1 / 2}),

Utilize nuclear matrix K, will

Carry out the coring conversion:

{Φ_{b}}^{T} {\tilde{S}}_{w} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} {\tilde{S}}_{w} [{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\underset{j = 1, \cdot \cdot \cdot, c}{i = 1, \cdot \cdot \cdot, c}}

Wherein

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j} = Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} .

Respectively with behind the matrix computations formal representation two be:

Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{1} = \frac{1}{L} B (A_{LC}^{T} KK A_{LC} - \frac{1}{L} (A_{LC}^{T} KK 1_{LC}) -

\frac{1}{L} (1_{LC}^{T} KK A_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} KK 1_{LC})) B

Σ_{l = 1}^{c} \frac{1}{c_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{2} = \frac{1}{L} B (A_{LC}^{T} KWK A_{LC} - \frac{1}{L} (A_{LC}^{T} KWK 1_{LC})

- \frac{1}{L} (1_{LC}^{T} KWK A_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} KWK 1_{LC})) B

C _i* c _iMatrix, therefore

Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b} = J_{1} - J_{2},

It also is a c * c matrix.Then

Be the simple matrix of a dimension, calculate its proper vector p for (c-1) * (c-1) _iWith eigenvalue ' _iAnd with the ascending order arrangement, m vectorial feature extraction transformation matrix Q=UP=U[p that gets before getting ₁P _M-1], wherein 1≤m≤c-1 can get

Q^{T} {\tilde{S}}_{w} Q = Λ_{w},

Λ _w=diag[λ ' ₁λ ' _m] be a m * m diagonal matrix.

In sum, the optimum differentiation proper vector that keeps Fisher to differentiate in the class is:

Γ = Q Λ_{w}^{- 1 / 2},

Feature constitutes a low n-dimensional subspace n in the H space after the conversion, and has maximum separable degree.Ask for the central value of data and standardization after each class projection in the low n-dimensional subspace n, for next step speaker identification gets ready.

The 4th part speaker identification

Speaker's phonetic entry pattern z to be classified arbitrarily projects to proper subspace according to Γ, is calculated as follows:

y = Γ^{T} φ (z) = {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (Φ_{b}^{T} φ (z))

Wherein:

Φ_{b}^{T} φ (z) = {[{\overset{&OverBar;}{φ}}^{'}_{1} \cdot \cdot \cdot {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} φ (z)

Because

{\overset{&OverBar;}{φ}}^{'}_{i} φ (z) = {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - φ))}^{T} φ (z) = \sqrt{\frac{c_{i}}{N}} (\frac{1}{c_{i}} Σ_{m = 1}^{c_{i}} φ_{im}^{T} φ (z) - \frac{1}{N} Σ_{p = 1}^{c} Σ_{q = 1}^{c_{q}} φ_{pq}^{T} φ (z)),

Can get:

Φ_{b}^{T} φ (z) = \frac{1}{\sqrt{N}} B (A_{LC}^{T} γ (φ (z)) - \frac{1}{N} 1_{LC}^{T} γ (φ (z)))

Wherein

γ (φ (z)) = {[φ_{11}^{T} φ (z) | φ_{12}^{T} φ (z) | \cdot \cdot \cdot | φ_{{cc}_{c}}^{T} φ (z)]}^{T}

It is a N * 1 nuclear vector.Final proper vector value is:

y = \frac{1}{\sqrt{N}} {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (B (A_{LC}^{T} - \frac{1}{N} 1_{LC}^{T})) γ (φ (z))

Test experiments: the corpus of oneself recording is adopted in experiment, 20 of recording total numbers of persons, and wherein the man is 12,8 of woman.Data transform acquisition by sample frequency 8000Hz, quantization digit 16bit, monophony A/D.Everyone voice signal is recorded synthetic by different times.Everyone mix extract different times sound bite total length 15s as training signal, 20 length 1.5s sound bites of different times are as test signal, i.e. 20 training utterances, 400 tested speech.Voice signal is earlier through high boost, pre-service such as center reduction detect by VAD (Voice Activity Detection) sound is active again, extract wherein effectively voice segments, removing redundant unvoiced segments, is that length divides frame to extract the 12 MFCC characteristic parameters of tieing up as sorting parameter with 30ms.

GMM, SVM and the inventive method are carried out speaker identification's contrast test.The inventive method adopts identical radially basic kernel function with SVM.The erroneous results rate is respectively: gauss hybrid models: 3.5%; Support vector machine: 2.75%; The class internal characteristic keeps nuclear Fisher diagnostic method: 2.5%.As seen, the inventive method has the better recognition rate than classic method.

Claims

1, a kind of based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method, it is characterized in that: described speaker identification's implementation method may further comprise the steps:

1., the pre-service of voice signal: voice signal is carried out pre-service;

3., speaker identification's model construction:

Σ_{l = 1}^{c} c_{l} = N

X is a sample matrix, that is:

X≡(x ₁|x ₂|…|x _n|)

Wherein:

S_{b}^{Φ} = \frac{1}{N} Σ_{i = 1}^{c} c_{l} ({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x}) {({\overset{&OverBar;}{x}}_{i} - \overset{&OverBar;}{x})}^{T}

Be the between class scatter matrix,

{\tilde{S}}_{w} &equiv; Σ_{i = 1}^{n} x_{i} {x_{i}}^{T} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} x_{k} {x_{m}}^{T}

Be divergence matrix in the class, affine matrix

A_{i, j} = \exp (- \frac{{| | x_{i} - x_{j} | |}^{2}}{σ^{2}}),

Be best projection class vector to be asked;

4., model best projection vector calculation

Adopt the optimum solution of LWFD method, promptly according to formula:

Calculate the best projection Vector Groups, suppose nullB with

Represent S respectively _bWith

5., speaker identification:

According to optimum projection class vector

2, as claimed in claim 1 based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method, it is characterized in that: described step 4. in, the described optimum process of asking for of differentiating proper vector is:

At first with S _bProject to nullB ^⊥, rewrite S _bExpression formula is:

S_{b} = Σ_{i = 1}^{c} (\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ})) {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}))}^{T} = Σ_{i}^{c} {\overset{&OverBar;}{φ}}^{'}_{i} {\overset{&OverBar;}{φ^{'}}}_{i}^{T} = Φ_{b} {Φ_{b}}^{T}

φ wherein _iBe the mean value of i class sample at higher dimensional space, φ represents the mean value of all samples of higher dimensional space,

{\overset{&OverBar;}{φ}}^{'}_{i} = \sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - \overset{&OverBar;}{φ}),

Φ _b＝[φ′ ₁…φ′ _c]。Matrix S _bOrder be c-1, Φ _bΦ _b ^TWith Φ _b ^TΦ _bIdentical nonzero eigenvalue is arranged, and the pairing proper subspace of zero eigenvalue is S _bKernel; Use Φ _b ^TΦ _bSubstitute Φ _bΦ _b ^TAnd take to examine skill and derive;

{Φ_{b}}^{T} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} . . . {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} [{\overset{&OverBar;}{φ}}^{'}_{1} . . . {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\overset{i = 1, . . ., c}{j = 1, . . ., c}}

Wherein:

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = \frac{\sqrt{c_{i} c_{j}}}{N} ({φ_{i}}^{T} φ_{j} - {φ_{i}}^{T} φ - φ^{T} φ_{j} + φ^{T} φ)

{Φ_{b}}^{T} Φ_{b} = \frac{1}{N} B \times (A_{LC}^{T} \times K \times A_{LC} - \frac{1}{N} (A_{LC}^{T} \times K \times 1_{LC}) -

\frac{1}{N} (1_{LC}^{T} \times K \times A_{LC}) + \frac{1}{N^{2}} (1_{LC}^{T} \times K \times 1_{LC})) \times B

Wherein

B = diag [\sqrt{c_{1}} . . . \sqrt{c_{c}}],

1 _LCBe that an all elements is L * C matrix of 1,

A_{LC} = diag [a_{c_{1}} . . . a_{c_{c}}]

Be a L * C block diagonal matrix, piece

Be that an all elements is

C _i* 1 column vector, K is the nuclear matrix of input feature value.

If λ _iWith e _i(i=1 ... c) be Φ _b ^TΦ _bI eigen vector, and with the eigenwert descending sort; V then _i=Φ _be _iIt is former between class scatter matrix S _bProper vector; Remove S _bKernel, promptly abandon eigenwert and be zero individual features vector, keep v _iIn before c-1 proper vector: V=[v ₁V _C-1]=Φ _bE _m=Φ _b[e ₁E _C-1], V then ^TS _bV=Λ _b, Λ _b=diang[λ ₁λ _C-1] be (c-1) * (c-1) diagonal matrix.

U = V Λ_{b}^{- 1 / 2}

The subspace is entered in projection, wherein U ^TS _bU=I,

U^{T} {\tilde{S}}_{w} U = {(E_{m} Λ_{b}^{- 1 / 2})}^{T} (Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b}) (E_{m} Λ_{b}^{- 1 / 2}),

Utilize nuclear matrix K, will

Carry out the coring conversion:

{Φ_{b}}^{T} {\tilde{S}}_{w} Φ_{b} = {[{\overset{&OverBar;}{φ}}^{'}_{1} . . . {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} {\tilde{S}}_{w} [{\overset{&OverBar;}{φ}}^{'}_{1} . . . {\overset{&OverBar;}{φ}}^{'}_{c}] = {({\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j})}_{\overset{i = 1, . . ., c}{j = 1, . . ., c}}

Wherein:

{\overset{&OverBar;}{φ}}^{'}_{i}^{T} {\tilde{S}}_{w} {\overset{&OverBar;}{φ}}^{'}_{j} = Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} - Σ_{l = 1}^{c} \frac{1}{n_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j}

First:

Σ_{i = 1}^{n} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} φ (x_{i}) φ {(x_{i})}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{1} = \frac{1}{L} B (A_{LC}^{T} {KKA}_{LC} - \frac{1}{L} (A_{LC}^{T} KK 1_{LC}) -

\frac{1}{L} (1_{LC}^{T} {KKA}_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} {KK 1}_{LC})) B

Second:

Σ_{l = 1}^{c} \frac{1}{c_{l}} Σ_{k = 1}^{n_{l}} Σ_{m = 1}^{n_{l}} A_{k, m} {\overset{&OverBar;}{φ}}^{'}_{i}^{T} x_{k} {x_{m}}^{T} {\overset{&OverBar;}{φ}}^{'}_{j} = J_{2} = \frac{1}{L} B (A_{LC}^{T} {KWKA}_{LC} - \frac{1}{L} (A_{LC}^{T} {KWK 1}_{LC})

- \frac{1}{L} (1_{LC}^{T} {KWKA}_{LC}) + \frac{1}{L^{2}} (1_{LC}^{T} {KWK 1}_{LC})) B

W=diag[w in the following formula ₁W _c] be a N * N partitioned matrix, w _iBe that an element is C _i* c _iMatrix, therefore

Φ_{b}^{T} {\tilde{S}}_{w} Φ_{b} = J_{1} - J_{2},

It also is an outer c matrix.Then

Be the simple matrix of a dimension, calculate its proper vector p for (c-1) * (c-1) _iWith eigenvalue ' _iAnd with the ascending order arrangement, m vectorial eigentransformation matrix Q=UP=U[p of getting before extracting ₁P _M-1], wherein 1≤m≤c-1 can get

Q^{T} {\tilde{S}}_{w} Q = Λ_{w},

Λ _w=diag[λ ' ₁λ ' _m] be a m * m diagonal matrix;

Γ = Q Λ_{w}^{- 1 / 2},

3, as claimed in claim 1 or 2 based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method, it is characterized in that: described step 5. in, speaker's phonetic entry pattern z to be classified arbitrarily projects to proper subspace according to Γ, is calculated as follows:

y = Γ^{T} φ (z) = {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (Φ_{b}^{T} φ (z))

Wherein:

Φ_{b}^{T} φ (z) = {[{\overset{&OverBar;}{φ}}^{'}_{1} . . . {\overset{&OverBar;}{φ}}^{'}_{c}]}^{T} φ (z),

φ (z) is input vector z corresponding proper vector in higher dimensional space H, because

{\overset{&OverBar;}{φ}}^{'}_{i} φ (z) = {(\sqrt{\frac{c_{i}}{N}} ({\overset{&OverBar;}{φ}}_{i} - φ))}^{T} φ (z) = \sqrt{\frac{c_{i}}{N}} (\frac{1}{c_{i}} Σ_{m = 1}^{c_{i}} φ_{im}^{T} φ (z) - \frac{1}{N} Σ_{p = 1}^{c} Σ_{q = 1}^{c_{q}} φ_{pq}^{T} φ (z)),

Can get:

Φ_{b}^{T} φ (z) = \frac{1}{\sqrt{N}} B (A_{LC}^{T} γ (φ (z)) - \frac{1}{N} 1_{LC}^{T} γ (φ (z)))

Wherein

γ (φ (z)) = {[φ_{11}^{T} φ (z) | φ_{12}^{T} φ (z) | . . . | φ_{{cc}_{c}}^{T} φ (z)]}^{T}

Be a N * 1 nuclear vector, the proper vector value is:

y = \frac{1}{\sqrt{N}} {(E_{m} Λ_{b}^{- 1 / 2} P Λ_{w}^{- 1 / 2})}^{T} (B (A_{LC}^{T} - \frac{1}{N} 1_{LC}^{T})) γ (φ (z))

4, as claimed in claim 3 based on speaker identification's implementation method of protecting class kernel Fisher diagnostic method, it is characterized in that: described step 1. in, described pre-service comprises sampling, removal noise, end-point detection, pre-emphasis, branch frame and windowing.