CN104966098A - Data discrimination dimension reducing method of exceptional point inhibition - Google Patents

Data discrimination dimension reducing method of exceptional point inhibition

Info

Publication number
CN104966098A
CN104966098A (application CN201510325234.8A)
Authority
CN
China
Prior art keywords
matrix
data
weights
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510325234.8A
Other languages
Chinese (zh)
Inventor
任传贤 (Ren Chuanxian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201510325234.8A priority Critical patent/CN104966098A/en
Publication of CN104966098A publication Critical patent/CN104966098A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/245Classification techniques relating to the decision surface
    • G06F18/2451Classification techniques relating to the decision surface linear, e.g. hyperplane

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses an outlier-suppressing discriminative dimensionality reduction method for data, with the following beneficial effects: (1) every data point is assigned a weight according to its contribution to the discriminative subspace learning process, with points that play a positive role receiving large sample weights; in this way, outliers are adaptively attenuated during subspace learning; (2) based on the given class labels, the mean vector and covariance matrix of each class are estimated independently, and a linear discriminant criterion based on these new statistics is then proposed. The new sample-weighting scheme can also be applied in other algorithms based on covariance matrices.

Description

Discriminative data dimensionality reduction method with outlier suppression
Technical field
The present invention relates to the field of data processing, and more specifically to an outlier-suppressing discriminative dimensionality reduction method for data.
Background technology
Data dimensionality reduction methods based on subspace learning have received ample attention in intelligent analysis and cognitive systems. Linear discriminant analysis (LDA) and its various improved forms have been widely studied because of their supervised learning mode and simple implementation.
In real scenarios, however, two drawbacks limit the further application and popularization of LDA. First, the basic assumption that the data are independent and identically distributed (i.i.d.) is too harsh: for data that do not satisfy it, obtaining the optimal solution can no longer be guaranteed theoretically, and for high-dimensional data, deciding whether the i.i.d. assumption holds is itself very difficult. Second, data collected in real environments often carry some degree of noise and outliers, whose presence makes the learned subspace less robust, while the i.i.d. assumption leaves the model with larger error. In both cases, the traditional mean and covariance matrix estimators lose the discriminative information of the subspace.
Researchers have found that, in data modeling and numerical computation, some data points play a more positive role than others in the process of learning a discriminative subspace. Estimating the statistics from all data without any distinction therefore not only seems unreasonable but also yields weaker performance in practical applications. It is thus necessary to re-examine the local structural features of the data, distinguish the data samples appropriately, and assign larger weights to the samples that play a positive role, so that the discriminative information contained in the data can be mined more effectively. By combining the basic ideas of Fisher linear discriminant analysis and locality preserving projections, LFDA can learn a discriminative subspace that preserves local structure. L1-Graph introduces sparse representation into the characterization of local neighboring samples, effectively exploiting the sparse representation relationships among samples and, on that basis, obtaining a subspace that aids classification. Professor Xu Yong of Harbin Institute of Technology proposed a two-step LLDA method: for any given test sample y, a group of samples related to y (its neighbors) is first selected from the training set by sparse representation, and the classical Fisher criterion is then applied to these correlated samples; this further eliminates redundant samples and reduces computational complexity. Recently, Mu et al. proposed an adaptive embedding framework for multi-class data dimensionality reduction.
It should be noted that the above algorithms and methods can all be summarized under the basic idea of "relation weighting": by re-estimating and analyzing the "neighbor relationships" between samples (neighbors versus non-neighbors), the relation between any pair of samples (supervised same-class relations and unsupervised neighbor relations) is adjusted according to local geometric structure, which benefits discriminant analysis. A major defect of this class of algorithms, however, is that if the data contain a certain degree of outliers, the relations between outliers and normal data points are also amplified, which harms the learning of the discriminative subspace.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention proposes an outlier-suppressing discriminative dimensionality reduction method for data. By estimating the contribution of each sample during learning, the method can effectively solve for the optimal discriminative subspace and handle data containing outliers well.
To achieve these goals, the technical scheme of the present invention is as follows:
The outlier-suppressing discriminative dimensionality reduction method for data comprises the following steps:
S1. Input the raw data with class labels 1, 2, ..., C, where C is the total number of classes;
S2. Within the k-th class, 1 ≤ k ≤ C, compute the relation weight between each pair of data points $x_i^k$ and $x_j^k$, $W_{ij}^k = \exp(-\|x_i^k - x_j^k\|^2 / \sigma^2)$, where σ is a prior parameter; then obtain, for the i-th data point of this class, the sum of its weights to the other data points, $D_k(i) = \sum_{j=1}^{n_k} W_{ij}^k$, where $n_k$ denotes the number of data points in the k-th class;
S3. For each data point of the k-th class, normalize the weight sum obtained in step S2 by the total over all points of the class to obtain the final weight $p_k(i) = D_k(i) / \sum_{j=1}^{n_k} D_k(j)$, k = 1, 2, ..., C; i = 1, 2, ..., n_k, so that $\sum_{i=1}^{n_k} p_k(i) = 1$;
S4. For the data points in the k-th class, compute the class sample mean vector and covariance matrix from the respective sample weights:
$\bar{x}_k = \sum_{i=1}^{n_k} X_i^k \, p_k(i),$
$S_w^k = \sum_{i=1}^{n_k} (X_i^k - \bar{x}_k)(X_i^k - \bar{x}_k)^T \, p_k(i);$
S5. For the data in all classes, compute the between-class scatter matrix $S_b = \sum_{k,l=1}^{C} (\bar{x}_k - \bar{x}_l)(\bar{x}_k - \bar{x}_l)^T$ and the total within-class scatter matrix $S_w = \sum_{k=1}^{C} S_w^k$;
S6. To extract well-separated discriminative features, solve for the optimal orthogonal projection matrix A satisfying $S_b A = \lambda S_w A$. This can be converted into the decomposition of the matrix $S = S_w^{-1} S_b$, where $S_w^{-1}$ is the inverse of $S_w$, A is the m × (C−1) matrix to be solved, and λ is the diagonal matrix formed by the eigenvalues.
Further, step S6 comprises:
1) $S_w \leftarrow S_w + \rho I$, where ρ is a very small positive number and I denotes the identity matrix;
2) solve for the inverse $S_w^{-1}$ and let $S = S_w^{-1} S_b$;
3) decompose S into the form $Q \Sigma Q^T$, where Q is an m × m orthogonal matrix and Σ is an m × m diagonal matrix whose diagonal elements are nonnegative real numbers arranged in descending order;
4) take the first C−1 columns of the matrix Q to form the new matrix A.
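The steps above map directly onto a short script. Below is a minimal NumPy sketch of steps S1–S6; it assumes the Gaussian relation weight $W_{ij}^k = \exp(-\|x_i^k - x_j^k\|^2/\sigma^2)$ reconstructed above (the description fixes only that σ is a prior scale parameter), and the function and parameter names are illustrative rather than part of the patent:

```python
import numpy as np

def outlier_suppressing_lda(X, labels, sigma=1.0, rho=1e-6):
    """Sketch of steps S1-S6: discriminative dimensionality reduction
    with adaptive down-weighting of outliers.

    X      : (m, n) data matrix, one sample per column
    labels : (n,) class labels
    sigma  : scale of the assumed Gaussian relation weights (step S2)
    rho    : regularization added to S_w before inversion (step S6.1)
    """
    labels = np.asarray(labels)
    m, n = X.shape
    classes = np.unique(labels)
    C = len(classes)

    means, Sw = [], np.zeros((m, m))
    for k in classes:
        Xk = X[:, labels == k]                       # samples of class k
        # S2: pairwise relation weights within class k
        d2 = ((Xk[:, :, None] - Xk[:, None, :]) ** 2).sum(axis=0)
        W = np.exp(-d2 / sigma**2)
        D = W.sum(axis=1)                            # weight sums D_k(i)
        # S3: normalized importance weights p_k(i)
        p = D / D.sum()
        # S4: weighted class mean and covariance
        xbar = Xk @ p
        Xc = Xk - xbar[:, None]
        Sw += (Xc * p) @ Xc.T                        # accumulate S_w^k into S_w
        means.append(xbar)

    # S5: between-class scatter over all pairs of class means
    Sb = np.zeros((m, m))
    for xk in means:
        for xl in means:
            diff = (xk - xl)[:, None]
            Sb += diff @ diff.T

    # S6: regularize S_w, form S = S_w^{-1} S_b, keep C-1 leading directions
    Sw += rho * np.eye(m)
    S = np.linalg.solve(Sw, Sb)                      # S_w^{-1} S_b
    eigvals, eigvecs = np.linalg.eig(S)              # S need not be symmetric
    order = np.argsort(-eigvals.real)
    A = eigvecs[:, order[:C - 1]].real               # m x (C-1) projection
    return A
```

Note that $S = S_w^{-1} S_b$ is not symmetric in general, so the sketch uses a general eigendecomposition rather than the symmetric factorization $Q \Sigma Q^T$ of substep 3).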
Compared with the prior art, the beneficial effects of the present invention are: (1) all data points are assigned weights according to their contribution to the discriminative subspace learning process, with points playing a positive role receiving larger sample weights; accordingly, outliers are adaptively attenuated during subspace learning;
(2) based on the given class labels, the mean vector and covariance matrix of each class are estimated independently, and a linear discriminant criterion based on these new statistics is then proposed; the new sample-weighting scheme can also be applied in other algorithms based on covariance matrices.
The new sample weighting method and data dimensionality reduction method proposed by the present invention play a very important role in improving the robustness of the discriminative subspace and in suppressing noise and outliers, with broad application prospects.
Brief description of the drawings
Fig. 1 is a schematic diagram of conventional neighbor relationships.
Fig. 2 is a schematic diagram of the new mean estimation model.
Fig. 3 is the flow chart of the method of the invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Fig. 1 shows the conventional graph embedding model, in which all other data points of the same class are given identical weights.
Fig. 2 shows the mean estimation based on importance sampling, in which filled circles represent the important data points given larger weights; they are used to estimate the weighted within-class mean.
Fig. 3 is the flow chart of the method of the invention, comprising the main processes of data input, weight estimation, mean estimation, within-class/between-class scatter matrix estimation, and subspace computation.
The outlier-suppressing discriminative dimensionality reduction method for data comprises the following steps:
S1. Input the raw data with class labels 1, 2, ..., C, where C is the total number of classes;
S2. Within the k-th class, 1 ≤ k ≤ C, compute the relation weight between each pair of data points $x_i^k$ and $x_j^k$, $W_{ij}^k = \exp(-\|x_i^k - x_j^k\|^2 / \sigma^2)$, where σ is a prior parameter; then obtain, for the i-th data point of this class, the sum of its weights to the other data points, $D_k(i) = \sum_{j=1}^{n_k} W_{ij}^k$, where $n_k$ denotes the number of data points in the k-th class;
S3. For each data point of the k-th class, normalize the weight sum obtained in step S2 by the total over all points of the class to obtain the final weight $p_k(i) = D_k(i) / \sum_{j=1}^{n_k} D_k(j)$, k = 1, 2, ..., C; i = 1, 2, ..., n_k;
S4. For the data points of the k-th class, compute the class sample mean vector and covariance matrix from the above sample weights:
$\bar{x}_k = \sum_{i=1}^{n_k} X_i^k \, p_k(i),$
$S_w^k = \sum_{i=1}^{n_k} (X_i^k - \bar{x}_k)(X_i^k - \bar{x}_k)^T \, p_k(i);$
S5. For the data in all classes, compute the between-class scatter matrix $S_b = \sum_{k,l=1}^{C} (\bar{x}_k - \bar{x}_l)(\bar{x}_k - \bar{x}_l)^T$ and the total within-class scatter matrix $S_w = \sum_{k=1}^{C} S_w^k$;
S6. To extract well-separated discriminative features, solve for the optimal orthogonal projection matrix A satisfying $S_b A = \lambda S_w A$. This can be converted into the decomposition of the matrix $S = S_w^{-1} S_b$, where $S_w^{-1}$ is the inverse of $S_w$, A is the m × (C−1) matrix to be solved, and λ is the diagonal matrix formed by the eigenvalues.
Further, step S6 comprises:
1) $S_w \leftarrow S_w + \rho I$, where ρ is a very small positive number and I denotes the identity matrix;
2) solve for the inverse $S_w^{-1}$ and let $S = S_w^{-1} S_b$;
3) decompose S into the form $Q \Sigma Q^T$, where Q is an m × m orthogonal matrix and Σ is an m × m diagonal matrix whose diagonal elements are nonnegative real numbers arranged in descending order;
4) take the first C−1 columns of the matrix Q to form the new matrix A.
The specifics of the present invention are as follows:
1. Estimation of the basic statistics (within-class mean, within-class scatter matrix, between-class scatter matrix)
Suppose we are given n data points in m-dimensional space, $\{(x_i, b_i) \mid x_i \in R^m, i = 1, 2, \ldots, n\}$, where $b_i \in \{1, 2, \ldots, C\}$ is the class label of $x_i$ and C is the number of classes in the data. According to mathematical statistics, if the probability density function of the data is p(x), the sample mean vector and covariance matrix are estimated respectively by:
$\bar{x} = \sum_{i=1}^{n} x_i \, p(x_i),$
$\mathrm{Cov} = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T \, p(x_i),$
where the actual value of $p(x_i)$ is often difficult to determine; it can be understood as the importance of data point $x_i$ within its local region, or as its contribution to computing the class center. The present invention proposes a new method of estimating $p(x_i)$ and uses it to estimate, for each labeled class, the within-class mean, the within-class scatter matrix, and the between-class scatter matrix, thereby obtaining a new outlier-suppressing data dimensionality reduction algorithm.
In general, suppose $d = \|x_i - x_j\|$ is the Euclidean distance between $x_i$ and $x_j$, r is a scale parameter, and $W_{ij} = \exp(-d^2 / r^2)$ represents the similarity measure between $x_i$ and $x_j$; then W forms an n × n matrix. Let D be an n-dimensional vector with $D(i) = \sum_{j=1}^{n} W_{ij}$, so that D(i) gives the sum of similarities between sample $x_i$ and all other sample points. Let $p(x_i) = D(i) / \sum_{j=1}^{n} D(j)$; the present invention uses $p(x_i)$ as the contribution/weight of sample point $x_i$ in the process of solving for the discriminative subspace.
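As a small worked illustration of this weighting (the data values are invented for the example), the importance weight of an outlier collapses because it is far from every other point, so it contributes little to the weighted mean:

```python
import numpy as np

# Five 2-D points in one class; the last one is a deliberate outlier.
Xk = np.array([[0.0, 0.1, -0.1, 0.05, 5.0],
               [0.0, 0.1,  0.1, -0.10, 5.0]])
r = 1.0

d2 = ((Xk[:, :, None] - Xk[:, None, :]) ** 2).sum(axis=0)
W = np.exp(-d2 / r**2)        # similarity matrix W_ij = exp(-d^2/r^2)
D = W.sum(axis=1)             # D(i): total similarity of point i
p = D / D.sum()               # importance weights p(x_i)

print(np.round(p, 3))         # the outlier's weight is far smaller than
                              # the roughly uniform weights of the rest
```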
Suppose $X_k$ is the data matrix of the k-th class, $n_k$ is the sample size of the k-th class, $X_i^k$ denotes the i-th datum of $X_k$, and $p_k(i)$ is its weight; then the mean vector of $X_k$ can be written as
$\bar{x}_k = \sum_{i=1}^{n_k} X_i^k \, p_k(i) = \sum_{i=1}^{n_k} X_i^k \, D_k(i) \Big/ \sum_{j=1}^{n_k} D_k(j),$
and substituting this in, the within-class scatter matrix corresponding to $X_k$ can be computed.
Suppose $A \in R^{m \times h}$ is the projection matrix to be solved (h = C−1 by default); the feature after dimensionality reduction is then $y_i = A^T x_i$. The within-class mean and covariance matrix of $X_k$ after dimensionality reduction can be expressed as:
$\bar{y}_k = \sum_{i=1}^{n_k} A^T X_i^k \, p_k(i) = A^T \bar{x}_k,$
$S_w^k = \sum_{i=1}^{n_k} (y_i - \bar{y}_k)(y_i - \bar{y}_k)^T \, p_k(i) = A^T \Big[ \sum_{i=1}^{n_k} (X_i^k - \bar{x}_k)(X_i^k - \bar{x}_k)^T \, p_k(i) \Big] A.$
Since $p_k(i) = p(x_i^k) = D_k(i) / \sum_{j=1}^{n_k} D_k(j)$, we convert the vector $p_k$ into a diagonal matrix, obtaining $\mathrm{diag}(p_k) = D_k / \mathrm{trace}(D_k)$ and $\sum_{i=1}^{n_k} p_k(i) = 1$. If $E_k = \mathrm{diag}(p_k) - p_k p_k^T$, the covariance matrix of the k-th class can be simplified to
$S_w^k = A^T X_k E_k X_k^T A \triangleq A^T T_w^k A.$
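This simplification rests on the identity $\sum_{i} p_k(i)(X_i^k - \bar{x}_k)(X_i^k - \bar{x}_k)^T = X_k E_k X_k^T$, which holds because $\sum_i p_k(i) = 1$. A short sketch that checks the identity numerically (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
Xk = rng.normal(size=(3, 8))            # m = 3 features, n_k = 8 samples
p = rng.random(8)
p /= p.sum()                            # any normalized weight vector

# Direct weighted covariance around the weighted mean
xbar = Xk @ p
Xc = Xk - xbar[:, None]
Sw_direct = (Xc * p) @ Xc.T

# Factored form with E_k = diag(p_k) - p_k p_k^T
Ek = np.diag(p) - np.outer(p, p)
Sw_factored = Xk @ Ek @ Xk.T

assert np.allclose(Sw_direct, Sw_factored)
```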
Let $X = [X_1, X_2, \ldots, X_C]$ denote the data matrix composed of the samples of all C classes. We splice the matrices $E_k$ corresponding to the individual classes together along the diagonal to form the weight matrix $W_w$; the sum of the within-class scatter matrices of all classes is then expressed as:
$S_w = \sum_{k=1}^{C} S_w^k = \sum_{k=1}^{C} A^T X_k E_k X_k^T A = A^T X W_w X^T A \triangleq A^T T_w A.$
On the other hand, since the weighted mean vector of each class represents that class's center, these vectors can be used to compute the between-class scatter matrix:
$S_b = \sum_{k,l=1}^{C} (\bar{y}_k - \bar{y}_l)(\bar{y}_k - \bar{y}_l)^T = \sum_{k,l=1}^{C} A^T (\bar{x}_k - \bar{x}_l)(\bar{x}_k - \bar{x}_l)^T A \triangleq A^T T_b A.$
2. Linear dimensionality reduction model
To obtain the optimal low-dimensional discriminant space, the present invention adopts the Fisher quotient as the basic model. The objective function can thus be expressed as:
$\max_{A^T A = I} \frac{\mathrm{tr}(S_b)}{\mathrm{tr}(S_w)} = \max_{A^T A = I} \frac{\mathrm{tr}(A^T T_b A)}{\mathrm{tr}(A^T T_w A)},$
where tr denotes the trace operator of linear algebra. This can be solved via a conventional matrix decomposition problem: $S_b A = S_w A \Delta$, where Δ denotes the diagonal matrix whose diagonal elements are the h largest eigenvalues and A is the orthogonal matrix composed of the corresponding eigenvectors. In general, the high dimensionality of the data can make $S_w$ singular, which complicates the above optimization problem, so a regularization method (e.g. $S_w \leftarrow S_w + \rho I$, where ρ is a very small positive number and I denotes the identity matrix) is needed to ensure the invertibility of $S_w$.
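As a sketch of this solution step: since $T_b$ and the regularized $T_w$ are symmetric, SciPy's generalized symmetric eigensolver can solve $S_b a = \lambda S_w a$ directly (function and parameter names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(Sb, Sw, h, rho=1e-6):
    """Solve S_b a = lambda * S_w a and return the h leading eigenvectors."""
    Sw_reg = Sw + rho * np.eye(Sw.shape[0])   # regularization: S_w <- S_w + rho*I
    eigvals, eigvecs = eigh(Sb, Sw_reg)       # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :h]            # keep the h largest
```

One caveat: the eigenvectors returned this way are $S_w$-orthogonal rather than orthonormal, so this is a common relaxation of the constraint $A^T A = I$ rather than a literal implementation of it.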
It is worth mentioning that this basic model can also be applied to other margin-based criteria, for example $\max_{A^T A = I} \mathrm{tr}\big(A^T (T_b - T_w) A\big)$.
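For this margin-based criterion the orthogonality constraint is exactly satisfiable: the maximizer of $\mathrm{tr}(A^T (T_b - T_w) A)$ subject to $A^T A = I$ consists of the h leading eigenvectors of the symmetric matrix $T_b - T_w$, as in this brief sketch:

```python
import numpy as np

def margin_projection(Tb, Tw, h):
    """Top-h eigenvectors of T_b - T_w, which maximize
    tr(A^T (T_b - T_w) A) subject to A^T A = I."""
    eigvals, eigvecs = np.linalg.eigh(Tb - Tw)   # ascending order
    return eigvecs[:, ::-1][:, :h]               # h largest -> orthonormal A
```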
In the feature embedding stage, suppose $x_t$ is a test sample; the reduced-dimensional feature representation is obtained simply by the matrix projection $y_t \leftarrow A^T x_t$.
The above-described embodiments of the present invention do not limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.

Claims (2)

1. An outlier-suppressing discriminative dimensionality reduction method for data, characterized in that it comprises the following steps:
S1. Input the raw data with class labels 1, 2, ..., C, where C is the total number of classes;
S2. Within the k-th class, 1 ≤ k ≤ C, compute the relation weight between each pair of data points $x_i^k$ and $x_j^k$, $W_{ij}^k = \exp(-\|x_i^k - x_j^k\|^2 / \sigma^2)$, where σ is a prior parameter; then obtain, for the i-th data point in the k-th class, the sum of its weights to the other data points in this class, $D_k(i) = \sum_{j=1}^{n_k} W_{ij}^k$, where $n_k$ denotes the number of data points in the k-th class;
S3. For each data point of the k-th class, normalize the weight sum obtained in step S2 by the total over all points of the class to obtain the final weight $p_k(i) = D_k(i) / \sum_{j=1}^{n_k} D_k(j)$, k = 1, 2, ..., C; i = 1, 2, ..., n_k;
S4. For the data points in the k-th class, compute the class sample mean vector and covariance matrix from the respective sample weights:
$\bar{x}_k = \sum_{i=1}^{n_k} X_i^k \, p_k(i),$
$S_w^k = \sum_{i=1}^{n_k} (X_i^k - \bar{x}_k)(X_i^k - \bar{x}_k)^T \, p_k(i);$
S5. For the data in all classes, compute the between-class scatter matrix $S_b = \sum_{k,l=1}^{C} (\bar{x}_k - \bar{x}_l)(\bar{x}_k - \bar{x}_l)^T$ and the total within-class scatter matrix $S_w = \sum_{k=1}^{C} S_w^k$;
S6. To extract well-separated discriminative features, solve for the optimal orthogonal projection matrix A satisfying $S_b A = \lambda S_w A$; this can be converted into the decomposition of the matrix $S = S_w^{-1} S_b$, where $S_w^{-1}$ is the inverse of $S_w$, A is the m × (C−1) matrix to be solved, and λ is the diagonal matrix formed by the eigenvalues.
2. The outlier-suppressing discriminative dimensionality reduction method for data according to claim 1, characterized in that step S6 comprises:
1) $S_w \leftarrow S_w + \rho I$, where ρ is a very small positive number and I denotes the identity matrix;
2) solve for the inverse $S_w^{-1}$ and let $S = S_w^{-1} S_b$;
3) decompose S into the form $Q \Sigma Q^T$, where Q is an m × m orthogonal matrix and Σ is an m × m diagonal matrix whose diagonal elements are nonnegative real numbers arranged in descending order;
4) take the first C−1 columns of the matrix Q to form the new matrix A.
CN201510325234.8A 2015-06-15 2015-06-15 Data discrimination dimension reducing method of exceptional point inhibition Pending CN104966098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510325234.8A CN104966098A (en) 2015-06-15 2015-06-15 Data discrimination dimension reducing method of exceptional point inhibition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510325234.8A CN104966098A (en) 2015-06-15 2015-06-15 Data discrimination dimension reducing method of exceptional point inhibition

Publications (1)

Publication Number Publication Date
CN104966098A true CN104966098A (en) 2015-10-07

Family

ID=54220133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510325234.8A Pending CN104966098A (en) 2015-06-15 2015-06-15 Data discrimination dimension reducing method of exceptional point inhibition

Country Status (1)

Country Link
CN (1) CN104966098A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142091A (en) * 2011-03-30 2011-08-03 东华大学 Kernel integration optimizing classification method
CN104156628A (en) * 2014-08-29 2014-11-19 东南大学 Ship radiation signal recognition method based on multi-kernel learning and discriminant analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任传贤 (REN CHUANXIAN): "High-Dimensional Heterogeneous Information Fusion and Importance Sampling: Modeling, Computation and Applications", Wanfang Data Dissertations *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631433A (en) * 2016-01-14 2016-06-01 江苏大学 Two-dimension linearity discrimination analysis face identification method
CN105631433B (en) * 2016-01-14 2019-05-31 江苏大学 A kind of face identification method of bidimensional linear discriminant analysis
CN106446503A (en) * 2016-07-21 2017-02-22 华侨大学 Method for identifying time-varying working mode of auto-covariance matrix recursive principal component analysis with forgetting factor
CN106446503B (en) * 2016-07-21 2019-02-22 华侨大学 Forget the time-varying operation mode recognition methods of auto-covariance matrix recursion pivot


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151007

RJ01 Rejection of invention patent application after publication