CN102982342A - Positive semidefinite spectral clustering method based on Lagrange dual - Google Patents


Info

Publication number
CN102982342A
CN102982342A (application CN201210445602.9A)
Authority
CN
China
Prior art keywords
matrix
sample data
positive semidefinite
similarity
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104456029A
Other languages
Chinese (zh)
Other versions
CN102982342B (en
Inventor
严严
沈华森
王菡子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201210445602.9A priority Critical patent/CN102982342B/en
Publication of CN102982342A publication Critical patent/CN102982342A/en
Application granted granted Critical
Publication of CN102982342B publication Critical patent/CN102982342B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a positive semidefinite spectral clustering method based on Lagrange duality, and relates to spectral clustering methods. A. Principal component analysis is applied to the given sample data set for dimensionality reduction. B. The similarity matrix of the sample data is constructed from a fully connected graph, with the similarity between samples computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function. C. Positive semidefinite normalization of the similarity matrix is performed based on Lagrange duality. D. Singular value decomposition of the normalized matrix is performed to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues. E. Conventional k-means clustering or another clustering method is applied to the eigenvector matrix to obtain the final clustering result. All data can be clustered effectively; algorithm analysis shows that, compared with common positive semidefinite spectral clustering methods, the method not only improves clustering accuracy but also greatly reduces the time required for spectral clustering.

Description

Positive semidefinite spectral clustering based on Lagrange duality
Technical field
The present invention relates to spectral clustering methods, and in particular to a positive semidefinite spectral clustering method based on Lagrange duality.
Background technology
Cluster analysis is one of the most popular techniques in statistical data analysis and processing. It has been widely applied in image analysis, pattern recognition, machine learning, information retrieval, and other fields. The goal of cluster analysis is to separate the different categories in a data set (called "clusters") so that the similarity between data points within the same cluster is large while the similarity between data points in different clusters is small. In recent years, spectral clustering has developed rapidly into an effective class of clustering techniques. Spectral clustering is built on spectral graph theory and mainly uses the eigenvectors of the similarity matrix of the data set to perform clustering. Compared with traditional clustering methods (such as k-means), spectral clustering has many advantages: it is simple to implement, independent of the data dimension, can cluster data with arbitrarily shaped distributions, and converges to a globally optimal solution. It is therefore widely used.
Spectral clustering treats clustering as a graph partitioning problem. In general, a spectral clustering model builds a graph (similarity matrix) over all data points, where the edges of the graph characterize the distances between data points. The similarity matrix is then normalized under a specific error metric, and finally a simple clustering method (such as k-means) is applied to the low-dimensional data obtained from the eigenvalue decomposition of the normalized matrix. In spectral clustering, the construction and the normalization of the similarity matrix are the key factors affecting the final clustering performance.
Given a similarity matrix, the simplest graph partitioning method solves the minimum cut (min-cut) problem. The goal of min-cut is to minimize the weight of the edges crossing the partition (that is, the sum of similarities between samples in different subgraphs). However, because the subgraph sizes are unconstrained, the clustering results of min-cut are often poor. By introducing constraints on subgraph size into the partitioning problem, Ratio-Cut (P. K. Chan, M. D. F. Schlag, and J. Y. Zien, "Spectral k-way ratio-cut partitioning and clustering," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 13, no. 9, pp. 1088-1096, 1994.) and Normalized-Cut (J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888-905, 2000.) address this problem effectively. The essence of both Ratio-Cut and Normalized-Cut is to seek a balanced partition among the different clusters.
Theory shows that the key difference between Ratio-Cut and Normalized-Cut lies in how the similarity matrix is normalized. Normalizing a similarity matrix is in fact a process of finding the nearest doubly stochastic matrix (nonnegative, symmetric, and F1 = 1). Different spectral clustering methods can be viewed as computing this approximation under different error metrics: Ratio-Cut corresponds to normalization under the L1 error metric, while Normalized-Cut corresponds to normalization under the relative entropy (also known as the Kullback-Leibler divergence). Zass et al. (R. Zass and A. Shashua, "Doubly stochastic normalization for spectral clustering," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, B.C., Canada, 2006, pp. 1569-1576.) proposed an effective normalization method (abbreviated FSC) that finds the nearest doubly stochastic matrix under the Frobenius norm. The FSC method is currently regarded as one of the most effective similarity matrix normalization methods. However, the main problem of Frobenius normalization is that the positive semidefinite constraint is ignored, so the doubly stochastic matrix obtained by the method is inaccurate. On the other hand, adding the positive semidefinite constraint to the Frobenius normalization turns the optimization into a semidefinite programming problem. Traditional solvers for this problem based on interior-point methods are very inefficient, with time complexity as high as O(n^6.5), where n is the number of data points, and can only be applied to small-scale data sets.
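As an illustration of what doubly stochastic normalization under the Frobenius metric involves (a sketch of the idea behind the FSC method discussed above, not the patent's own algorithm; function names are illustrative), one can alternate a closed-form projection onto the affine set {F : F = F^T, F1 = 1} with clipping to the nonnegative cone. Plain alternating projections converge to a point in the intersection; Dykstra's corrections would be needed to obtain the exact Frobenius-nearest doubly stochastic matrix:

```python
import numpy as np

def project_affine(X):
    """Frobenius projection of a symmetric matrix X onto {F : F = F^T, F1 = 1}.

    Solving the KKT conditions gives F = X + mu 1^T + 1 mu^T, with the
    multiplier vector mu fixed by the row-sum constraint F1 = 1.
    """
    n = X.shape[0]
    r = X.sum(axis=1)                  # current row sums X1
    s = (n - r.sum()) / (2.0 * n)      # scalar mu^T 1
    mu = (1.0 - r - s) / n
    return X + np.outer(mu, np.ones(n)) + np.outer(np.ones(n), mu)

def doubly_stochastic(K, iters=500):
    """Alternate nonnegativity clipping with the affine projection."""
    F = (K + K.T) / 2.0
    for _ in range(iters):
        F = project_affine(np.maximum(F, 0.0))
    return F
```

Ending each sweep with the affine projection makes the row sums exactly 1; the entries are nonnegative only up to the convergence tolerance of the alternation.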
Summary of the invention
The object of the present invention is to provide a positive semidefinite spectral clustering method based on Lagrange duality that can conveniently solve the semidefinite programming problem using existing eigenvalue decomposition and gradient descent methods, can find the globally optimal solution in polynomial time, and whose time complexity is only O(t·n^3), where t is the number of iterations (usually about 250).
The technical scheme of the present invention is as follows. First, principal component analysis (PCA) is applied to the given sample data set for dimensionality reduction. Then the similarity matrix of the sample data set is constructed from a fully connected graph, and the similarity between samples is computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function. Next, positive semidefinite normalization of the similarity matrix is solved based on Lagrange duality. The normalized matrix is then decomposed by singular value decomposition to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues. Finally, conventional k-means or another clustering method is applied to the eigenvector matrix to obtain the final clustering result.
The positive semidefinite spectral clustering method based on Lagrange duality of the present invention comprises the following steps:
1) applying principal component analysis (PCA) to the given sample data set for dimensionality reduction;
2) constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function;
3) solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality;
4) performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues;
5) applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result.
In step 1), the specific method for applying principal component analysis (PCA) to the given sample data set for dimensionality reduction is as follows:
Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M (M a natural number); n is the number of samples (n a natural number); the number of classes contained in the sample data set is k (k a natural number). Apply PCA to the sample data set for dimensionality reduction. The reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P (P a natural number).
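A minimal sketch of the dimensionality-reduction step, computing PCA via the SVD of the centered data matrix (the function name is illustrative, not from the patent):

```python
import numpy as np

def pca_reduce(A, p):
    """Project the n x M sample matrix A onto its top-p principal components."""
    A_centered = A - A.mean(axis=0)                 # center each feature
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
    return A_centered @ Vt[:p].T                    # n x P matrix {b_1, ..., b_n}
```

Each row of the result is one reduced feature vector b_i, with columns ordered by decreasing explained variance.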
In step 2), the specific method for constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function, is as follows:
Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set using a fully connected graph. The similarity between data points is computed with the weighted-sum formula

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples after dimensionality reduction; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio between the two kernel functions.
The similarity function of the sample data is used to compute the degree of aggregation between the sample data points.
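The weighted Gaussian-plus-polynomial kernel above can be sketched as follows (the parameter defaults are purely illustrative; the patent only constrains α ∈ [0, 1]):

```python
import numpy as np

def similarity_matrix(B, alpha=0.4, sigma=10.0, d=2):
    """K_ij = alpha * exp(-||b_i - b_j||^2 / sigma^2)
              + (1 - alpha) * (b_i . b_j + 1)^d,  for rows b_i of B."""
    sq = (B ** 2).sum(axis=1)
    # Pairwise squared distances, clipped at 0 against rounding error.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * B @ B.T, 0.0)
    gauss = np.exp(-dist2 / sigma ** 2)
    poly = (B @ B.T + 1.0) ** d
    return alpha * gauss + (1.0 - alpha) * poly
```

The result is symmetric by construction, as a similarity matrix must be.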
In step 3), the specific method for solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality is as follows:
Positive semidefinite normalization of the similarity matrix K is solved based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix (all elements of the matrix are nonnegative); 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
Solving the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain K̂ comprises the following sub-steps:
(1) Initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I.
(2) Compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n), using the formula:

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P.
(3) Using an algorithm based on gradient descent and the gradient computed in sub-step (2), solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

(4) Compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K).
(5) Check whether the termination condition is satisfied. If not, return to sub-step (2) for the next iteration; if so, go to sub-step (6).
(6) Compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
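The core primitive in the sub-steps above is the split of a symmetric matrix into its positive and negative spectral parts (P_+ and P_−). A sketch of that operation (function name illustrative):

```python
import numpy as np

def psd_split(P):
    """Split symmetric P into P_+ (positive eigenvalues) and P_- (negative).

    P = P_+ + P_-, and P_+ is also the Frobenius-nearest positive
    semidefinite matrix to P.
    """
    w, V = np.linalg.eigh(P)
    P_plus = (V * np.maximum(w, 0.0)) @ V.T
    P_minus = (V * np.minimum(w, 0.0)) @ V.T
    return P_plus, P_minus
```

Sub-step (2) needs P_− for the gradient, and sub-step (4) needs P_+ for Z; both come from one eigendecomposition of P per iteration, which is what gives the O(t·n^3) cost stated above.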
In step 4), the specific method for performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues is as follows:
Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector.
Singular value decomposition of the normalized matrix K̂ yields low-dimensional data that effectively strengthens within-class aggregation and between-class dispersion.
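Step 4) reduces to taking the top-k eigenpairs of the normalized matrix; for a symmetric positive semidefinite matrix the singular value and eigenvalue decompositions coincide, so a sketch via eigendecomposition (function name illustrative):

```python
import numpy as np

def spectral_embedding(K_hat, k):
    """Return W = [u_1, ..., u_k], eigenvectors of the k largest eigenvalues."""
    w, V = np.linalg.eigh(K_hat)        # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]     # indices of the k largest eigenvalues
    return V[:, order]                  # n x k matrix W
```

The columns of W are orthonormal, and its rows become the low-dimensional representation clustered in step 5).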
In step 5), the specific method for applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result is as follows:
Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
Applying conventional k-means or another clustering method to the matrix W comprises the following sub-steps:
(1) Treat each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, …, n are transformed into k-dimensional feature vectors w_i, i = 1, …, n.
(2) Apply conventional k-means or another clustering method to the low-dimensional sample data set {(w_1, …, w_n) | w_i ∈ R^k, i = 1, …, n} to obtain the clustering result.
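A minimal Lloyd-style k-means over the rows of W, as step 5) describes (the initialization scheme and iteration cap are illustrative choices, not specified by the patent):

```python
import numpy as np

def kmeans(W, k, iters=100, seed=0):
    """Cluster the rows of W into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)]
    labels = np.zeros(len(W), dtype=int)
    for _ in range(iters):
        # Assign each row to its nearest center.
        d2 = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster goes empty.
        new = np.array([W[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```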
The present invention proposes a fast and efficient spectral clustering method based on Frobenius normalization with a positive semidefinite constraint, which uses Lagrange duality to solve the semidefinite programming problem. The method amounts to finding a normalized similarity matrix that satisfies both the doubly stochastic and positive semidefinite constraints, and its objective is a convex optimization problem. Based on Lagrange duality, the present invention can conveniently solve the semidefinite programming problem using existing eigenvalue decomposition and gradient descent methods. The proposed method finds the globally optimal solution in polynomial time, and its time complexity is only O(t·n^3), where t is the number of iterations (usually about 250).
Description of the drawings
Fig. 1 compares, on UCI data sets, the clustering results (minimum clustering error rate) of the proposed method and other methods.
Fig. 2 compares the running time of the proposed method and the traditional interior-point-based positive semidefinite solver for different numbers of clustered samples. In Fig. 2, the abscissa is the number of data samples and the ordinate is CPU running time.
Embodiment
The method of the present invention is described in detail below with reference to the drawings and embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and the implementation and specific operation process are given; however, the protection scope of the present invention is not limited to the following embodiment.
The embodiment of the present invention comprises the following steps:
S1. Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M (M a natural number); n is the number of samples (n a natural number), typically of order 10^3 or more; the number of classes contained in the sample data set is k (k a natural number), generally between 1 and 100. Apply principal component analysis (PCA) to the data for dimensionality reduction. The reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P (P a natural number).
S2. Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set from a fully connected graph. The similarity between two samples is computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function:

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood and is generally chosen between 1 and 1000; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio of the two kernel functions and is generally chosen around 0.3 to 0.5.
S3. Solve the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is defined as:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix (all elements of the matrix are nonnegative); 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
Specifically: initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I.
Compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n):

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P.
Using an algorithm based on gradient descent, such as the L-BFGS-B algorithm, and the gradient computed in the above step, solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

Compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K).
Check whether the termination condition is satisfied: whether the output value of the objective function has stopped changing (for example, the relative error between the objective values of the previous iteration and the current iteration falls below 10^-3), or whether the number of iterations has reached its limit (for example, 300 iterations). If not, return to the gradient computation over the n sample data points above. If so, proceed to the next step.
Compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
S4. Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector.
S5. Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
Specifically: treat each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, …, n are transformed into k-dimensional feature vectors w_i, i = 1, …, n.
Apply conventional k-means or another clustering method to the low-dimensional sample data set {(w_1, …, w_n) | w_i ∈ R^k, i = 1, …, n} to obtain the clustering result.
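An end-to-end sketch of steps S1-S5 on synthetic data. Note that the normalization step here uses a plain projection onto the positive semidefinite cone as a simplified stand-in for the patent's Lagrange-dual normalization, so it illustrates the pipeline shape, not the claimed solver; all names and parameter values are illustrative:

```python
import numpy as np

def cluster_pipeline(A, k, p=2, alpha=0.5, sigma=5.0, d=2, seed=0):
    # S1: PCA to p dimensions via SVD of the centered data.
    Ac = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    B = Ac @ Vt[:p].T
    # S2: weighted Gaussian + polynomial kernel similarity matrix.
    sq = (B ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * B @ B.T, 0)
    K = alpha * np.exp(-dist2 / sigma ** 2) + (1 - alpha) * (B @ B.T + 1) ** d
    # S3 (simplified stand-in): project onto the PSD cone by zeroing
    # negative eigenvalues instead of running the dual iteration.
    w, V = np.linalg.eigh((K + K.T) / 2)
    # S4: embedding W from the eigenvectors of the k largest eigenvalues.
    order = np.argsort(w)[::-1][:k]
    W = V[:, order]
    # S5: Lloyd k-means on the rows of W.
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)]
    labels = np.zeros(len(W), dtype=int)
    for _ in range(50):
        labels = ((W[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([W[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```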
Fig. 1 compares, on UCI data sets, the clustering results (minimum clustering error rate) of the proposed method and other methods. In Fig. 1, a is Ratio-Cut, b is Normalized-Cut, c is FSC, and d is the method proposed by the present invention.
Fig. 2 compares the running time of the proposed method and the traditional interior-point-based positive semidefinite solver for different numbers of clustered samples. In Fig. 2, curve 1 corresponds to the method proposed by the present invention; curve 2 corresponds to the traditional interior-point-based positive semidefinite solver.

Claims (9)

1. A positive semidefinite spectral clustering method based on Lagrange duality, characterized by comprising the following steps:
1) applying principal component analysis (PCA) to a given sample data set for dimensionality reduction;
2) constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function;
3) solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality;
4) performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues;
5) applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result.
2. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 1), the specific method for applying principal component analysis (PCA) to the given sample data set for dimensionality reduction is as follows:
Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M, M being a natural number; n is the number of samples, n being a natural number; the number of classes contained in the sample data set is k, k being a natural number. Apply PCA to the sample data set for dimensionality reduction; the reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P, P being a natural number.
3. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 2), the specific method for constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function, is as follows:
Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set using a fully connected graph. The similarity between data points is computed with the weighted-sum formula

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples after dimensionality reduction; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio between the two kernel functions.
4. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 3, characterized in that the similarity function of the sample data is used to compute the degree of aggregation between the sample data points.
5. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 3), the specific method for solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality is as follows:
Positive semidefinite normalization of the similarity matrix K is solved based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix, all elements of the matrix being nonnegative; 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
6. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 5, characterized in that solving the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain the matrix K̂ comprises the following sub-steps:
(1) Initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I;
(2) compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n), using the formula:

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P;
(3) using an algorithm based on gradient descent and the gradient computed in sub-step (2), solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

(4) compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K);
(5) check whether the termination condition is satisfied; if not, return to sub-step (2) for the next iteration; if so, go to sub-step (6);
(6) compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
7. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 4), the specific method for performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues is as follows:
Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector;
singular value decomposition of the normalized matrix K̂ yields low-dimensional data that effectively strengthens within-class aggregation and between-class dispersion.
8. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 5), the specific method for applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result is as follows:
Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
9. the positive semidefinite Spectral Clustering based on Lagrange duality as claimed in claim 8, is characterized in that utilizing traditional k mean cluster or other clustering methods to carry out cluster analysis to matrix W and obtain final cluster result, and it comprises following sub-step:
(1) take each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, ..., n are transformed into k-dimensional feature vectors w_i, i = 1, ..., n;
(2) use traditional k-means clustering or another clustering method to perform cluster analysis on the low-dimensional sample data set {w_1, ..., w_n}, w_i ∈ R^k, i = 1, ..., n, thereby obtaining the clustering result.
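The "traditional k-means" of sub-step (2) can be sketched as a plain Lloyd iteration over the rows w_i of W. This is an illustrative stand-in, not the patent's required implementation; in practice any library k-means would serve.

```python
import numpy as np

def kmeans_rows(W, k, iters=50, seed=0):
    """Cluster the rows of W into k groups with plain Lloyd's k-means;
    returns one integer label per row (sample)."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)].astype(float)
    labels = np.zeros(len(W), dtype=int)
    for _ in range(iters):
        # assign each row to its nearest center
        d = np.linalg.norm(W[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned rows
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = W[mask].mean(axis=0)
    return labels
```

Because the rows of W are the k-dimensional spectral embeddings of the samples, well-separated clusters in the embedding space are recovered by this simple assignment/update loop.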
CN201210445602.9A 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual Expired - Fee Related CN102982342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210445602.9A CN102982342B (en) 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual

Publications (2)

Publication Number Publication Date
CN102982342A true CN102982342A (en) 2013-03-20
CN102982342B CN102982342B (en) 2015-07-15

Family

ID=47856324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210445602.9A Expired - Fee Related CN102982342B (en) 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual

Country Status (1)

Country Link
CN (1) CN102982342B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117712A (en) * 2018-06-26 2019-01-01 中国地质大学(武汉) A kind of maximal margin estimates the Hyperspectral imaging object detection method and system of study
CN109346104A (en) * 2018-08-29 2019-02-15 昆明理工大学 A kind of audio frequency characteristics dimension reduction method based on spectral clustering
WO2020199745A1 (en) * 2019-03-29 2020-10-08 创新先进技术有限公司 Sample clustering method and device
CN111767941A (en) * 2020-05-15 2020-10-13 上海大学 Improved spectral clustering and parallelization method based on symmetric nonnegative matrix factorization
CN113269203A (en) * 2021-05-17 2021-08-17 电子科技大学 Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080010245A1 (en) * 2006-07-10 2008-01-10 Jaehwan Kim Method for clustering data based convex optimization
CN101976348A (en) * 2010-10-21 2011-02-16 中国科学院深圳先进技术研究院 Image clustering method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RON ZASS等: "Doubly Stochastic Normalization for Spectral Clustering", 《PROCEEDINGS OF THE 2006 CONFERENCE ON ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS》, 31 December 2007 (2007-12-31) *


Also Published As

Publication number Publication date
CN102982342B (en) 2015-07-15


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150715

Termination date: 20211108
