CN102982342A - Positive semidefinite spectral clustering method based on Lagrange dual - Google Patents


Info

Publication number
CN102982342A
CN102982342A (application CN201210445602.9A)
Authority
CN
China
Prior art keywords
matrix
sample data
positive semidefinite
similarity
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104456029A
Other languages
Chinese (zh)
Other versions
CN102982342B (en
Inventor
严严
沈华森
王菡子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201210445602.9A priority Critical patent/CN102982342B/en
Publication of CN102982342A publication Critical patent/CN102982342A/en
Application granted granted Critical
Publication of CN102982342B publication Critical patent/CN102982342B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a positive semidefinite spectral clustering method based on Lagrange duality, and relates to spectral clustering methods. A. Principal component analysis is applied to the given sample data set for dimensionality reduction. B. The similarity matrix of the sample data is constructed from a fully connected graph, with the similarity between samples computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function. C. Positive semidefinite normalization of the similarity matrix is performed based on Lagrange duality. D. Singular value decomposition of the normalized matrix is performed to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues. E. Conventional k-means clustering or another clustering method is applied to the eigenvector matrix to obtain the final clustering result. All data can be clustered effectively; algorithm analysis shows that, compared with common positive semidefinite spectral clustering methods, the method not only improves clustering accuracy but also greatly reduces the time required for spectral clustering.

Description

Positive semidefinite spectral clustering based on Lagrange duality
Technical field
The present invention relates to spectral clustering methods, and in particular to a positive semidefinite spectral clustering method based on Lagrange duality.
Background technology
Cluster analysis is one of the most popular techniques in statistical data analysis and processing. It has been widely applied in image analysis, pattern recognition, machine learning, information retrieval, and other fields. The goal of cluster analysis is to separate the different categories in a data set (called "clusters") so that the similarity between data points within the same cluster is large while the similarity between data points in different clusters is small. In recent years, spectral clustering has developed rapidly into an effective class of clustering techniques. Spectral clustering is built on spectral graph theory and mainly uses the eigenvectors of the similarity matrix of the data set to perform clustering. Compared with traditional clustering methods (such as k-means), spectral clustering has many advantages: it is simple to implement, independent of the data dimension, can cluster data with arbitrarily shaped distributions, and converges to a globally optimal solution. It is therefore widely used.
Spectral clustering treats clustering as a graph partitioning problem. In general, a spectral clustering model builds a graph (similarity matrix) over all data points, where the edges of the graph characterize the distances between data points. The similarity matrix is then normalized under a specific error metric, and finally a simple clustering method (such as k-means) is applied to the low-dimensional data obtained from the eigenvalue decomposition of the normalized matrix. In spectral clustering, the construction and the normalization of the similarity matrix are the key factors affecting the final clustering performance.
Given a similarity matrix, the simplest graph partitioning method solves the minimum cut (min-cut) problem. The goal of min-cut is to minimize the weight of the edges crossing the partition (that is, the sum of similarities between samples in different subgraphs). However, because the subgraph sizes are unconstrained, the clustering results of min-cut are often poor. By introducing constraints on subgraph size into the partitioning problem, Ratio-Cut (P. K. Chan, M. D. F. Schlag, and J. Y. Zien, "Spectral k-way ratio-cut partitioning and clustering," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 13, no. 9, pp. 1088-1096, 1994.) and Normalized-Cut (J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888-905, 2000.) address this problem effectively. The essence of both Ratio-Cut and Normalized-Cut is to seek a balanced partition among the different clusters.
Theory shows that the key difference between Ratio-Cut and Normalized-Cut lies in how the similarity matrix is normalized. Normalizing a similarity matrix is in fact a process of finding the nearest doubly stochastic matrix (nonnegative, symmetric, and F1 = 1). Different spectral clustering methods can be viewed as computing this approximation under different error metrics: Ratio-Cut corresponds to normalization under the L1 error metric, while Normalized-Cut corresponds to normalization under the relative entropy (also known as the Kullback-Leibler divergence). Zass et al. (R. Zass and A. Shashua, "Doubly stochastic normalization for spectral clustering," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, B.C., Canada, 2006, pp. 1569-1576.) proposed an effective normalization method (abbreviated FSC) that finds the nearest doubly stochastic matrix under the Frobenius norm. The FSC method is currently regarded as one of the most effective similarity matrix normalization methods. However, the main problem of Frobenius normalization is that the positive semidefinite constraint is ignored, so the doubly stochastic matrix obtained by the method is inaccurate. On the other hand, adding the positive semidefinite constraint to the Frobenius normalization turns the optimization into a semidefinite programming problem. Traditional solvers for this problem based on interior-point methods are very inefficient, with time complexity as high as O(n^6.5), where n is the number of data points, and can only be applied to small-scale data sets.
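As an illustration of what doubly stochastic normalization under the Frobenius metric involves (a sketch of the idea behind the FSC method discussed above, not the patent's own algorithm; function names are illustrative), one can alternate a closed-form projection onto the affine set {F : F = F^T, F1 = 1} with clipping to the nonnegative cone. Plain alternating projections converge to a point in the intersection; Dykstra's corrections would be needed to obtain the exact Frobenius-nearest doubly stochastic matrix:

```python
import numpy as np

def project_affine(X):
    """Frobenius projection of a symmetric matrix X onto {F : F = F^T, F1 = 1}.

    Solving the KKT conditions gives F = X + mu 1^T + 1 mu^T, with the
    multiplier vector mu fixed by the row-sum constraint F1 = 1.
    """
    n = X.shape[0]
    r = X.sum(axis=1)                  # current row sums X1
    s = (n - r.sum()) / (2.0 * n)      # scalar mu^T 1
    mu = (1.0 - r - s) / n
    return X + np.outer(mu, np.ones(n)) + np.outer(np.ones(n), mu)

def doubly_stochastic(K, iters=500):
    """Alternate nonnegativity clipping with the affine projection."""
    F = (K + K.T) / 2.0
    for _ in range(iters):
        F = project_affine(np.maximum(F, 0.0))
    return F
```

Ending each sweep with the affine projection makes the row sums exactly 1; the entries are nonnegative only up to the convergence tolerance of the alternation.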
Summary of the invention
The object of the present invention is to provide a positive semidefinite spectral clustering method based on Lagrange duality that can conveniently solve the semidefinite programming problem using existing eigenvalue decomposition and gradient descent methods, can find the globally optimal solution in polynomial time, and whose time complexity is only O(t·n^3), where t is the number of iterations (usually about 250).
The technical scheme of the present invention is as follows. First, principal component analysis (PCA) is applied to the given sample data set for dimensionality reduction. Then the similarity matrix of the sample data set is constructed from a fully connected graph, and the similarity between samples is computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function. Next, positive semidefinite normalization of the similarity matrix is solved based on Lagrange duality. The normalized matrix is then decomposed by singular value decomposition to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues. Finally, conventional k-means or another clustering method is applied to the eigenvector matrix to obtain the final clustering result.
The positive semidefinite spectral clustering method based on Lagrange duality of the present invention comprises the following steps:
1) applying principal component analysis (PCA) to the given sample data set for dimensionality reduction;
2) constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function;
3) solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality;
4) performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues;
5) applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result.
In step 1), the specific method for applying principal component analysis (PCA) to the given sample data set for dimensionality reduction is as follows:
Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M (M a natural number); n is the number of samples (n a natural number); the number of classes contained in the sample data set is k (k a natural number). Apply PCA to the sample data set for dimensionality reduction. The reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P (P a natural number).
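A minimal sketch of the dimensionality-reduction step, computing PCA via the SVD of the centered data matrix (the function name is illustrative, not from the patent):

```python
import numpy as np

def pca_reduce(A, p):
    """Project the n x M sample matrix A onto its top-p principal components."""
    A_centered = A - A.mean(axis=0)                 # center each feature
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
    return A_centered @ Vt[:p].T                    # n x P matrix {b_1, ..., b_n}
```

Each row of the result is one reduced feature vector b_i, with columns ordered by decreasing explained variance.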
In step 2), the specific method for constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function, is as follows:
Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set using a fully connected graph. The similarity between data points is computed with the weighted-sum formula

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples after dimensionality reduction; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio between the two kernel functions.
The similarity function of the sample data is used to compute the degree of aggregation between the sample data points.
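The weighted Gaussian-plus-polynomial kernel above can be sketched as follows (the parameter defaults are purely illustrative; the patent only constrains α ∈ [0, 1]):

```python
import numpy as np

def similarity_matrix(B, alpha=0.4, sigma=10.0, d=2):
    """K_ij = alpha * exp(-||b_i - b_j||^2 / sigma^2)
              + (1 - alpha) * (b_i . b_j + 1)^d,  for rows b_i of B."""
    sq = (B ** 2).sum(axis=1)
    # Pairwise squared distances, clipped at 0 against rounding error.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * B @ B.T, 0.0)
    gauss = np.exp(-dist2 / sigma ** 2)
    poly = (B @ B.T + 1.0) ** d
    return alpha * gauss + (1.0 - alpha) * poly
```

The result is symmetric by construction, as a similarity matrix must be.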
In step 3), the specific method for solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality is as follows:
Positive semidefinite normalization of the similarity matrix K is solved based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix (all elements of the matrix are nonnegative); 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
Solving the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain K̂ comprises the following sub-steps:
(1) Initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I.
(2) Compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n), using the formula:

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P.
(3) Using an algorithm based on gradient descent and the gradient computed in sub-step (2), solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

(4) Compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K).
(5) Check whether the termination condition is satisfied. If not, return to sub-step (2) for the next iteration; if so, go to sub-step (6).
(6) Compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
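The core primitive in the sub-steps above is the split of a symmetric matrix into its positive and negative spectral parts (P_+ and P_−). A sketch of that operation (function name illustrative):

```python
import numpy as np

def psd_split(P):
    """Split symmetric P into P_+ (positive eigenvalues) and P_- (negative).

    P = P_+ + P_-, and P_+ is also the Frobenius-nearest positive
    semidefinite matrix to P.
    """
    w, V = np.linalg.eigh(P)
    P_plus = (V * np.maximum(w, 0.0)) @ V.T
    P_minus = (V * np.minimum(w, 0.0)) @ V.T
    return P_plus, P_minus
```

Sub-step (2) needs P_− for the gradient, and sub-step (4) needs P_+ for Z; both come from one eigendecomposition of P per iteration, which is what gives the O(t·n^3) cost stated above.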
In step 4), the specific method for performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues is as follows:
Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector.
Singular value decomposition of the normalized matrix K̂ yields low-dimensional data that effectively strengthens within-class aggregation and between-class dispersion.
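Step 4) reduces to taking the top-k eigenpairs of the normalized matrix; for a symmetric positive semidefinite matrix the singular value and eigenvalue decompositions coincide, so a sketch via eigendecomposition (function name illustrative):

```python
import numpy as np

def spectral_embedding(K_hat, k):
    """Return W = [u_1, ..., u_k], eigenvectors of the k largest eigenvalues."""
    w, V = np.linalg.eigh(K_hat)        # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]     # indices of the k largest eigenvalues
    return V[:, order]                  # n x k matrix W
```

The columns of W are orthonormal, and its rows become the low-dimensional representation clustered in step 5).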
In step 5), the specific method for applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result is as follows:
Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
Applying conventional k-means or another clustering method to the matrix W comprises the following sub-steps:
(1) Treat each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, …, n are transformed into k-dimensional feature vectors w_i, i = 1, …, n.
(2) Apply conventional k-means or another clustering method to the low-dimensional sample data set {(w_1, …, w_n) | w_i ∈ R^k, i = 1, …, n} to obtain the clustering result.
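A minimal Lloyd-style k-means over the rows of W, as step 5) describes (the initialization scheme and iteration cap are illustrative choices, not specified by the patent):

```python
import numpy as np

def kmeans(W, k, iters=100, seed=0):
    """Cluster the rows of W into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)]
    labels = np.zeros(len(W), dtype=int)
    for _ in range(iters):
        # Assign each row to its nearest center.
        d2 = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster goes empty.
        new = np.array([W[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```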
The present invention proposes a fast and efficient spectral clustering method based on Frobenius normalization with a positive semidefinite constraint, which uses Lagrange duality to solve the semidefinite programming problem. The method amounts to finding a normalized similarity matrix that satisfies both the doubly stochastic and positive semidefinite constraints, and its objective is a convex optimization problem. Based on Lagrange duality, the present invention can conveniently solve the semidefinite programming problem using existing eigenvalue decomposition and gradient descent methods. The proposed method finds the globally optimal solution in polynomial time, and its time complexity is only O(t·n^3), where t is the number of iterations (usually about 250).
Description of the drawings
Fig. 1 compares, on UCI data sets, the clustering results (minimum clustering error rate) of the proposed method and other methods.
Fig. 2 compares the running time of the proposed method and the traditional interior-point-based positive semidefinite solver for different numbers of clustered samples. In Fig. 2, the abscissa is the number of data samples and the ordinate is CPU running time.
Embodiment
The method of the present invention is described in detail below with reference to the drawings and embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and the implementation and specific operation process are given; however, the protection scope of the present invention is not limited to the following embodiment.
The embodiment of the present invention comprises the following steps:
S1. Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M (M a natural number); n is the number of samples (n a natural number), typically of order 10^3 or more; the number of classes contained in the sample data set is k (k a natural number), generally between 1 and 100. Apply principal component analysis (PCA) to the data for dimensionality reduction. The reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P (P a natural number).
S2. Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set from a fully connected graph. The similarity between two samples is computed as a weighted sum of a Gaussian kernel function and a polynomial kernel function:

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood and is generally chosen between 1 and 1000; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio of the two kernel functions and is generally chosen around 0.3 to 0.5.
S3. Solve the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is defined as:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix (all elements of the matrix are nonnegative); 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
Specifically: initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I.
Compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n):

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P.
Using an algorithm based on gradient descent, such as the L-BFGS-B algorithm, and the gradient computed in the above step, solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

Compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K).
Check whether the termination condition is satisfied: whether the output value of the objective function has stopped changing (for example, the relative error between the objective values of the previous iteration and the current iteration falls below 10^-3), or whether the number of iterations has reached its limit (for example, 300 iterations). If not, return to the gradient computation over the n sample data points above. If so, proceed to the next step.
Compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
S4. Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector.
S5. Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
Specifically: treat each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, …, n are transformed into k-dimensional feature vectors w_i, i = 1, …, n.
Apply conventional k-means or another clustering method to the low-dimensional sample data set {(w_1, …, w_n) | w_i ∈ R^k, i = 1, …, n} to obtain the clustering result.
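An end-to-end sketch of steps S1-S5 on synthetic data. Note that the normalization step here uses a plain projection onto the positive semidefinite cone as a simplified stand-in for the patent's Lagrange-dual normalization, so it illustrates the pipeline shape, not the claimed solver; all names and parameter values are illustrative:

```python
import numpy as np

def cluster_pipeline(A, k, p=2, alpha=0.5, sigma=5.0, d=2, seed=0):
    # S1: PCA to p dimensions via SVD of the centered data.
    Ac = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    B = Ac @ Vt[:p].T
    # S2: weighted Gaussian + polynomial kernel similarity matrix.
    sq = (B ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * B @ B.T, 0)
    K = alpha * np.exp(-dist2 / sigma ** 2) + (1 - alpha) * (B @ B.T + 1) ** d
    # S3 (simplified stand-in): project onto the PSD cone by zeroing
    # negative eigenvalues instead of running the dual iteration.
    w, V = np.linalg.eigh((K + K.T) / 2)
    # S4: embedding W from the eigenvectors of the k largest eigenvalues.
    order = np.argsort(w)[::-1][:k]
    W = V[:, order]
    # S5: Lloyd k-means on the rows of W.
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)]
    labels = np.zeros(len(W), dtype=int)
    for _ in range(50):
        labels = ((W[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([W[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```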
Fig. 1 compares, on UCI data sets, the clustering results (minimum clustering error rate) of the proposed method and other methods. In Fig. 1, a is Ratio-Cut, b is Normalized-Cut, c is FSC, and d is the method proposed by the present invention.
Fig. 2 compares the running time of the proposed method and the traditional interior-point-based positive semidefinite solver for different numbers of clustered samples. In Fig. 2, curve 1 corresponds to the method proposed by the present invention; curve 2 corresponds to the traditional interior-point-based positive semidefinite solver.

Claims (9)

1. A positive semidefinite spectral clustering method based on Lagrange duality, characterized by comprising the following steps:
1) applying principal component analysis (PCA) to a given sample data set for dimensionality reduction;
2) constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function;
3) solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality;
4) performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues;
5) applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result.
2. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 1), the specific method for applying principal component analysis (PCA) to the given sample data set for dimensionality reduction is as follows:
Given a sample data set {(a_1, …, a_n) | a_i ∈ R^M, i = 1, …, n}, where a_i denotes the feature vector of the i-th sample; the dimension of each feature vector is M, M being a natural number; n is the number of samples, n being a natural number; the number of classes contained in the sample data set is k, k being a natural number. Apply PCA to the sample data set for dimensionality reduction; the reduced sample data set is {(b_1, …, b_n) | b_i ∈ R^P, i = 1, …, n}, where b_i denotes the feature vector of the i-th sample after reduction; the dimension of each reduced feature vector is P, P being a natural number.
3. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 2), the specific method for constructing the similarity matrix of the sample data set from a fully connected graph, and computing the similarity between samples as a weighted sum of a Gaussian kernel function and a polynomial kernel function, is as follows:
Construct the similarity matrix K = {K_ij}_{n×n} of the sample data set using a fully connected graph. The similarity between data points is computed with the weighted-sum formula

K_ij = K(b_i, b_j) = α·exp(−||b_i − b_j||^2 / σ^2) + (1 − α)·(b_i^T b_j + 1)^d

where b_i and b_j denote, respectively, the feature vectors of the i-th and j-th samples after dimensionality reduction; K_ij denotes the similarity measure between b_i and b_j; the parameter σ in the Gaussian kernel controls the width of the sample neighborhood; the parameter d in the polynomial kernel controls the curvature of the polynomial; and the parameter α ∈ [0, 1] adjusts the weight ratio between the two kernel functions.
4. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 3, characterized in that the similarity function of the sample data is used to compute the degree of aggregation between the sample data points.
5. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 3), the specific method for solving the positive semidefinite normalization of the similarity matrix based on Lagrange duality is as follows:
Positive semidefinite normalization of the similarity matrix K is solved based on Lagrange duality to obtain the matrix K̂. The optimization problem to be solved is:

K̂ = argmin_F ||K − F||_F^2
s.t. F ≥ 0, F1 = 1, F = F^T, F ⪰ 0

where K̂ and F are n × n square matrices; F ≥ 0 means that F is a nonnegative matrix, all elements of the matrix being nonnegative; 1 ∈ R^n denotes the vector whose every element is 1; F ⪰ 0 means that F is a positive semidefinite matrix; ||·||_F denotes the Frobenius norm.
6. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 5, characterized in that solving the positive semidefinite normalization of the similarity matrix K based on Lagrange duality to obtain the matrix K̂ comprises the following sub-steps:
(1) Initialize the n-dimensional vector u to 1, where every element of 1 is 1; initialize the n × n square matrix Q to the identity matrix I;
(2) compute the gradient output of the n sample data points with respect to u_i (i = 1, …, n), using the formula:

g(u_i) = −2 − <P_−, T̂_i>, i = 1, …, n

where <·,·> denotes the inner product between two n × n square matrices; P = −(Q + M + K), where M = u1^T + 1u^T; P_− denotes the matrix formed from the negative eigenvalues of P, P_− = Σ_{λ_i<0} λ_i x_i x_i^T, where λ_i ∈ R and x_i ∈ R^n (i = 1, …, n) are, respectively, the eigenvalues and eigenvectors of the square matrix P;
(3) using an algorithm based on gradient descent and the gradient computed in sub-step (2), solve the following optimization problem to obtain the optimal solution u:

min_u (1/2)||P_−||_F^2 − 2·1^T u

(4) compute Z = P_+ and Q = th_{≥0}(X), where P_+ denotes the matrix formed from the positive eigenvalues of P, P_+ = Σ_{λ_i>0} λ_i x_i x_i^T; th_{≥0}(·) denotes the operation of zeroing the non-positive elements of a matrix; and X = −(Z + M + K);
(5) check whether the termination condition is satisfied; if not, return to sub-step (2) for the next iteration; if so, go to sub-step (6);
(6) compute

K̂ = K + Q* + u*1^T + 1u*^T + Z*

where Q*, u*, and Z* are, respectively, the optimal values obtained in the above iterative process.
7. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 4), the specific method for performing singular value decomposition on the normalized matrix to obtain the matrix formed by the eigenvectors corresponding to the k largest eigenvalues is as follows:
Perform singular value decomposition on the normalized matrix K̂ to obtain the matrix W = [u_1, …, u_k] ∈ R^{n×k} formed by the eigenvectors corresponding to the k largest eigenvalues, according to:

K̂ = Σ_{i=1}^{n} κ_i u_i u_i^T

where κ_i and u_i are, respectively, the i-th largest eigenvalue of the normalized matrix K̂ obtained by singular value decomposition and the corresponding eigenvector;
singular value decomposition of the normalized matrix K̂ yields low-dimensional data that effectively strengthens within-class aggregation and between-class dispersion.
8. The positive semidefinite spectral clustering method based on Lagrange duality according to claim 1, characterized in that in step 5), the specific method for applying conventional k-means or another clustering method to the eigenvector matrix to obtain the final clustering result is as follows:
Apply conventional k-means or another clustering method to the matrix W to obtain the final clustering result.
9. the positive semidefinite Spectral Clustering based on Lagrange duality as claimed in claim 8, is characterized in that utilizing traditional k mean cluster or other clustering methods to carry out cluster analysis to matrix W and obtain final cluster result, and it comprises following sub-step:
(1) take each row of the matrix W as a feature vector, so that the original M-dimensional feature vectors a_i, i = 1, ..., n are transformed into k-dimensional feature vectors w_i, i = 1, ..., n;
(2) use traditional k-means clustering or another clustering method to perform cluster analysis on the low-dimensional sample data set {w_1, ..., w_n}, w_i ∈ R^k, i = 1, ..., n, thereby obtaining the clustering result.
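The "traditional k-means" of sub-step (2) can be sketched as a plain Lloyd iteration over the rows w_i of W. This is an illustrative stand-in, not the patent's required implementation; in practice any library k-means would serve.

```python
import numpy as np

def kmeans_rows(W, k, iters=50, seed=0):
    """Cluster the rows of W into k groups with plain Lloyd's k-means;
    returns one integer label per row (sample)."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)].astype(float)
    labels = np.zeros(len(W), dtype=int)
    for _ in range(iters):
        # assign each row to its nearest center
        d = np.linalg.norm(W[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned rows
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = W[mask].mean(axis=0)
    return labels
```

Because the rows of W are the k-dimensional spectral embeddings of the samples, well-separated clusters in the embedding space are recovered by this simple assignment/update loop.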
CN201210445602.9A 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual Expired - Fee Related CN102982342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210445602.9A CN102982342B (en) 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual

Publications (2)

Publication Number Publication Date
CN102982342A true CN102982342A (en) 2013-03-20
CN102982342B CN102982342B (en) 2015-07-15

Family

ID=47856324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210445602.9A Expired - Fee Related CN102982342B (en) 2012-11-08 2012-11-08 Positive semidefinite spectral clustering method based on Lagrange dual

Country Status (1)

Country Link
CN (1) CN102982342B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117712A (en) * 2018-06-26 2019-01-01 中国地质大学(武汉) A kind of maximal margin estimates the Hyperspectral imaging object detection method and system of study
CN109346104A (en) * 2018-08-29 2019-02-15 昆明理工大学 A kind of audio frequency characteristics dimension reduction method based on spectral clustering
WO2020199745A1 (en) * 2019-03-29 2020-10-08 创新先进技术有限公司 Sample clustering method and device
CN111767941A (en) * 2020-05-15 2020-10-13 上海大学 Improved spectral clustering and parallelization method based on symmetric nonnegative matrix factorization
CN113269203A (en) * 2021-05-17 2021-08-17 电子科技大学 Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080010245A1 (en) * 2006-07-10 2008-01-10 Jaehwan Kim Method for clustering data based convex optimization
CN101976348A (en) * 2010-10-21 2011-02-16 中国科学院深圳先进技术研究院 Image clustering method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RON ZASS等: "Doubly Stochastic Normalization for Spectral Clustering", 《PROCEEDINGS OF THE 2006 CONFERENCE ON ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS》, 31 December 2007 (2007-12-31) *


Also Published As

Publication number Publication date
CN102982342B (en) 2015-07-15


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150715

Termination date: 20211108
