CN111930934B - Clustering method based on constraint sparse concept decomposition of dual local agreement - Google Patents
Clustering method based on constraint sparse concept decomposition of dual local agreement
- Publication number
- CN111930934B
- Authority
- CN
- China
- Prior art keywords
- matrix
- clustered
- clustering
- decomposition
- auxiliary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a clustering method based on constrained sparse concept decomposition with dual local consistency, which is characterized by comprising the following steps: S10, acquiring samples to be clustered to form a sample data set to be clustered; S20, constructing an adjacency matrix for the sample data set to be clustered; S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition; S40, according to the objective function, iterating a preset number of times using an iterative weighting method to update the base matrix W, the label matrix A and the auxiliary matrix Z; and S50, performing cluster analysis on the coefficient matrix V using the K-Means clustering algorithm, wherein V=AZ. Compared with traditional clustering methods, the method reveals the intrinsic geometric structure and the discriminative structure of the data more effectively and improves the clustering effect.
Description
Technical Field
The invention relates to the technical field of text clustering, in particular to a clustering method based on constraint sparse concept decomposition of dual local consistency.
Background
Matrix decomposition-based methods have gained widespread attention in document clustering over the past decade. When such methods are used, a text document is typically represented as a point in a high-dimensional linear space in which each dimension corresponds to one term. Central to every cluster analysis task is the notion of similarity between the objects being clustered. Research shows that similarity can be measured more accurately in a low-dimensional space, which improves clustering performance. The application of NMF (non-negative matrix factorization) to document clustering has achieved impressive results. In NMF, given a non-negative data matrix X, low-rank non-negative matrices U and V are found such that UV provides a good approximation of X. How to perform NMF efficiently in a transformed data space, however, remains an open problem.
To address the limitations of NMF while inheriting all of its advantages, Xu and Gong proposed CF (concept decomposition) for data clustering. CF models each cluster as a linear combination of the data points and each data point as a linear combination of the cluster centers. Data clustering is then accomplished by computing the two sets of linear coefficients, which is done by finding the non-negative solution that minimizes the reconstruction error of the data points. The main advantage of CF over NMF is that it can be performed on any data representation, whether in the original space or in an RKHS (reproducing kernel Hilbert space).
Many CF-based classification methods have been developed by extending classical CF in various ways (e.g., imposing additional constraints and incorporating regulatory information). Essentially, they learn low-dimensional discriminative features, which are then fed into a typical classifier. However, these approaches ignore the dependency between the two processes, so the resulting low-dimensional features may not suit the classifier used, leading to suboptimal classification performance.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a clustering method based on constrained sparse concept decomposition with dual local consistency, which effectively solves the technical problem that existing clustering methods cluster poorly.
To achieve the above purpose, the invention is realized by the following technical scheme:
A clustering method based on constrained sparse concept decomposition with dual local consistency comprises the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition:
wherein $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed from the sample data set to be clustered, W is the base matrix, $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, $A_l \in R^{w \times c}$ represents the classes of the labeled samples, $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples, $Z \in R^{c \times d}$ is the auxiliary matrix, the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs, and R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, and D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix $S_{ij}$, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, according to the objective function, iterating a preset number of times using an iterative weighting method to update the base matrix W, the label matrix A and the auxiliary matrix Z;
and S50, performing cluster analysis on the coefficient matrix V using the K-Means clustering algorithm, wherein V=AZ.
Compared with traditional clustering methods, the clustering method based on constrained sparse concept decomposition with dual local consistency combines the local consistency of the feature space with prior class information to reveal the intrinsic geometry of the data, which gives the algorithm a degree of discriminative power. On this basis, it further combines the local consistency of the class space with a sparsity constraint to reveal the intrinsic geometric structure and the discriminative structure of the data more effectively. By preserving both the class information of the samples and the intrinsic geometric manifold structure among them, the discriminative ability of the algorithm is enhanced and the clustering performance is improved.
Drawings
The invention will be more fully understood and its attendant advantages and features will be more readily understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of a clustering method based on constrained sparse concept decomposition with dual local agreement in the invention.
Detailed Description
In order to make the contents of the present invention more clear and understandable, the contents of the present invention will be further described with reference to the accompanying drawings. Of course, the invention is not limited to this particular embodiment, and common alternatives known to those skilled in the art are also encompassed within the scope of the invention.
In the CF model, each basis vector $v_j$ is a non-negative linear combination of the data points $x_i$, i.e. $v_j = \sum_i w_{ij} x_i$, where the coefficients $w_{ij} \ge 0$. Let the base matrix be $W = [w_{ij}] \in R^{n \times k}$. The goal of the CF algorithm is to find two matrices W and V such that $X \approx XWV^T$. Thus, the objective function $J_{CF}$ of the CF algorithm can be represented by formula (1):
wherein the data set is $X = [x_1, x_2, \ldots, x_n]$ and the coefficient matrix is $V = [v_1, v_2, \ldots, v_n]$.
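As a rough illustration of what the CF objective measures, the approximation X ≈ XWV^T can be scored by its squared Frobenius reconstruction error. This is a minimal sketch: the function name `cf_objective`, the column-per-sample layout of X, the (n, k) shape of V and the toy values are assumptions made for the example, and the exact form of formula (1) is not reproduced in this text.

```python
import numpy as np

def cf_objective(X, W, V):
    """Squared Frobenius reconstruction error ||X - X W V^T||_F^2.

    X: (d, n) data matrix, one sample per column (assumed layout).
    W: (n, k) non-negative base matrix.
    V: (n, k) non-negative coefficient matrix (assumed shape).
    """
    residual = X - X @ W @ V.T
    return float(np.sum(residual ** 2))

# Toy data: 3 features, 4 samples, 2 concepts (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.random((3, 4))
W = rng.random((4, 2))
V = rng.random((4, 2))
print(cf_objective(X, W, V))
```

A smaller value means that the concepts XW, weighted by V, reconstruct the data more closely.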
According to the principle of local consistency, if $x_i$ and $x_j$ are two similar samples, then their corresponding low-dimensional representations $z_i$ and $z_j$ are also similar. Assume that the data set contains n samples in total. If the i-th sample is a neighbor of the j-th sample, an edge exists between them with weight $s_{ij}$. The edge weight matrix $S_{ij}$ is defined by formula (2):
wherein $N_p(x_j)$ denotes the set of the p nearest neighbor data samples of sample $x_j$. The smoothness of the low-dimensional representation on the p-nearest-neighbor graph is typically measured with O, which can be expressed as formula (3):
wherein Tr(·) denotes the trace of a matrix, D denotes a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix S (S is a symmetric matrix), i.e. $D_{ii} = \sum_j S_{ij}$, and L is the Laplacian matrix of the weight graph, with $L = D - S$.
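The graph quantities described above can be sketched numerically. The example below assumes a symmetric 0/1 weighting (an edge whenever either sample is among the other's p nearest neighbors) and Euclidean distance; the weighting actually used in formula (2) is not reproduced in this text, and the helper names `knn_graph`, `laplacian` and `smoothness` are hypothetical. It also relies on the standard identity Tr(Z^T L Z) = (1/2) Σ_ij S_ij ‖z_i − z_j‖² for symmetric S.

```python
import numpy as np

def knn_graph(X, p):
    """Symmetric 0/1 edge weight matrix S of a p-nearest-neighbor graph.

    X: (n, d) array, one sample per row (assumed layout).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # a sample is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :p]          # indices of the p nearest samples
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), p), nearest.ravel()] = 1.0
    return np.maximum(S, S.T)                          # symmetrize the neighbor relation

def laplacian(S):
    """Graph Laplacian L = D - S with D_ii = sum_j S_ij."""
    return np.diag(S.sum(axis=1)) - S

def smoothness(Z, L):
    """Tr(Z^T L Z), where row i of Z is the low-dimensional representation z_i."""
    return float(np.trace(Z.T @ L @ Z))

rng = np.random.default_rng(1)
X = rng.random((6, 3))   # 6 hypothetical samples with 3 features
Z = rng.random((6, 2))   # hypothetical 2-dimensional representations
L = laplacian(knn_graph(X, p=2))
print(smoothness(Z, L))
```

The value is small when neighboring samples receive nearby representations, which is what the local consistency terms reward.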
The low-dimensional representation learned by the concept decomposition method is further projected into a label space, as defined by formula (4):
V=AZ (4)
wherein $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, $A_l \in R^{w \times c}$ represents the classes of the labeled samples, $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples, $Z \in R^{c \times d}$ is the auxiliary matrix, and the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs. Suppose that two data samples $x_i$ and $x_j$ both belong to the k-th class. Then $a_i$ and $a_j$ have 1 in the k-th entry and 0 in the remaining entries, and it follows from formula (4) that the two samples have the same low-rank representation, i.e. $v_i = v_j$.
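The effect of formula (4) on samples of the same class can be checked with a small example. This is only a sketch with hypothetical class assignments and hypothetical entries for Z; it assumes the rows of A are one-hot indicator vectors, as in the case discussed above.

```python
import numpy as np

labels = np.array([0, 1, 0, 2])   # hypothetical classes for n = 4 samples
c = 3                             # number of classes
A = np.eye(c)[labels]             # (n, c) one-hot label matrix, row i is a_i^T
Z = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])        # (c, d) auxiliary matrix, hypothetical values
V = A @ Z                         # (n, d) coefficient matrix, formula (4)
print(V)
```

Samples 0 and 2 share class 0, so rows 0 and 2 of V are identical: each equals the first row of Z.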
The label matrix A and the coefficient matrix V are the representations of the samples in the label space and in the low-dimensional feature space, respectively. To preserve the intrinsic geometry of the data samples, the two representations can be further regularized with LE constraints, so that each sample is optimally reconstructed in the different spaces from the same linear combination of its local neighbors. The objective function $J_{DESCFS}$ is then given by formula (5):
where α denotes the feature-space local consistency regularization parameter, β denotes the class-space local consistency regularization parameter, and Tr(·) denotes the trace of a matrix.
Based on the above, as shown in FIG. 1, the invention provides a clustering method based on constrained sparse concept decomposition with dual local consistency, which is characterized by comprising the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition, as in formula (6):
wherein $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed from the sample data set to be clustered, W is the base matrix, $A = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, $Z \in R^{c \times d}$ is the auxiliary matrix, $a_i \in R^{c \times 1}$ is the representation of $x_i$ in the label space and indicates the class to which $x_i$ belongs, and R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, and D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix $S_{ij}$, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, according to the objective function, iterating a preset number of times using an iterative weighting method to update the base matrix W, the label matrix A and the auxiliary matrix Z;
and S50, performing cluster analysis on the coefficient matrix V using the K-Means clustering algorithm, wherein V=AZ.
From the objective function $J_{DESCFS}$ it can be seen that the DESCFS problem is not jointly convex in the base matrix W, the label matrix A and the auxiliary matrix Z, so a global optimum cannot be guaranteed. However, if two of the three variables W, A and Z are fixed and only the remaining one is varied, the objective function is convex in that variable, so an iterative method is adopted to find a local optimal solution of the objective function. Accordingly, the objective function $J_{DESCFS}$ can be reduced to a function Ω, as in formula (7):
wherein $K = X^T X$.
Since $w_{ij} \ge 0$ and $z_{ij} \ge 0$, let $\Psi = [\psi_{ij}]$, $\Phi = [\phi_{ij}]$ and $\gamma = [\gamma_{ij}]$; the Lagrangian function La can then be expressed as formula (8):
$La = \Omega + \mathrm{Tr}(\Psi W^T) + \mathrm{Tr}(\Phi Z^T) + \mathrm{Tr}(\gamma A^T)$ (8)
wherein Ψ is the Lagrange multiplier of W, Φ is the Lagrange multiplier of Z, and γ is the Lagrange multiplier of A.
Taking the partial derivatives of La with respect to the base matrix W, the label matrix A and the auxiliary matrix Z respectively gives formulas (9) to (11):
Let $K = X^T X$ be an intermediate variable. Together with a further intermediate variable, the iterative update rules of the base matrix W, the label matrix A and the auxiliary matrix Z are obtained according to the KKT conditions, wherein,
the update rule of the base matrix W is as shown in formula (12):
the update rule of the label matrix A is as shown in formula (13):
the update rule of the auxiliary matrix Z is as shown in formula (14):
Finally, cluster analysis is performed on the coefficient matrix V using the K-Means clustering algorithm, which yields the clustering of the samples to be clustered.
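The final step can be sketched with a plain Lloyd-style K-Means applied to the rows of V = AZ. This is a minimal illustration: the function name `kmeans`, the initialization from randomly chosen rows, the handling of empty clusters and the toy matrices are all assumptions for the example, and any standard K-Means implementation could be substituted.

```python
import numpy as np

def kmeans(V, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means on the rows of V; returns labels and centers."""
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=k, replace=False)].astype(float)
    labels = np.zeros(len(V), dtype=int)
    for _ in range(n_iter):
        # Assign every row to its nearest center (squared Euclidean distance).
        dist = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = V[labels == j]
            if len(members):              # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical learned factors: soft label matrix A and auxiliary matrix Z.
A = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Z = np.array([[5.0, 0.0], [0.0, 5.0]])
labels, centers = kmeans(A @ Z, k=2)
print(labels)
```

On this toy input the first two samples end up in one cluster and the last two in the other; which cluster receives index 0 depends on the initialization.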
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principle of the present invention, and such modifications and adaptations are intended to fall within the scope of the present invention.
Claims (2)
1. A clustering method based on constrained sparse concept decomposition with dual local consistency, used for text clustering, characterized by comprising the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition;
wherein $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed from the sample data set to be clustered, W is the base matrix, $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, $A_l \in R^{w \times c}$ represents the classes of the labeled samples, $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples, $Z \in R^{c \times d}$ is the auxiliary matrix, the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs, and R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, and D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix $S_{ij}$, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, according to the objective function, iterating a preset number of times using an iterative weighting method to update the base matrix W, the label matrix A and the auxiliary matrix Z;
and S50, performing cluster analysis on the coefficient matrix V using the K-Means clustering algorithm, wherein V=AZ.
2. The clustering method as claimed in claim 1, wherein in step S40, the update rules of the base matrix W, the label matrix A and the auxiliary matrix Z are respectively:
the update rule of the base matrix W is:
the update rule of the label matrix A is:
the update rule of the auxiliary matrix Z is:
wherein $K = X^T X$,
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010507876.0A CN111930934B (en) | 2020-06-05 | 2020-06-05 | Clustering method based on constraint sparse concept decomposition of dual local agreement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010507876.0A CN111930934B (en) | 2020-06-05 | 2020-06-05 | Clustering method based on constraint sparse concept decomposition of dual local agreement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111930934A CN111930934A (en) | 2020-11-13 |
CN111930934B (en) | 2023-12-26 |
Family
ID=73316516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010507876.0A Active CN111930934B (en) | 2020-06-05 | 2020-06-05 | Clustering method based on constraint sparse concept decomposition of dual local agreement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111930934B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184298A (en) * | 2015-08-27 | 2015-12-23 | 重庆大学 | Image classification method through fast and locality-constrained low-rank coding process |
CN107301643A (en) * | 2017-06-06 | 2017-10-27 | 西安电子科技大学 | Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms |
CN109508737A (en) * | 2018-10-31 | 2019-03-22 | 江苏理工学院 | Constrained concept based on matrix of depths decomposes clustering method |
CN109614581A (en) * | 2018-10-19 | 2019-04-12 | 江苏理工学院 | The Non-negative Matrix Factorization clustering method locally learnt based on antithesis |
CN110096596A (en) * | 2019-05-08 | 2019-08-06 | 广东工业大学 | A kind of multiple view Text Clustering Method, device and equipment based on concept separating |
CN110866560A (en) * | 2019-11-15 | 2020-03-06 | 重庆邮电大学 | Symmetric low-rank representation subspace clustering method based on structural constraint |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3295386A4 (en) * | 2015-05-11 | 2019-01-16 | Olsher, Daniel Joseph | Universal task independent simulation and control platform for generating controlled actions using nuanced artificial intelligence |
Non-Patent Citations (3)
Title |
---|
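Low-rank doubly stochastic matrix decomposition for cluster analysis; Zhirong Yang et al.; JMLR; 6454–6478 *
Kernel sparse concept coding algorithm and its application in image representation; Shu Zhenqiu et al.; Systems Engineering - Theory & Practice; 1331-1339 *
Short text clustering combining new concept decomposition and frequent word sets; Jia Ruiyu et al.; Journal of Chinese Computer Systems; 1321-1326 *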
Also Published As
Publication number | Publication date |
---|---|
CN111930934A (en) | 2020-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Song et al. | Auto-encoder based data clustering | |
Hammer et al. | Learning vector quantization for (dis-) similarities | |
He | Laplacian regularized d-optimal design for active learning and its application to image retrieval | |
JP5565190B2 (en) | Learning model creation program, image identification information addition program, learning model creation device, and image identification information addition device | |
Zhang et al. | A survey on concept factorization: From shallow to deep representation learning | |
CN110942091B (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
CN107451545B (en) | The face identification method of Non-negative Matrix Factorization is differentiated based on multichannel under soft label | |
Chakraborty et al. | Simultaneous variable weighting and determining the number of clusters—A weighted Gaussian means algorithm | |
CN103985112B (en) | Image segmentation method based on improved multi-objective particle swarm optimization and clustering | |
CN109657611B (en) | Adaptive image regularization non-negative matrix decomposition method for face recognition | |
CN104392231A (en) | Block and sparse principal feature extraction-based rapid collaborative saliency detection method | |
Zhang et al. | Autoencoder-based unsupervised clustering and hashing | |
Peng et al. | Hyperplane-based nonnegative matrix factorization with label information | |
CN114299362A (en) | Small sample image classification method based on k-means clustering | |
CN114399653A (en) | Fast multi-view discrete clustering method and system based on anchor point diagram | |
CN110276049A (en) | A kind of semi-supervised adaptive figure regularized discriminant non-negative matrix factorization method | |
Chen et al. | Stability-based preference selection in affinity propagation | |
CN111930934B (en) | Clustering method based on constraint sparse concept decomposition of dual local agreement | |
Yang et al. | Robust landmark graph-based clustering for high-dimensional data | |
CN109284375A (en) | A kind of domain self-adaptive reduced-dimensions method retained based on primary data information (pdi) | |
Ng et al. | Incremental hashing with sample selection using dominant sets | |
Jena et al. | Elitist TLBO for identification and verification of plant diseases | |
CN115731137A (en) | Outdoor large scene point cloud segmentation method based on A-EdgeConv | |
CN109614581A (en) | The Non-negative Matrix Factorization clustering method locally learnt based on antithesis | |
Dong et al. | Discriminative analysis dictionary learning with adaptively ordinal locality preserving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |