CN111930934B - Clustering method based on constraint sparse concept decomposition of dual local agreement - Google Patents

Clustering method based on constraint sparse concept decomposition of dual local agreement

Info

Publication number
CN111930934B
CN111930934B CN202010507876.0A CN202010507876A CN111930934B CN 111930934 B CN111930934 B CN 111930934B CN 202010507876 A CN202010507876 A CN 202010507876A CN 111930934 B CN111930934 B CN 111930934B
Authority
CN
China
Prior art keywords
matrix
clustered
clustering
decomposition
auxiliary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010507876.0A
Other languages
Chinese (zh)
Other versions
CN111930934A (en)
Inventor
舒振球
张云猛
翁宗慧
叶飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Technology
Original Assignee
Jiangsu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Technology
Priority to CN202010507876.0A
Publication of CN111930934A
Application granted
Publication of CN111930934B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention discloses a clustering method based on dual-local-consistency constrained sparse concept decomposition, comprising the following steps: S10, acquiring samples to be clustered to form a sample data set to be clustered; S20, constructing an adjacency matrix for the sample data set to be clustered; S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition; S40, iterating a preset number of times with an iterative weighting method according to the objective function, updating the base matrix W, the label matrix A and the auxiliary matrix Z; and S50, performing cluster analysis on the coefficient matrix V with the K-Means clustering algorithm, where V = AZ. Compared with traditional clustering methods, the method reveals the intrinsic geometric structure and the discriminative structure of the data more effectively and thereby improves the clustering result.

Description

Clustering method based on constraint sparse concept decomposition of dual local agreement
Technical Field
The invention relates to the technical field of text clustering, in particular to a clustering method based on constraint sparse concept decomposition of dual local consistency.
Background
Matrix-factorization-based methods have received wide attention in document clustering over the past decade. In such methods, a text document is typically represented as a point in a high-dimensional linear space in which each dimension corresponds to one term. Central to every cluster analysis task is the notion of similarity between the objects being clustered. Research shows that similarity can be measured more accurately in a low-dimensional space, which improves clustering performance. NMF (non-negative matrix factorization) has achieved impressive results in document clustering. In NMF, given a non-negative data matrix X, low-rank non-negative matrices U and V are sought such that UV is a good approximation of X. However, how to perform NMF efficiently in a transformed data space remains an open problem.
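The NMF approximation described above can be illustrated with a minimal NumPy sketch. It assumes the widely used multiplicative update rules for the squared Frobenius error; the function name `nmf`, the iteration count and the small constant `eps` are illustrative choices and are not taken from this document.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative X (m x n) as U @ V, with U (m x k) and V (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates: entries are only rescaled by non-negative
        # factors, so U and V stay non-negative throughout.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

# Toy data: a non-negative matrix of rank at most 4.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
U, V = nmf(X, k=4)
rel_err = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The sketch only covers plain NMF in the original data space; it does not address the transformed-space case raised in the paragraph above.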
To overcome this limitation of NMF while inheriting all of its advantages, Xu and Gong proposed CF (concept decomposition, also known as concept factorization) for data clustering. CF models each cluster as a linear combination of the data points and each data point as a linear combination of the cluster centers. Clustering is then accomplished by computing these two sets of linear coefficients, which are obtained as the non-negative solution that minimizes the reconstruction error of the data points. The main advantage of CF over NMF is that it can be applied to any data representation, whether in the original space or in a reproducing kernel Hilbert space (RKHS).
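As a rough illustration of classical CF (not of the method claimed in this patent), the sketch below uses the standard multiplicative updates for the model X ≈ XWVᵀ. It assumes non-negative data, so that the inner-product matrix K = XᵀX is non-negative; the function name, iteration count and toy data are hypothetical.

```python
import numpy as np

def concept_factorization(X, k, n_iter=300, eps=1e-10, seed=0):
    """X: (features x n), samples as columns. Returns W, V (both n x k)
    such that X is approximated by X @ W @ V.T."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                      # only inner products of samples are used
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # Standard CF multiplicative updates (assumes K has no negative entries).
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    return W, V

# Toy data: two groups of samples that are active on different features.
rng = np.random.default_rng(1)
mask_a = np.array([[1, 1, 1, 0.05, 0.05, 0.05]]).T
mask_b = np.array([[0.05, 0.05, 0.05, 1, 1, 1]]).T
X = np.hstack([rng.random((6, 15)) * mask_a, rng.random((6, 15)) * mask_b])

W, V = concept_factorization(X, k=2)
cluster_of_sample = V.argmax(axis=1)   # largest coefficient picks the cluster
print(cluster_of_sample)
```

Because the updates depend on X only through K, the same loop could be run with a kernel matrix in place of XᵀX, which is the property the paragraph above attributes to CF.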
Many CF-based classification methods have been developed by extending classical CF in various ways (for example, by imposing additional constraints or incorporating supervisory information). In essence, they first learn a low-dimensional discriminative representation and then feed it into a typical classifier. However, these approaches ignore the dependency between the two stages, so the learned low-dimensional features may not suit the classifier used, which leads to suboptimal classification performance.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a clustering method based on dual-local-consistency constrained sparse concept decomposition, which addresses the poor clustering performance of existing clustering methods.
To achieve the above purpose, the invention adopts the following technical solution:
A clustering method based on dual-local-consistency constrained sparse concept decomposition comprises the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition,
where $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed for the sample data set to be clustered; W is the base matrix; $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, in which $A_l \in R^{w \times c}$ represents the classes of the labeled samples and $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples; $Z \in R^{c \times d}$ is the auxiliary matrix; the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs; R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, where D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix S, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, iterating a preset number of times with an iterative weighting method according to the objective function, updating the base matrix W, the label matrix A and the auxiliary matrix Z;
S50, performing cluster analysis on the coefficient matrix V with the K-Means clustering algorithm, where V = AZ.
Compared with traditional clustering methods, the clustering method based on dual-local-consistency constrained sparse concept decomposition combines the local consistency of the feature space with prior class information to reveal the intrinsic geometric structure of the data, which gives the algorithm a degree of discriminative power. On this basis, it further combines the local consistency of the class space with a sparsity constraint to reveal the intrinsic geometric structure and the discriminative structure of the data more effectively. By preserving both the class information of the samples and the intrinsic manifold structure among the samples, the discriminative ability of the algorithm is enhanced and the clustering performance is improved.
Drawings
The invention will be more fully understood and its attendant advantages and features will be more readily understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of a clustering method based on constrained sparse concept decomposition with dual local agreement in the invention.
Detailed Description
In order to make the contents of the present invention more clear and understandable, the contents of the present invention will be further described with reference to the accompanying drawings. Of course, the invention is not limited to this particular embodiment, and common alternatives known to those skilled in the art are also encompassed within the scope of the invention.
In the CF model, each basis vector $u_j$ is a non-negative linear combination of the data points $x_i$, i.e. $u_j = \sum_i w_{ij} x_i$ with coefficients $w_{ij} \ge 0$. Let the base matrix be $W = [w_{ij}] \in R^{n \times k}$. The goal of the CF algorithm is to find two matrices W and V such that $X \approx XWV^T$. The objective function $J_{CF}$ of the CF algorithm can therefore be expressed as formula (1):
$$J_{CF} = \| X - XWV^T \|_F^2 \qquad (1)$$
where the data set is $X = [x_1, x_2, \ldots, x_n]$ and the coefficient matrix is $V = [v_1, v_2, \ldots, v_n]$.
According to the local consistency principle, if $x_i$ and $x_j$ are two similar samples, their low-dimensional representations $z_i$ and $z_j$ should also be similar. Suppose the data set contains n samples in total. If the i-th sample is a neighbor of the j-th sample, an edge with weight $s_{ij}$ connects them. The edge weight matrix $S = [s_{ij}]$ is defined by formula (2):
where $N_p(x_j)$ denotes the set of the p nearest neighbors of sample $x_j$. The smoothness of the low-dimensional representation on the p-nearest-neighbor graph is typically measured by a quantity O, which can be expressed as formula (3):
where Tr(·) denotes the trace of a matrix, D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix S (S is symmetric), i.e. $D_{ii} = \sum_j S_{ij}$, and L is the Laplacian matrix of the weight graph, with $L = D - S$.
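The graph construction above can be sketched in NumPy as follows. The sketch assumes binary (0/1) weights on a symmetrized p-nearest-neighbor graph, which is one common choice; the weighting actually defined by formula (2) may differ. It also stores one sample per row, and the names `knn_graph_laplacian` and `smoothness` are illustrative.

```python
import numpy as np

def knn_graph_laplacian(X, p):
    """X: (n x features), one sample per row.
    Returns the edge weight matrix S, the degree matrix D and L = D - S."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(dist2, np.inf)        # a sample is not its own neighbor
    nn = np.argsort(dist2, axis=1)[:, :p]  # p nearest neighbors of each sample
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), p), nn.ravel()] = 1.0
    S = np.maximum(S, S.T)                 # edge if either sample lists the other
    D = np.diag(S.sum(axis=1))             # D_ii = sum_j S_ij
    return S, D, D - S

def smoothness(Y, L):
    """Tr(Y^T L Y) for a low-dimensional representation Y (n x d)."""
    return np.trace(Y.T @ L @ Y)

rng = np.random.default_rng(0)
samples = rng.random((12, 4))
S, D, L = knn_graph_laplacian(samples, p=3)
Y = rng.random((12, 2))                    # stand-in low-dimensional representation
print(smoothness(Y, L))
```

For a symmetric S, the trace computed by `smoothness` equals half the weighted sum of squared distances between connected representations, so a small value means that neighboring samples stay close in the low-dimensional space.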
The low-dimensional representation learned by concept decomposition is further projected into a label space, as defined by formula (4):
V = AZ (4)
where $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, $A_l \in R^{w \times c}$ represents the classes of the labeled samples, $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples, $Z \in R^{c \times d}$ is the auxiliary matrix, and the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs. Suppose two data samples $x_i$ and $x_j$ both belong to the k-th class. Then $a_i$ and $a_j$ each have a 1 in the k-th entry and 0 in all other entries, so by formula (4) the two samples have the same low-rank representation, i.e. $v_i = v_j$.
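A small NumPy sketch can make the role of the label matrix concrete. It assumes that the first w samples are the labeled ones and that their rows of A are one-hot; the random initialization of the unlabeled rows and the helper name `build_label_matrix` are illustrative assumptions.

```python
import numpy as np

def build_label_matrix(labels, n, c, seed=0):
    """labels: class indices (0..c-1) of the first w samples; the other n-w are unlabeled.
    Returns A = [A_l; A_u] of shape (n x c)."""
    rng = np.random.default_rng(seed)
    w = len(labels)
    A_l = np.zeros((w, c))
    A_l[np.arange(w), labels] = 1.0      # one-hot row for each labeled sample
    A_u = rng.random((n - w, c))         # placeholder values for unlabeled rows (assumption)
    return np.vstack([A_l, A_u])

labels = np.array([0, 2, 0, 1])          # samples 0 and 2 share class 0
A = build_label_matrix(labels, n=7, c=3)
Z = np.random.default_rng(1).random((3, 2))   # auxiliary matrix, c x d
V = A @ Z                                     # formula (4), n x d

print(np.allclose(V[0], V[2]))           # same class, same low-dimensional row
print(A.argmax(axis=1))                  # class read off from the largest entry
```

With one-hot rows, each labeled sample simply picks out one row of Z, which is why samples of the same class share a representation.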
The label matrix A and the coefficient matrix V are the representations of the samples in the label space and in the low-dimensional feature space, respectively. To preserve the intrinsic geometric structure of the data samples, both representations can be further regularized with LE constraints, so that each sample is optimally reconstructed in the different spaces from the same linear combination of its local neighbors. The objective function $J_{DESCFS}$ is then given by formula (5):
where α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, and Tr(·) denotes the trace of a matrix.
Based on the above, as shown in FIG. 1, the invention provides a clustering method based on dual-local-consistency constrained sparse concept decomposition, comprising the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition, as in formula (6):
where $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed for the sample data set to be clustered; W is the base matrix; $A = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix; $Z \in R^{c \times d}$ is the auxiliary matrix; $a_i \in R^{c \times 1}$ is the representation of $x_i$ in the label space and indicates the class to which $x_i$ belongs; R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, where D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix S, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, iterating a preset number of times with an iterative weighting method according to the objective function, updating the base matrix W, the label matrix A and the auxiliary matrix Z;
S50, performing cluster analysis on the coefficient matrix V with the K-Means clustering algorithm, where V = AZ.
The objective function $J_{DESCFS}$ of the DESCFS algorithm is not jointly convex in the base matrix W, the label matrix A and the auxiliary matrix Z, so the global optimum cannot be solved for directly. However, if two of the three variables W, A and Z are fixed and only the remaining one is varied, the objective function is convex in that variable, so an iterative method is used to find a locally optimal solution. The objective function $J_{DESCFS}$ can be reduced to a function Ω, as in formula (7):
where $K = X^T X$.
Since $w_{ij} \ge 0$ and $z_{ij} \ge 0$, let $\Psi = [\psi_{ij}]$, $\Phi = [\phi_{ij}]$ and $\gamma = [\gamma_{ij}]$. The Lagrangian function La can be expressed as formula (8):
$$La = \Omega + Tr(\Psi W^T) + Tr(\Phi Z^T) + Tr(\gamma A^T) \qquad (8)$$
where Ψ is the Lagrange multiplier of W, Φ is the Lagrange multiplier of Z, and γ is the Lagrange multiplier of A.
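For orientation, the generic way in which a Lagrangian of this form leads to a multiplicative update rule is outlined below for the block W. This is the usual pattern in non-negative factorization methods and is given only as a sketch: the symbols P and N are assumed names for the non-negative parts of the gradient and are not the patent's formulas (9) to (14).

```latex
% Stationarity of La in W, combined with the KKT condition \psi_{ij} w_{ij} = 0
\frac{\partial La}{\partial W} = \frac{\partial \Omega}{\partial W} + \Psi = 0,
\qquad \psi_{ij}\, w_{ij} = 0
\;\Longrightarrow\;
\left( \frac{\partial \Omega}{\partial W} \right)_{ij} w_{ij} = 0 .

% Split the gradient into two element-wise non-negative parts P and N (assumed names)
\frac{\partial \Omega}{\partial W} = P - N, \quad P_{ij} \ge 0,\; N_{ij} \ge 0
\;\Longrightarrow\;
w_{ij} \leftarrow w_{ij}\, \frac{N_{ij}}{P_{ij}} .
```

At a fixed point of such an update, either $w_{ij} = 0$ or $P_{ij} = N_{ij}$, which is exactly the condition $(\partial \Omega / \partial W)_{ij}\, w_{ij} = 0$. The same argument applies to Z with Φ and to A with γ.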
Taking the partial derivatives of La with respect to the base matrix W, the label matrix A and the auxiliary matrix Z gives formulas (9) to (11):
With the intermediate variable $K = X^T X$ and a second intermediate variable, the iterative update rules of the base matrix W, the label matrix A and the auxiliary matrix Z are obtained from the KKT conditions, where
the update rule of the base matrix W is given by formula (12):
the update rule of the label matrix A is given by formula (13):
the update rule of the auxiliary matrix Z is given by formula (14):
Finally, cluster analysis is performed on the coefficient matrix V with the K-Means clustering algorithm, which clusters the samples to be clustered.
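The final step can be sketched with a small self-contained K-Means in NumPy, applied to V = AZ. The farthest-first initialization, the stopping test and the toy A and Z below are illustrative assumptions, not details taken from this document.

```python
import numpy as np

def kmeans(V, c, n_iter=100):
    """Lloyd's algorithm on the rows of V (n x d) with c clusters."""
    # Farthest-first initialization (an assumption made here for determinism).
    centers = [V[0]]
    for _ in range(1, c):
        d2 = np.min([((V - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(V[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)            # assign each row to its nearest center
        new_centers = centers.copy()
        for j in range(c):
            members = V[labels == j]
            if len(members) > 0:              # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy A (n x c) and Z (c x d): rows 0-4 lean to class 0, rows 5-9 to class 1.
rng = np.random.default_rng(0)
A = np.vstack([np.tile([1.0, 0.0], (5, 1)), np.tile([0.0, 1.0], (5, 1))])
A += 0.01 * rng.random(A.shape)
Z = np.array([[1.0, 0.0], [0.0, 1.0]])
V = A @ Z                                     # coefficient matrix, V = AZ
cluster_labels, _ = kmeans(V, c=2)
print(cluster_labels)
```

In practice a library implementation of K-Means could be substituted; the point of the sketch is only that the clustering operates on the rows of V rather than on the raw samples.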
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principle of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims (2)

1. A clustering method based on dual-local-consistency constrained sparse concept decomposition, used for text clustering, characterized by comprising the following steps:
S10, acquiring samples to be clustered to form a sample data set to be clustered;
S20, constructing an adjacency matrix for the sample data set to be clustered;
S30, establishing an objective function $J_{DESCFS}$ based on concept decomposition,
where $X = [x_1, x_2, \ldots, x_n]$ is the adjacency matrix constructed for the sample data set to be clustered; W is the base matrix; $A = [A_l; A_u] = [a_1, a_2, \ldots, a_n]^T \in R^{n \times c}$ is the label matrix, in which $A_l \in R^{w \times c}$ represents the classes of the labeled samples and $A_u \in R^{(n-w) \times c}$ represents the classes of the unlabeled samples; $Z \in R^{c \times d}$ is the auxiliary matrix; the position of the largest element of $a_i \in R^{c \times 1}$ indicates the class to which sample $x_i$ belongs; R denotes the set of real numbers; α is the feature-space local consistency regularization parameter, β is the class-space local consistency regularization parameter, λ is the sparsity parameter, and Tr(·) is the trace of a matrix; L is the Laplacian matrix of the weight graph, with $L = D - S$, where D is a diagonal matrix whose entries are the row (or column) sums of the edge weight matrix S, i.e. $D_{ii} = \sum_j S_{ij}$;
S40, iterating a preset number of times with an iterative weighting method according to the objective function, updating the base matrix W, the label matrix A and the auxiliary matrix Z;
S50, performing cluster analysis on the coefficient matrix V with the K-Means clustering algorithm, where V = AZ.
2. The clustering method as claimed in claim 1, wherein in step S40 the update rules of the base matrix W, the label matrix A and the auxiliary matrix Z are respectively as follows:
the update rule of the base matrix W is:
the update rule of the label matrix A is:
the update rule of the auxiliary matrix Z is:
where $K = X^T X$,
CN202010507876.0A 2020-06-05 2020-06-05 Clustering method based on constraint sparse concept decomposition of dual local agreement Active CN111930934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507876.0A CN111930934B (en) 2020-06-05 2020-06-05 Clustering method based on constraint sparse concept decomposition of dual local agreement

Publications (2)

Publication Number Publication Date
CN111930934A CN111930934A (en) 2020-11-13
CN111930934B true CN111930934B (en) 2023-12-26

Family

ID=73316516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507876.0A Active CN111930934B (en) 2020-06-05 2020-06-05 Clustering method based on constraint sparse concept decomposition of dual local agreement

Country Status (1)

Country Link
CN (1) CN111930934B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184298A (en) * 2015-08-27 2015-12-23 重庆大学 Image classification method through fast and locality-constrained low-rank coding process
CN107301643A (en) * 2017-06-06 2017-10-27 西安电子科技大学 Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms
CN109508737A (en) * 2018-10-31 2019-03-22 江苏理工学院 Constrained concept based on matrix of depths decomposes clustering method
CN109614581A (en) * 2018-10-19 2019-04-12 江苏理工学院 The Non-negative Matrix Factorization clustering method locally learnt based on antithesis
CN110096596A (en) * 2019-05-08 2019-08-06 广东工业大学 A kind of multiple view Text Clustering Method, device and equipment based on concept separating
CN110866560A (en) * 2019-11-15 2020-03-06 重庆邮电大学 Symmetric low-rank representation subspace clustering method based on structural constraint

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016183229A1 (en) * 2015-05-11 2016-11-17 Olsher Daniel Joseph Universal task independent simulation and control platform for generating controlled actions using nuanced artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
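Low-rank doubly stochastic matrix decomposition for cluster analysis; Zhirong Yang et al.; 《JMLR》; 6454–6478 *
Kernel sparse concept coding algorithm and its application in image representation; 舒振球 et al.; 《系统工程理论与实践》 (Systems Engineering - Theory & Practice); 1331-1339 *
Short text clustering combining new concept decomposition and frequent word sets; 贾瑞玉 et al.; 《小型微型计算机系统》 (Journal of Chinese Computer Systems); 1321-1326 *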

Also Published As

Publication number Publication date
CN111930934A (en) 2020-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant