CN112417234A - Data clustering method and device and computer readable storage medium - Google Patents
- Publication number
- CN112417234A (application number CN201910784526.6A)
- Authority
- CN
- China
- Prior art keywords
- data set
- original data
- matrix
- clustering
- weight matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the invention disclose a data clustering method and device, and a computer-readable storage medium. The data clustering method comprises the following steps: receiving and converting an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining representation coefficients corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and obtaining, based on the similarity matrix and using spectral clustering, the clustering result corresponding to the original data set. An ideal clustering effect can thus be obtained, and the clustering performance is effectively improved.
Description
Technical Field
The present invention relates to data detection technologies, and in particular, to a data clustering method and apparatus, and a computer-readable storage medium.
Background
When clustering a data set of high-dimensional data, the high-dimensional data from different subspaces can be divided into their respective low-dimensional subspaces according to the potential subspace structure of the data set, with different subspaces corresponding to different categories. Subspace clustering algorithms are widely used in many fields; among them, linear-representation-based subspace clustering algorithms, represented by sparse subspace clustering (SSC), low-rank representation (LRR) subspace clustering, and least squares regression (LSR) subspace clustering, have attracted extensive interest from researchers due to their simplicity and their effectiveness on high-dimensional data clustering.
At present, the commonly used linear-representation-based subspace clustering algorithms constrain the representation coefficients with the l1-norm, the nuclear norm, or the F-norm in order to obtain a representation coefficient matrix Z with a block-diagonal structure. However, constraining Z with a single norm is usually insufficient, so the final clustering result is not ideal and the clustering performance is low.
Disclosure of Invention
To solve the above technical problems, embodiments of the present invention are expected to provide a data clustering method and apparatus, and a computer-readable storage medium.
in order to achieve the above purpose, the technical solution of the embodiment of the present invention is realized as follows:
the embodiment of the invention provides a data clustering method, which comprises the following steps:
receiving and converting an original data set;
determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set;
determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix;
establishing a similarity matrix corresponding to the original data set according to the representation coefficients;
and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set.
The data clustering device receives and converts an original data set; determines a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determines representation coefficients corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishes a similarity matrix corresponding to the original data set according to the representation coefficients; and, based on the similarity matrix, obtains the clustering result corresponding to the original data set using spectral clustering. In this way, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine it with the weight matrix obtained according to the original data set to construct the representation coefficients, so as to obtain the similarity matrix corresponding to the original data set, cluster the original data set with the similarity matrix, and obtain the corresponding clustering result.
Drawings
FIG. 1 is a basic framework of a subspace clustering algorithm based on linear representations;
fig. 2 is a first schematic flow chart illustrating an implementation process of a data clustering method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a partial relationship;
fig. 4 is a schematic diagram illustrating a second implementation flow of a data clustering method according to an embodiment of the present application;
fig. 5 is a schematic diagram of a first structural component of a data clustering device according to an embodiment of the present application;
fig. 6 is a second schematic structural diagram of a data clustering device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant application and are not limiting of the application. It should be noted that, for the convenience of description, only the parts related to the related applications are shown in the drawings.
With the rapid development of information technology, data is ubiquitous in daily life, and its huge scale and complex structure pose many challenges to data processing; how to effectively mine valuable information from data has become a major problem. Classical clustering algorithms can effectively solve low-dimensional clustering problems, but application environments keep changing and high-dimensional data is now everywhere in work and life: the dimensionality of image, video, and text data often reaches tens of thousands of dimensions (for example, a picture taken by a smartphone can contain tens of thousands of pixels), and traditional clustering algorithms cannot obtain ideal results on high-dimensional data. The main difficulties of high-dimensional data clustering are: data distribution in a high-dimensional space is sparser than in a low-dimensional space, the distances between data points become almost equal, and the data contain some irrelevant attributes, so clustering generally cannot be achieved from the distance relations between data points in the high-dimensional space. Subspace clustering algorithms extend traditional clustering algorithms: high-dimensional data from different subspaces are divided into their respective low-dimensional subspaces according to the potential subspace structure of the data set, with different subspaces corresponding to different categories. Subspace clustering algorithms are widely used in many fields, for example image clustering and motion segmentation. Among them, subspace clustering algorithms based on linear representation are currently a research hotspot in the field due to their superior clustering performance.
Subspace clustering algorithms based on linear representation aim to construct a better similarity matrix by exploiting the global information between data points. Algorithms of this family, represented by sparse subspace clustering (SSC), low-rank representation (LRR) subspace clustering, and least squares regression (LSR) subspace clustering, have attracted extensive interest from researchers due to their simplicity and their effectiveness on high-dimensional data. These algorithms do not need to know the dimensions of the subspaces: the self-expressiveness of the data is used to obtain the representation coefficients of each data point, the obtained representation coefficients are used to establish a similarity matrix, and the similarity matrix is applied to spectral clustering to obtain the clustering result.
Under the linear-representation assumption, the SSC algorithm enforces sparsity of the representation coefficient matrix through l1-norm minimization, driving inter-class coefficients to zero while keeping intra-class coefficients sparse. The LRR algorithm reveals the lowest-rank representation of the global structure of the data by minimizing the nuclear norm and can therefore group highly correlated data well together; it also achieves good robustness when processing data containing noise and severe contamination. The LSR algorithm constrains the representation coefficients with the F-norm so that a grouping effect exists among the coefficients, preserving the aggregation of correlated data. Under the assumption of independent subspaces, the representation matrix obtained by the LSR algorithm has a block-diagonal structure; when the data points are insufficient, the obtained representation coefficient matrix still has a block-diagonal structure under the assumption that the subspaces are orthogonal. Meanwhile, the objective function of the LSR algorithm admits an analytic solution, which avoids an iterative solving process and greatly reduces the time complexity of the algorithm. Fig. 1 shows the basic framework of a subspace clustering algorithm based on linear representation: as shown in fig. 1, the input data set is linearly represented to obtain the representation coefficients, a similarity matrix is then constructed from these coefficients, and spectral clustering is performed with the constructed similarity matrix to obtain the clustering result.
Classical subspace clustering algorithms based on linear representation constrain the representation coefficients with the l1-norm, the nuclear norm, or the F-norm in order to find representation coefficients Z with a block-diagonal structure, but a single norm constraint on Z typically has shortcomings. For example, the SSC algorithm obtains the sparsest representation of the samples as the coefficient matrix by minimizing the l1-norm; when data from the same subspace are highly correlated, l1-norm minimization usually selects a small number of data points at random for the linear representation while ignoring other relevant data points, so the obtained coefficient matrix cannot guarantee the connections between data points within a class. Thus, although the SSC algorithm can construct a sparse similarity matrix, it may not achieve satisfactory results. The LRR algorithm finds the lowest-rank representation of the high-dimensional data and can capture the global structure of the data, solving the optimization problem by minimizing the nuclear norm instead of minimizing the rank. Although low-rank representation clustering can obtain a representation coefficient matrix with good block-diagonal properties, the algorithm only focuses on the global rank constraint, so the final representation coefficient matrix lacks sparsity: a large number of non-zero elements remain in the inter-class representation coefficients, and the intra-class representation coefficients differ greatly, so the final clustering result is not ideal enough.
In order to overcome the shortcomings of the classical linear-representation-based subspace clustering algorithms, non-negative low-rank sparse graphs for semi-supervised learning introduce the l1-norm and the nuclear norm into the objective function simultaneously, so as to eliminate overly dense inter-class representation coefficients. The low-rank representation algorithm with structured constraints adds a structured sparse constraint to the low-rank representation subspace clustering algorithm, so that the algorithm better sparsifies the inter-class representation coefficients and can handle more general subspace distribution structures. Smooth representation clustering constrains the representation coefficients through the local relationships between data, so that the intra-class representation coefficients tend to be smooth, yielding ideal clustering quality.
The data clustering method can utilize a smooth low-rank representation subspace clustering algorithm (SSLRR) that introduces a local similarity constraint into the LRR objective function, improving the intra-class consistency of the representation coefficients through the local relationships between data points, and that introduces a structured sparse constraint into the objective function to increase the inter-class sparsity of the representation coefficients. To let the algorithm better handle data containing noise, the algorithm first obtains a low-rank structured dictionary through low-rank recovery technology for linearly representing the original data set, which improves the robustness of the algorithm on noisy data while achieving higher clustering performance.
Example one
An embodiment of the present invention provides a data clustering method. Fig. 2 is a schematic flow chart of an implementation of the data clustering method provided in the embodiment of the present application. As shown in fig. 2, in an embodiment of the present invention, the method for performing data clustering by a data clustering device may include the following steps:
Step 101, receiving and converting an original data set.
In the embodiment of the present application, the data clustering device may first receive the original data set and, after receiving it, perform dimension conversion on the original data set.
Further, in an embodiment of the present application, the original data set may be high-dimensional data; for example, the original data set may be the Extended Yale B face data set, the AR face data set, or high-dimensional data such as a handwritten digit data set.
It should be noted that, in the embodiment of the present application, the data clustering device may be a device integrated with a data clustering algorithm, and the data clustering device may be used to perform clustering, analysis, and experiments on a data set. For example, the data clustering means may be installed with a subspace clustering application, for example, the data clustering means may be installed with a face clustering application or a handwritten digit clustering application.
Further, in embodiments of the present application, the raw data set may be a high-dimensional data set, e.g., a raw data set X = [x1, x2, ..., xn] ∈ R^(m×n), where each column represents a data sample, n represents the number of data points, m represents the dimension of the data, and xi represents the i-th sample in the data set.
It should be noted that, in the embodiment of the present application, after receiving the original data set, the data clustering device may perform dimensionality reduction processing on it, thereby realizing the dimension conversion of the original data set. Specifically, when performing the dimensionality reduction, the data clustering device can reduce the dimension of the data to 6 × k dimensions by Principal Component Analysis (PCA), where k denotes the category parameter.
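As an illustration of this dimension-conversion step, the following sketch reduces columns-as-samples data to 6 × k dimensions with a plain SVD-based PCA; the helper name `pca_reduce` and the synthetic data are illustrative, not from the patent:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce columns-as-samples data X (m x n) to 6*k dimensions via PCA."""
    d = 6 * k
    Xc = X - X.mean(axis=1, keepdims=True)        # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d].T @ Xc                        # project onto top-d principal directions

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))                # 100-dimensional data, 40 samples
Y = pca_reduce(X, k=3)                            # 3 categories -> 18 dimensions
```

Here `Y` has shape (18, 40): the same 40 samples, each reduced to 6 × 3 = 18 dimensions.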
Step 102, determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set.
In the implementation of the present application, after receiving and converting the original data set, the data clustering device may determine the low-rank dictionary and the weight matrix corresponding to the original data set according to the original data set.
It should be noted that, in the embodiment of the present application, the original data set received by the data clustering device may carry random noise, that is, data contaminated by noise may exist in the original data set. In order to better handle the problem of noisy data clustering, the data clustering device may use Robust Principal Component Analysis (RPCA) to recover a discriminative low rank dictionary from the original data set.
Further, in the embodiment of the present application, the data clustering device may extract the low rank dictionary from the original data set according to a first objective function, wherein the first objective function may be used for denoising the original data set, specifically, the expression of the first objective function is shown in formula (1),
min_{A,E} ‖A‖* + γ‖E‖1   s.t.  X = A + E    (1)
where ‖A‖* denotes the nuclear norm of the matrix and ‖E‖1 denotes its l1-norm. Specifically, the first objective function can be solved by the inexact augmented Lagrange multiplier algorithm, finally yielding the low-rank dictionary A.
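A minimal numerical sketch of solving formula (1) with an inexact augmented-Lagrange-multiplier scheme is shown below; the default γ and the μ update schedule are common choices from the RPCA literature, assumed here rather than taken from the patent:

```python
import numpy as np

def soft(M, tau):
    """Elementwise soft-thresholding (l1 proximal operator)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_ialm(X, gamma=None, tol=1e-7, max_iter=500):
    """Solve min ||A||_* + gamma*||E||_1 s.t. X = A + E, as in formula (1),
    with an inexact augmented Lagrange multiplier scheme."""
    m, n = X.shape
    if gamma is None:
        gamma = 1.0 / np.sqrt(max(m, n))          # common default from the RPCA literature
    Y = np.zeros_like(X)
    E = np.zeros_like(X)
    mu, rho = 1.25 / np.linalg.norm(X, 2), 1.5    # typical penalty schedule (assumed)
    for _ in range(max_iter):
        # A-update: singular value thresholding
        U, s, Vt = np.linalg.svd(X - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-update: elementwise shrinkage
        E = soft(X - A + Y / mu, gamma / mu)
        R = X - A - E                             # constraint residual
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * np.linalg.norm(X):
            break
    return A, E

# demo: recover a rank-5 matrix corrupted by sparse spikes
rng = np.random.default_rng(1)
L0 = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30))
S0 = np.where(rng.random((30, 30)) < 0.05, 10.0, 0.0)
A, E = rpca_ialm(L0 + S0)
```

On this synthetic data the recovered A closely matches the clean low-rank component L0, which is what makes A usable as a denoised dictionary.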
Further, in the embodiment of the present application, the data clustering device may further obtain the weight matrix corresponding to the original data set according to the original data set. The weight matrix may include a first weight matrix and a second weight matrix. Specifically, the first weight matrix is used for suppressing the representation coefficients; the second weight matrix is used for characterizing the local relationships of the data in the original data set in the original space.
It should be noted that, in the embodiment of the present application, the data clustering device may involve the first weight matrix and the second weight matrix in a third objective function used when clustering the original data set, and therefore, the data clustering device may determine the first weight matrix and the second weight matrix according to the original data set.
Further, in the embodiment of the present application, the weight values in the first weight matrix can be obtained by formula (2), where Wij is the weight value between data points xi and xj in the first weight matrix, and the matrix B is defined according to formula (3). The parameter σ is the average of all elements in the matrix B. With the first weight matrix defined by formula (2), the weight values between data points lying in different subspaces of the original data set are large, while the weight values between data points in the same subspace tend to zero. The representation coefficients between data points in different subspaces can then be better suppressed by minimizing the data term ‖W ⊙ Z‖1, where ⊙ denotes the Hadamard product; in the embodiment of the present application, we define H = ‖W ⊙ Z‖1.
Further, in the embodiment of the present application, in order to better characterize the local relationships between the data points in the original data set, the data clustering device may determine the local relationships through a Locally Linear Embedding (LLE) graph. It first determines the K nearest neighbors of each data point xi, then linearly reconstructs xi from those K neighbors, solving for the weight values by minimizing the reconstruction error. The weight value Mij in the second weight matrix represents the contribution of the j-th data point to the reconstruction of the i-th data point; the closer two data points are, the larger the weight between them. For example, fig. 3 is a schematic diagram of the local relationship: in a high-dimensional space, with K = 3 neighbors, the linear reconstruction relationship between a data point xi and its three neighbors xj, xk, xl is as shown in fig. 3, where Wij, Wik, Wil are the weight values between xi and xj, xk, xl respectively. The solution is based on two constraints: (1) each data point is linearly reconstructed only from its K nearest neighbor data points, so when a data point xj does not belong to the K nearest neighbors of xi, Mij = 0; (2) the reconstruction weight coefficients of each data point sum to 1. The second objective function used by the data clustering device to solve the second weight matrix can then be expressed as formula (4),
min_M Σ_{i=1}^{n} ‖xi − Σ_{j∈Qi} Mij xj‖²   s.t.  Σ_j Mij = 1    (4)
where n represents the number of data points and Qi represents the index set of the K nearest neighbors of each data point xi. Defining formula (5),
Vjk = (xi − xj)ᵀ(xi − xk)    (5)
Mij can then be expressed in closed form as formula (6). Further, in the embodiment of the present application, the data clustering device may determine the second weight matrix according to formula (6); specifically, the second weight matrix may be a symmetric non-negative weight matrix, for example, the second weight matrix M may be represented by formula (7).
it should be noted that, in the embodiment of the present application, after receiving the original data set, the data clustering device may determine, based on the original data set, the low-rank dictionary, the first weight matrix, and the second weight matrix according to the above formula (1) and the value formula (7), so as to continue to determine the representation coefficients according to the low-rank dictionary and the weight matrix.
Step 103, determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix.
In the embodiment of the application, after determining the low-rank dictionary and the weight matrix corresponding to the original data set according to the original data set, the data clustering device may further determine the representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix.
It should be noted that, in the embodiment of the present application, the LRR algorithm captures the global structure of the data well through the low-rank criterion, but the inter-class representation coefficients contain a large number of non-zero elements, which affects the accuracy of the clustering. In the embodiments of the present application, the l1-norm may therefore be introduced into the LRR objective function, i.e., into the third objective function used for clustering (the objective function corresponding to the LRR algorithm), so that the l1-norm improves the sparsity of the representation coefficients. Specifically, the third objective function can be expressed according to formula (8),
min_{Z,E} ‖Z‖* + β‖Z‖1 + γ‖E‖1   s.t.  X = AZ + E    (8)
where β and γ are used to balance the effects of the low-rank, sparse, and noise terms. In particular, in embodiments of the present application, minimizing a structured sparse constraint term is superior to plain l1-norm minimization, so formula (8) can be converted into formula (9) to represent the third objective function,
min_{Z,E} ‖Z‖* + βH + γ‖E‖_{2,1}   s.t.  X = AZ + E    (9)
where H = ‖W ⊙ Z‖1 and W is the first weight matrix among the weight matrices. In order to better capture the local relationships of the data in the raw data set, it can be assumed that if data points xi and xj are close in the potential geometric distribution, then the two data points are also similar when embedded or projected into a new space. Specifically, the data clustering device may first define the Laplacian matrix L = D − M, where D is the degree matrix with Dii = Σ_j Mij; mathematically, the assumed relationship can be expressed as formula (10),
(1/2) Σ_{i,j} Mij ‖zi − zj‖² = tr(Z L Zᵀ)    (10)
where M is the second weight matrix among the weight matrices, reflecting the local relationships of the data in the original space, and zi and zj are the representation coefficients corresponding to data points xi and xj respectively. Fusing formula (9) and formula (10) constrains the representation coefficients through the local relationships between data points, so that the intra-class representation coefficients tend to be smooth, which promotes the final clustering accuracy; the converted third objective function can be expressed by formula (11),
min_{Z,E} ‖Z‖* + β‖W ⊙ Z‖1 + α tr(Z L Zᵀ) + γ‖E‖_{2,1}   s.t.  X = AZ + E    (11)
where α is used to balance the effect of the graph regularization term against the other three terms.
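The graph regularization term rests on the standard identity tr(Z L Zᵀ) = (1/2) Σ_ij Mij ‖zi − zj‖², which can be checked numerically; the random matrices below are for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.random((n, n))
M = (M + M.T) / 2                  # symmetric weight matrix
np.fill_diagonal(M, 0.0)
D = np.diag(M.sum(axis=1))         # degree matrix, D_ii = sum_j M_ij
L = D - M                          # graph Laplacian
Z = rng.standard_normal((4, n))    # columns are representation coefficients z_i

lhs = np.trace(Z @ L @ Z.T)
rhs = 0.5 * sum(M[i, j] * np.linalg.norm(Z[:, i] - Z[:, j]) ** 2
                for i in range(n) for j in range(n))
# lhs and rhs agree up to floating-point error
```

This identity is why minimizing tr(Z L Zᵀ) pulls the coefficient vectors of strongly connected (large Mij) data points together, making the intra-class coefficients smooth.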
Further, in the embodiment of the present application, in order to solve the above formula (11) effectively, the data clustering device may solve it iteratively with the alternating direction method of multipliers. Specifically, by introducing the preset auxiliary variables J, T ∈ R^(n×n), the above formula (11) can be converted into formula (12),
min_{Z,E,J,T} ‖J‖* + β‖W ⊙ T‖1 + α tr(Z L Zᵀ) + γ‖E‖_{2,1}   s.t.  X = AZ + E, Z = J, Z = T    (12)
Reconstructing formula (12) with Lagrange multipliers yields formula (13),
L = ‖J‖* + β‖W ⊙ T‖1 + α tr(Z L Zᵀ) + γ‖E‖_{2,1} + ⟨YA, X − AZ − E⟩ + ⟨YB, Z − T⟩ + ⟨YC, Z − J⟩ + (μ/2)(‖X − AZ − E‖F² + ‖Z − T‖F² + ‖Z − J‖F²)    (13)
where YA, YB, and YC represent the Lagrange multipliers and μ represents a penalty parameter that controls the convergence of the third objective function.
It should be noted that, in the embodiment of the present application, the data clustering device may update J iteratively using the singular value soft-thresholding operator based on YC and Z; update T iteratively using the shrinkage-thresholding operator based on YB and Z; and solve for Z with the Bartels-Stewart algorithm, iterating based on the low-rank dictionary. During the iteration, the representation coefficient Z has a unique solution, so the optimal value of the representation coefficients can be obtained.
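The two thresholding operators used for the J and T updates can be sketched as follows; these are minimal forms of the standard proximal operators, with step sizes and the full iteration loop omitted:

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: proximal operator of tau*||.||_*
    (used for the J-update)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Elementwise soft-thresholding: proximal operator of tau*||.||_1
    (used for the T-update)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

J = svt(np.diag([3.0, 1.0]), 2.0)            # singular values 3, 1 -> 1, 0
T = shrink(np.array([1.0, -2.0, 0.3]), 0.5)  # magnitudes shrunk by 0.5 toward zero
```

The Z-subproblem, by contrast, couples Z on both sides through the tr(Z L Zᵀ) term, which is why it is solved as a Sylvester-type equation with the Bartels-Stewart algorithm rather than with a proximal step.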
Step 104, establishing a similarity matrix corresponding to the original data set according to the representation coefficients.
In the embodiment of the application, after determining the representation coefficients corresponding to the original data set according to the low-rank dictionary and the weight matrix, the data clustering device may establish the similarity matrix corresponding to the original data set according to the representation coefficients.
It should be noted that, in the embodiment of the present application, after obtaining the representation coefficients, the data clustering device may construct the similarity matrix according to the representation coefficients; specifically, the data clustering device may establish the similarity matrix according to formula (14).
it should be noted that, in the embodiment of the present application, the similarity matrix determined by the data clustering device according to the formula (14) may be used for performing spectral clustering on the original data set.
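As an illustrative sketch, the construction of the similarity matrix from the representation coefficients can be expressed in code. Since equation (14) itself is not reproduced in the text, the symmetrized form C = (|Z| + |Z^T|)/2 used below is an assumption — it is a common choice in low-rank representation methods, not necessarily the patent's exact formula:

```python
import numpy as np

def build_similarity(Z):
    """Symmetrize the representation coefficients into a similarity
    matrix.  The form (|Z| + |Z^T|)/2 is an assumed stand-in for the
    patent's equation (14)."""
    Zabs = np.abs(Z)
    return (Zabs + Zabs.T) / 2.0

# Toy coefficient matrix: the result is symmetric and non-negative,
# as required for spectral clustering.
Z = np.array([[0.9, -0.1],
              [0.3,  0.8]])
C = build_similarity(Z)
```

The symmetry of C is what makes the subsequent normalized symmetric Laplacian well-defined in step 301.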
And 105, based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set.
In the embodiment of the application, after the data clustering device establishes the similarity matrix corresponding to the original data set according to the representation coefficient, the clustering result corresponding to the original data set can be obtained by utilizing spectral clustering based on the similarity matrix.
Further, in the embodiment of the present application, after performing dimensionality reduction processing on the original data set, the data clustering device may further determine a category parameter corresponding to the original data set.
It should be noted that, in the embodiment of the present application, after determining the similarity matrix, the data clustering device may further determine the normalized symmetric Laplacian matrix according to the similarity matrix; then, according to the category parameter K of the original data set, it may obtain K eigenvectors of the normalized symmetric Laplacian matrix and perform normalization processing on the target matrix formed by the K eigenvectors; finally, it may apply the K-means clustering algorithm to the normalized target matrix and output the class assignment of the original data set, that is, obtain the clustering result corresponding to the original data set.
In the data clustering method provided by the embodiment of the application, a data clustering device receives and converts an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set. Therefore, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine the weight matrix obtained according to the original data set to construct the target coefficient, so as to obtain the similarity matrix corresponding to the original data set, so as to perform clustering processing on the original data set by using the similarity matrix, and obtain the corresponding clustering result.
Example two
Based on the first embodiment, in another embodiment of the present application, when the data clustering device solves the converted third objective function, that is, when the equation (11) is solved, the data clustering device may iteratively solve the converted third objective function according to preset auxiliary variables to obtain the expression coefficient.
Further, in the embodiment of the present application, the data clustering device may introduce preset auxiliary variables J, T ∈ R^(n×n), reconstruct the problem with the augmented Lagrange multiplier method after introducing the preset auxiliary variables to obtain equation (13), and then sequentially update the preset auxiliary variable J, the preset auxiliary variable T, Z, E, the Lagrange multipliers and μ to obtain the optimal representation coefficient Z*.
In the embodiment of the present application, taking the original data set X = [x_1, x_2, ..., x_n] ∈ R^(m×n) as an example, when determining the representation coefficients, the smooth low-rank representation subspace clustering (SSLRR) algorithm proposed by the data clustering device may include the following steps:
step 201, initializing variables.
Set the maximum number of iterations maxIter = 1000 and the current iteration number k = 0; initialize Z = J = T = 0, E = 0, Y_A = 0, Y_B = Y_C = 0, μ = 10^(-6), μ_max = 10^10, ρ = 1.1, ε = 10^(-8). The iteration continues while ||Z − J||_∞ > ε, or ||Z − T||_∞ > ε, or ||X − AZ − E||_∞ > ε.
And step 202, updating a preset auxiliary variable J.
Fix the other variables and update the preset auxiliary variable J. Specifically, when updating the variable J, the singular value soft-thresholding operation is used: let P = Z + Y_C/μ, perform singular value decomposition on P, SVD(P) = [U, Σ, V], and threshold the singular value matrix Σ: G_τ(Σ) = diag((σ_i − τ)_+), where σ_i is a main diagonal element of Σ and also a singular value of the matrix P, and τ is the threshold, taken as τ = 1/μ. G_τ(Σ) means: if the diagonal element σ_i is larger than τ, take σ_i = σ_i − τ; otherwise take σ_i = 0. The optimal solution of J in each iteration is J = U G_τ(Σ) V^T.
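The singular value soft-thresholding operation described above can be sketched as follows. The helper name `svt` and the toy input are ours; the shrinkage of each singular value by τ follows the G_τ(Σ) definition in the text:

```python
import numpy as np

def svt(P, tau):
    """Singular value soft-thresholding: decompose P = U @ diag(s) @ Vt,
    shrink every singular value by tau (clamping at zero), and
    reassemble — i.e. J = U @ G_tau(Sigma) @ V^T."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# A rank-1 matrix with singular value 5; thresholding by tau = 2
# shrinks that singular value to 3.
P = 5.0 * np.outer([1.0, 0.0], [1.0, 0.0])
J = svt(P, 2.0)
```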
And step 203, updating the preset auxiliary variable T.
Fix the other variables and update the preset auxiliary variable T. Specifically, when updating the variable T, the shrinkage-thresholding operation S_ε(·) is used: let Q = Z + Y_B/μ; the variable T can then be expressed as T = S_ε(Q), and each element T_ij of T satisfies the relationship of the following equation (15):
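Since the body of equation (15) is not reproduced in the text, the sketch below assumes the standard element-wise soft-threshold (shrinkage) form, which is the usual closed-form solution for this kind of subproblem:

```python
import numpy as np

def shrink(Q, eps):
    """Element-wise soft-thresholding: each entry of Q is moved toward
    zero by eps, and entries smaller than eps in magnitude become zero
    (assumed form of equation (15))."""
    return np.sign(Q) * np.maximum(np.abs(Q) - eps, 0.0)

# Entries with |q| <= 0.1 vanish; the rest shrink by 0.1.
T = shrink(np.array([[1.5, -0.2],
                     [0.05, -3.0]]), 0.1)
```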
and step 204, updating the variable Z.
Fix the other variables and update the variable Z. Specifically, when updating the variable Z, the Bartels-Stewart algorithm is used to solve the equation μA^T AZ + αZ(2I + L) + (−A^T Y_A + Y_B + Y_C + μ(A^T E − A^T X − J − T)) = 0. A^T A is a positive semi-definite matrix, so any eigenvalue p_i of A^T A satisfies p_i ≥ 0; 2I + L is a positive definite matrix, so any eigenvalue μ_i of 2I + L satisfies μ_i > 0. Because every pair of eigenvalues p_i and μ_i satisfies p_i + μ_i > 0, the variable Z has a unique solution during the iteration process.
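The Z-update is a Sylvester equation, which SciPy's `solve_sylvester` solves via the Bartels-Stewart algorithm. The matrices A, L and the constant term M below are small random stand-ins, not the patent's actual data:

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Solve mu*A^T A Z + Z * alpha*(2I + L) = -M for Z, mirroring step 204.
rng = np.random.default_rng(0)
mu, alpha, n = 1.0, 0.5, 4
A = rng.standard_normal((6, n))
L = np.eye(n)                       # placeholder Laplacian
M = rng.standard_normal((n, n))    # stand-in for the constant term

lhs = mu * A.T @ A                      # left coefficient (PSD)
rhs = alpha * (2.0 * np.eye(n) + L)     # right coefficient (PD)
Z = solve_sylvester(lhs, rhs, -M)
```

Uniqueness holds here for the reason stated in the text: the eigenvalues of the left coefficient are non-negative and those of the right coefficient are strictly positive, so no pair sums to zero.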
And step 205, updating the variable E.
The other variables are fixed to update the variable E, where E satisfies the following equation (16):
Specifically, when updating the variable E, let U = X − AZ + Y_A/μ, with u_i representing each column of the matrix U; each column of E then satisfies the condition of the following equation (17):
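Equation (17) is not reproduced in the text; for an l2,1-norm noise term the standard closed form is a column-wise shrinkage, and the sketch below assumes that form, with `thr` standing in for the unspecified threshold ratio:

```python
import numpy as np

def update_E(U, thr):
    """Column-wise shrinkage (assumed form of equation (17)): each
    column u_i of U is rescaled by (||u_i|| - thr)/||u_i||, and set to
    zero when its l2 norm falls below the threshold."""
    E = np.zeros_like(U)
    norms = np.linalg.norm(U, axis=0)
    keep = norms > thr
    E[:, keep] = U[:, keep] * (norms[keep] - thr) / norms[keep]
    return E

# Column 0 has norm 5 and is scaled by 0.8; column 1's norm is
# below the threshold and is zeroed out.
U = np.array([[3.0, 0.1],
              [4.0, 0.1]])
E = update_E(U, 1.0)
```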
and step 206, updating the Lagrange multiplier.
The Lagrange multipliers Y_A, Y_B and Y_C are updated. Specifically, Y_A, Y_B and Y_C may be updated according to Y_A = Y_A + μ(X − AZ − E), Y_B = Y_B + μ(Z − T) and Y_C = Y_C + μ(Z − J).
And step 207, updating the penalty parameter mu.
The penalty parameter is updated according to μ = min(ρμ, μ_max).
Step 208: let k = k + 1, and repeat the above steps 202 to 207 until convergence, then output the optimal representation coefficient Z*.
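The bookkeeping of steps 206-207, together with the stopping test implied by the initialization in step 201, can be sketched as follows (variable names mirror the text; the toy data is ours):

```python
import numpy as np

def admm_bookkeeping(YA, YB, YC, mu, X, A, Z, E, J, T,
                     rho=1.1, mu_max=1e10):
    """Steps 206-207: update the three Lagrange multipliers from the
    current constraint residuals, then grow the penalty parameter mu."""
    YA = YA + mu * (X - A @ Z - E)
    YB = YB + mu * (Z - T)
    YC = YC + mu * (Z - J)
    mu = min(rho * mu, mu_max)
    return YA, YB, YC, mu

def converged(X, A, Z, E, J, T, eps=1e-8):
    """Step 201's stopping test: all three constraint residuals must
    fall below eps in the max-norm."""
    return (np.max(np.abs(Z - J)) <= eps and
            np.max(np.abs(Z - T)) <= eps and
            np.max(np.abs(X - A @ Z - E)) <= eps)

# Toy check: with zero residuals the multipliers stay put and only
# the penalty parameter grows by the factor rho.
n = 2
X = np.eye(n); A = np.eye(n); Z = np.eye(n); E = np.zeros((n, n))
Y0 = np.zeros((n, n))
YA, YB, YC, mu = admm_bookkeeping(Y0, Y0, Y0, 1e-6, X, A, Z, E, Z, Z)
done = converged(X, A, Z, E, Z, Z)
```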
In the data clustering method provided by the embodiment of the application, a data clustering device receives and converts an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set. Therefore, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine the weight matrix obtained according to the original data set to construct the target coefficient, so as to obtain the similarity matrix corresponding to the original data set, so as to perform clustering processing on the original data set by using the similarity matrix, and obtain the corresponding clustering result.
Example three
Based on the first embodiment and the second embodiment, in a further embodiment of the present application, fig. 4 is a schematic diagram illustrating an implementation flow of a data clustering method provided in the embodiment of the present application, as shown in fig. 4, a method for obtaining a clustering result corresponding to an original data set by using spectral clustering based on a similarity matrix by a data clustering device may include the following steps:
and 301, calculating to obtain a normalized symmetric Laplacian matrix corresponding to the original data set according to the similarity matrix.
In the embodiment of the application, after the data clustering device determines the similarity matrix, the original data set can be clustered according to a normalized symmetric spectral clustering algorithm.
Further, in the embodiment of the present application, the data clustering device may first obtain the normalized symmetric Laplacian matrix corresponding to the original data set according to the similarity matrix. For example, based on the similarity matrix C obtained by the above equation (14), the normalized symmetric Laplacian matrix L_sym corresponding to the original data set is obtained by calculation.
And 302, forming a target matrix according to the class parameters and the normalized symmetrical Laplace matrix.
In the embodiment of the application, after the data clustering device obtains the normalized symmetric laplacian matrix according to the similarity matrix, the target matrix can be further constructed by combining the class parameters corresponding to the original data set.
It should be noted that, in the embodiment of the present application, when the category parameter is k, the data clustering device may first calculate the first k eigenvectors u_1, u_2, …, u_k of the Laplacian matrix L_sym, and then form the target matrix U = [u_1, u_2, …, u_k] ∈ R^(n×k) from the k eigenvectors.
And step 303, carrying out normalization processing on the target matrix to obtain a normalized target matrix.
In the embodiment of the application, after the data clustering device constructs the target matrix according to the category parameters and the normalized symmetric Laplacian matrix, the target matrix may be normalized to obtain the normalized target matrix. Specifically, the data clustering device may normalize the target matrix U by rows to obtain the normalized target matrix T ∈ R^(n×k).
And 304, clustering the normalized target matrix to obtain a clustering result corresponding to the original data set.
In the embodiment of the application, after normalizing the target matrix to obtain the normalized target matrix, the data clustering device may perform clustering processing on the normalized target matrix, so as to obtain the clustering result corresponding to the original data set.
Further, in the embodiment of the present application, the data clustering device may regard each row q_i ∈ R^k of the normalized target matrix T as a point in the R^k space and apply the K-means clustering algorithm, so that the clustering result corresponding to the original data set can be obtained.
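Steps 301 to 304 can be sketched end to end. The construction L_sym = I − D^(-1/2) C D^(-1/2) is the standard normalized symmetric Laplacian; the toy similarity matrix and the use of `scipy.cluster.vq.kmeans2` for the K-means step are our choices for illustration:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(C, k, seed=0):
    """Normalized symmetric spectral clustering (steps 301-304):
    build L_sym = I - D^{-1/2} C D^{-1/2}, take the eigenvectors of
    its k smallest eigenvalues as the target matrix, row-normalize,
    and run k-means on the rows."""
    d = C.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(C)) - d_inv_sqrt[:, None] * C * d_inv_sqrt[None, :]
    w, V = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    U = V[:, :k]                          # target matrix of k eigenvectors
    T = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(T, k, minit='++', seed=seed)
    return labels

# A similarity matrix with two obvious blocks yields two clusters.
C = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
labels = spectral_cluster(C, 2)
```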
In the data clustering method provided by the embodiment of the application, a data clustering device receives and converts an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set. Therefore, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine the weight matrix obtained according to the original data set to construct the target coefficient, so as to obtain the similarity matrix corresponding to the original data set, so as to perform clustering processing on the original data set by using the similarity matrix, and obtain the corresponding clustering result.
Example four
Based on the first to third embodiments, the data clustering device performs clustering processing on the original data set according to the SSLRR algorithm to obtain the corresponding clustering result; in order to verify the clustering effect of SSLRR, the embodiment of the present application provides the following two theoretical justifications.
The first method is as follows: the optimal solution for SSLRR has a block diagonal structure.
For the problem of equation (18) without considering noise:
given a set of m-dimensional datasets, X ═ X1,x2,...,xn]=[X1,X2,…,Xk]∈Rm×nAnd data set X is taken from k independent linear subspacesWherein XiIs m × niEach column of which is from the same subspace SiAnd n is1+n2+…+ni=n,Z*Is the optimal solution to the minimization problem (18), then the coefficient Z is represented*Has a block diagonal structure.
Suppose Z* is the optimal solution of the objective function (18), and define equation (19),
Let Z_C = Z* − Z_D, with Z_C ≥ 0. By the orthogonality assumption of the subspaces, Z_D is also a feasible solution of the objective function (18), and from the nuclear-norm property of matrices, ||Z*||_* ≥ ||Z_D||_*. From Z_C ≥ 0, it can be deduced that tr(Z* L Z*^T) = tr((Z_D + Z_C) L (Z_D + Z_C)^T) ≥ tr(Z_D L (Z_D)^T). Since the weight matrix W is a non-negative matrix, one can further obtain:
wherein L = D − W. From ||Z*||_* ≥ ||Z_D||_*, tr(Z* L Z*^T) ≥ tr(Z_D L (Z_D)^T) and ‖W⊙Z*‖_1 ≥ ‖W⊙Z_D‖_1, it can be deduced that:
||Z*||_* + tr(Z* L Z*^T) + ‖W⊙Z*‖_1 ≥ ||Z_D||_* + tr(Z_D L (Z_D)^T) + ‖W⊙Z_D‖_1 (21)
Moreover, because Z* is the optimal solution of equation (18), equality must hold: ||Z*||_* + tr(Z* L Z*^T) + ‖W⊙Z*‖_1 = ||Z_D||_* + tr(Z_D L (Z_D)^T) + ‖W⊙Z_D‖_1, which gives Z_C = 0 and therefore Z* = Z_D. Therefore, the optimal solution Z* of equation (18) has a block-diagonal structure.
The second method is as follows: time complexity analysis.
For a data set X = [x_1, x_2, ..., x_n] ∈ R^(m×n), in the above step 101, the time complexity of recovering the low-rank dictionary A using RPCA is O(t_1·n³), where t_1 represents the number of iterations of the algorithm. In the above steps 202 to 207, the time complexities of updating J, T, E and the Lagrange multipliers Y_A, Y_B and Y_C are O(n³), O(n²), O(mn²), O(mn²), O(n²) and O(n²) respectively; when updating Z, the Bartels-Stewart algorithm is used to solve the Sylvester equation, so its time complexity is O(n³). Therefore, the overall time complexity of steps 202 to 207 is O(3t_2·n² + 2t_2·mn² + 2t_2·n³); if m < n, the time complexity is O(2t_2·n³), where t_2 represents the number of iterations of the alternating direction method of multipliers. The spectral clustering in step 105 has an overall time complexity of O(n³). Therefore, the time complexity of the proposed SSLRR algorithm is O((t_1 + 2t_2 + 1)·n³).
In the data clustering method provided by the embodiment of the application, a data clustering device receives and converts an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set. Therefore, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine the weight matrix obtained according to the original data set to construct the target coefficient, so as to obtain the similarity matrix corresponding to the original data set, so as to perform clustering processing on the original data set by using the similarity matrix, and obtain the corresponding clustering result.
Example five
Based on the first to fourth embodiments, fig. 5 is a schematic structural diagram of a data clustering device according to an embodiment of the present application, as shown in fig. 5, in an embodiment of the present invention, a data clustering device 1 includes a receiving unit 11, a converting unit 12, a determining unit 13, an establishing unit 14, and an obtaining unit 15,
the receiving unit 11 is configured to receive an original data set.
The conversion unit 12 is configured to convert the original data set.
The determining unit 13 is configured to determine, according to the original data set, a low-rank dictionary and a weight matrix corresponding to the original data set; and determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix.
The establishing unit 14 is configured to establish a similarity matrix corresponding to the original data set according to the representation coefficient.
The obtaining unit 15 is configured to obtain a clustering result corresponding to the original data set by using spectral clustering based on the similarity matrix.
Further, in the embodiment of the present application, the converting unit 12 is specifically configured to perform dimensionality reduction processing on the original data set after receiving the original data set.
Further, in an embodiment of the present application, the determining unit 13 is specifically configured to determine the low-rank dictionary from the original data set according to a first objective function; the first objective function is used for denoising the original data set; or, the determining unit 13 is further specifically configured to obtain a third objective function according to the first weight matrix; obtaining a Laplace matrix according to the second weight matrix; converting the third objective function according to the Laplace matrix to obtain a converted third objective function; and solving the converted third objective function to obtain the representation coefficient.
Further, in an embodiment of the present application, the weight matrix includes a first weight matrix and a second weight matrix, and the determining unit 13 is further specifically configured to calculate the first weight matrix according to the original data set; wherein the first weight matrix is used for reducing the representation coefficient; determining the second weight matrix according to a second objective function and the original data set; the second weight matrix is used for representing the local relation of the data in the original data set in the original space.
Further, in an embodiment of the present application, the determining unit 13 is further specifically configured to perform iterative solution on the converted third objective function according to a preset auxiliary variable, so as to obtain the representation coefficient.
Further, in an embodiment of the present application, the determining unit 13 is further configured to determine a category parameter corresponding to the original data set after performing dimensionality reduction processing on the original data set.
Further, in an embodiment of the present application, the obtaining unit 15 is specifically configured to obtain a normalized symmetric laplacian matrix corresponding to the original data set according to the similarity matrix calculation; forming a target matrix according to the category parameters and the normalized symmetrical Laplace matrix; carrying out normalization processing on the target matrix to obtain a normalized target matrix; and clustering the normalized target matrix to obtain a clustering result corresponding to the original data set.
Fig. 6 is a schematic diagram of a composition structure of the data clustering device according to the embodiment of the present application, and as shown in fig. 6, the data clustering device 1 according to the embodiment of the present application may further include a processor 16 and a memory 17 storing executable instructions of the processor 16, and further, the data clustering device 1 may further include a communication interface 18, and a bus 19 for connecting the processor 16, the memory 17, and the communication interface 18.
In the embodiment of the present Application, the Processor 16 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the processor functions may be other devices, which is not specifically limited in the embodiments of the present application. The data clustering device 1 may further comprise a memory 17, which may be connected to the processor 16, wherein the memory 17 is adapted to store executable program code comprising computer operating instructions, and the memory 17 may comprise a high-speed RAM memory and may further comprise a non-volatile memory, e.g. at least two disk memories.
In the embodiment of the present application, the bus 19 is used to connect the communication interface 18, the processor 16 and the memory 17, and to enable intercommunication among these devices.
In the embodiment of the present application, the memory 17 is used for storing instructions and data.
Further, in an embodiment of the present application, a processor 16 for receiving and converting the original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set.
In practical applications, the Memory 17 may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 16.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on such understanding, the technical solution of this embodiment, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The data clustering device provided by the embodiment of the application receives and converts an original data set; determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix; establishing a similarity matrix corresponding to the original data set according to the representation coefficients; and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set. Therefore, in the embodiment of the application, the data clustering device can obtain a denoised low-rank dictionary from the original data set, and then combine the weight matrix obtained according to the original data set to construct the target coefficient, so as to obtain the similarity matrix corresponding to the original data set, so as to perform clustering processing on the original data set by using the similarity matrix, and obtain the corresponding clustering result.
An embodiment of the present application provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the program implements the data clustering method as described above.
Specifically, the program instructions corresponding to a data clustering method in this embodiment may be stored in a storage medium such as an optical disc, a hard disc, or a usb disk, and when the program instructions corresponding to a data clustering method in the storage medium are read or executed by an electronic device, the method includes the following steps:
receiving and converting an original data set;
determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set;
determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix;
establishing a similarity matrix corresponding to the original data set according to the representation coefficients;
and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart block or blocks and/or flowchart block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks in the flowchart and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.
Claims (17)
1. A method for clustering data, the method comprising:
receiving and converting an original data set;
determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set;
determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix;
establishing a similarity matrix corresponding to the original data set according to the representation coefficients;
and based on the similarity matrix, utilizing spectral clustering to obtain a clustering result corresponding to the original data set.
2. The method of claim 1, wherein transforming the raw data set comprises:
after receiving the raw data set, performing dimensionality reduction processing on the raw data set.
3. The method of claim 1, wherein determining the low rank dictionary corresponding to the original data set from the original data set comprises:
determining the low-rank dictionary from the original data set according to a first objective function; the first objective function is used for denoising the original data set.
4. The method of claim 1, wherein the weight matrix comprises a first weight matrix and a second weight matrix, and wherein determining the weight matrix corresponding to the original data set according to the original data set comprises:
calculating the first weight matrix according to the original data set; wherein the first weight matrix is used for reducing the representation coefficient;
determining the second weight matrix according to a second objective function and the original data set; the second weight matrix is used for representing the local relation of the data in the original data set in the original space.
5. The method according to claim 1, wherein determining the corresponding representation coefficients of the original data set according to the low rank dictionary and the weight matrix comprises:
obtaining a third objective function according to the first weight matrix; obtaining a Laplace matrix according to the second weight matrix;
converting the third objective function according to the Laplace matrix to obtain a converted third objective function;
and solving the converted third objective function to obtain the representation coefficient.
6. The method of claim 5, wherein solving the transformed third objective function to obtain the representation coefficients comprises:
and carrying out iterative solution on the converted third objective function according to a preset auxiliary variable to obtain the representation coefficient.
7. The method of claim 2, wherein after the dimensionality reduction processing of the raw data set, the method further comprises:
and determining the category parameters corresponding to the original data set.
8. The method according to claim 7, wherein the obtaining a clustering result corresponding to the original data set by using spectral clustering based on the similarity matrix comprises:
calculating to obtain a normalized symmetrical Laplacian matrix corresponding to the original data set according to the similarity matrix;
forming a target matrix according to the category parameters and the normalized symmetrical Laplace matrix;
carrying out normalization processing on the target matrix to obtain a normalized target matrix;
and clustering the normalized target matrix to obtain a clustering result corresponding to the original data set.
9. A data clustering apparatus, characterized in that the data clustering apparatus comprises: a receiving unit, a converting unit, a determining unit, a establishing unit and an acquiring unit,
the receiving unit is used for receiving an original data set;
the conversion unit is used for converting the original data set;
the determining unit is used for determining a low-rank dictionary and a weight matrix corresponding to the original data set according to the original data set; determining a representation coefficient corresponding to the original data set according to the low-rank dictionary and the weight matrix;
the establishing unit is used for establishing a similarity matrix corresponding to the original data set according to the representation coefficient;
and the obtaining unit is used for obtaining a clustering result corresponding to the original data set by using spectral clustering based on the similarity matrix.
10. The data clustering apparatus according to claim 9,
the conversion unit is specifically configured to perform dimensionality reduction processing on the original data set after receiving the original data set.
11. The data clustering apparatus according to claim 9,
the determining unit is specifically configured to determine the low-rank dictionary from the original data set according to a first objective function; the first objective function is used for denoising the original data set;
or, the determining unit is further specifically configured to obtain a third objective function according to the first weight matrix; obtain a Laplacian matrix according to the second weight matrix; convert the third objective function according to the Laplacian matrix to obtain a converted third objective function; and solve the converted third objective function to obtain the representation coefficient.
12. The data clustering apparatus of claim 9, wherein the weight matrix comprises a first weight matrix and a second weight matrix,
the determining unit is further specifically configured to calculate the first weight matrix according to the original data set, wherein the first weight matrix is used for reducing the representation coefficient; and determine the second weight matrix according to a second objective function and the original data set, wherein the second weight matrix is used for representing the local relations of the data in the original data set in the original space.
13. The data clustering apparatus of claim 11,
the determining unit is further specifically configured to perform iterative solution on the converted third objective function according to a preset auxiliary variable to obtain the representation coefficient.
14. The data clustering apparatus according to claim 10,
the determining unit is further configured to determine a category parameter corresponding to the original data set after performing dimensionality reduction processing on the original data set.
15. The data clustering apparatus of claim 14,
the obtaining unit is specifically configured to calculate a normalized symmetric Laplacian matrix corresponding to the original data set according to the similarity matrix; form a target matrix according to the category parameters and the normalized symmetric Laplacian matrix; perform normalization processing on the target matrix to obtain a normalized target matrix; and cluster the normalized target matrix to obtain a clustering result corresponding to the original data set.
16. A data clustering apparatus, comprising a processor, a memory storing instructions executable by the processor, a communication interface, and a bus connecting the processor, the memory, and the communication interface, wherein the instructions, when executed by the processor, implement the method of any one of claims 1 to 8.
17. A computer-readable storage medium, on which a program is stored, for use in a data clustering apparatus, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910784526.6A CN112417234B (en) | 2019-08-23 | 2019-08-23 | Data clustering method and device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112417234A true CN112417234A (en) | 2021-02-26 |
CN112417234B CN112417234B (en) | 2024-01-26 |
Family
ID=74779690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910784526.6A Active CN112417234B (en) | 2019-08-23 | 2019-08-23 | Data clustering method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112417234B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115601232A (en) * | 2022-12-14 | 2023-01-13 | East China Jiaotong University (CN) | Color image decoloring method and system based on singular value decomposition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130191425A1 (en) * | 2012-01-20 | 2013-07-25 | Fatih Porikli | Method for Recovering Low-Rank Matrices and Subspaces from Data in High-Dimensional Matrices |
CN106446924A (en) * | 2016-06-23 | 2017-02-22 | Capital Normal University | Construction of spectral clustering adjacency matrix based on L3CRSC and application thereof
CN107292258A (en) * | 2017-06-14 | 2017-10-24 | Nanjing University of Science and Technology | Hyperspectral image low-rank representation clustering method based on bilateral weighted modulation and filtering
Also Published As
Publication number | Publication date |
---|---|
CN112417234B (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Qiu et al. | Learning transformations for clustering and classification. | |
Zhou et al. | Double shrinking sparse dimension reduction | |
Liu et al. | On the performance of manhattan nonnegative matrix factorization | |
Jin et al. | Low-rank matrix factorization with multiple hypergraph regularizer | |
Cheng et al. | Sparse representation and learning in visual recognition: Theory and applications | |
Han et al. | A generalized model for robust tensor factorization with noise modeling by mixture of Gaussians | |
Shao et al. | Deep linear coding for fast graph clustering | |
Fei et al. | Low rank representation with adaptive distance penalty for semi-supervised subspace classification | |
CN105608478B (en) | image feature extraction and classification combined method and system | |
Qi et al. | Multi-dimensional sparse models | |
CN104462196B (en) | Multiple features combining Hash information search method | |
Hidru et al. | EquiNMF: Graph regularized multiview nonnegative matrix factorization | |
Nguyen et al. | Discriminative low-rank dictionary learning for face recognition | |
CN110717519A (en) | Training, feature extraction and classification method, device and storage medium | |
Zhang et al. | Singular value decomposition based virtual representation for face recognition | |
Xie et al. | Inducing wavelets into random fields via generative boosting | |
Wei et al. | Compact MQDF classifiers using sparse coding for handwritten Chinese character recognition | |
CN112163114A (en) | Image retrieval method based on feature fusion | |
Wang et al. | Modal regression based greedy algorithm for robust sparse signal recovery, clustering and classification | |
Abrol et al. | A geometric approach to archetypal analysis via sparse projections | |
Yao et al. | Principal component dictionary-based patch grouping for image denoising | |
Abdi et al. | Dictionary learning enhancement framework: Learning a non-linear mapping model to enhance discriminative dictionary learning methods | |
CN112417234B (en) | Data clustering method and device and computer readable storage medium | |
Benuwa et al. | Kernel based locality–sensitive discriminative sparse representation for face recognition | |
Hu et al. | Robust sequential subspace clustering via ℓ1-norm temporal graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||