CN110674848A - High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation - Google Patents
- Publication number
- CN110674848A
- Authority
- CN
- China
- Prior art keywords
- clustering
- bipartite graph
- dimensional data
- matrix
- sparse representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation, comprising the following steps: partitioning the high-dimensional data set into non-overlapping data subsets using spatial proximity; constructing a structure dictionary and calculating the correlation between the structure dictionary and each data node; defining a bipartite graph for joint clustering of the high-dimensional data; constructing the adjacency matrix of the bipartite graph; constructing a bipartite graph segmentation optimization model; normalizing the adjacency matrix and calculating its left and right eigenvectors; and jointly clustering the left and right eigenvectors with the K-means algorithm to obtain the final cluster labels. The method uses non-overlapping local neighborhood subsets and structure dictionary learning under a joint sparse representation constraint to mine both the local sparsity and the self-correlation of high-dimensional data. By using the left and right eigenvectors of the adjacency matrix together, and thus clustering the high-dimensional data nodes and the structure dictionary atoms simultaneously, it improves clustering accuracy and robustness to noise.
Description
Technical Field
The invention belongs to the technical field of high-dimensional data processing, and particularly relates to a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation.
Background
With the rapid development of information technology, the data collected across industries is growing exponentially in both scale and complexity. Much of this data is high-dimensional, with hundreds or thousands of dimensions or even more; hyperspectral remote sensing data is a typical example. Hyperspectral remote sensing uses an imaging spectrometer to acquire continuous, narrow-band image data with nanometer-scale spectral resolution across hundreds of spectral bands, from the visible through the short-wave infrared and even the mid-infrared and thermal infrared. The data contains rich spatial and spectral information, and stacking the two yields a hyperspectral data cube, so hyperspectral remote sensing data integrates image and spectrum. It is widely used in military reconnaissance, environmental monitoring, geological exploration, crop assessment, disaster early warning and other fields. Because a hyperspectral image is an image cube that integrates image and spectrum, spectral analysis is the core of its quantitative analysis. Classification is one of the important tasks in understanding hyperspectral data. Hyperspectral image classification methods are either supervised or unsupervised, depending on whether sample label information is used. Because large numbers of labeled training samples are generally hard to obtain, unsupervised classification (clustering) has broader application value in this field.
Unsupervised classification is also an important route to quantitative hyperspectral analysis. The trend is a shift from unsupervised classification at the level of spectral information alone to unsupervised classification using spatial-spectral context, introducing a structured sparse representation learning mechanism to mine the spatial-spectral context structure of hyperspectral images in depth and obtain fine, sub-pixel-level classification results.
For unsupervised classification, existing high-dimensional data clustering algorithms mainly include center-based, distribution-based, density-based and connectivity-based clustering methods. However, these methods are simple and lack the capacity to handle data with complex structure; when the sample space is not convex they easily fall into local optima, and they perform poorly on high-dimensional data. Subspace-based clustering algorithms, which build on spectral graph theory, have attracted growing academic attention in recent years. Compared with traditional clustering algorithms, they can cluster sample spaces of arbitrary shape and converge to the globally optimal solution, handle many practical problems effectively, and have considerable research value and application prospects. Spectral clustering, one kind of subspace clustering, is a graph partitioning technique that generally converts the high-dimensional data clustering problem into a hypergraph partitioning optimization problem. In other words, spectral clustering partitions a weighted graph into disjoint subsets such that the sum of the weights of the edges connecting those subsets is minimized. However, spectral clustering uses only the row information of the adjacency matrix and applies it to the intersecting sets, which tends to discard part of the information and thereby reduces overall accuracy. Furthermore, spectral clustering generally treats high-dimensional data clustering in one direction only, i.e. it clusters either the dictionary atoms or the sparse representation coefficients.
Disclosure of Invention
The invention discloses a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation. By introducing a structured sparse representation learning mechanism, it mines the context structure information of high-dimensional data in depth and obtains fine, sub-pixel-level classification results.
The technical solution for realizing the purpose of the invention is as follows: a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation comprises the following steps:
Step 1: partition the high-dimensional data set into non-overlapping data subsets using spatial proximity;
Step 2: construct a structure dictionary and calculate the correlation between the structure dictionary and each data node, i.e. learn the joint sparse representation coefficients and the structure dictionary within an optimization model with a joint sparse representation constraint that incorporates the context of the high-dimensional data;
Step 3: define a bipartite graph for joint clustering of the high-dimensional data, i.e. define an undirected bipartite graph comprising two disjoint vertex sets;
Step 4: construct the adjacency matrix of the bipartite graph, i.e. map the joint sparse representation coefficients to a non-negative adjacency matrix of the graph;
Step 5: construct the bipartite graph segmentation optimization model, i.e. build the optimization model from the adjacency matrix using the normalized cut;
Step 6: normalize the adjacency matrix and calculate its left and right eigenvectors, i.e. compute the normalized adjacency matrix and decompose it by singular value decomposition to obtain the left and right eigenvectors;
Step 7: jointly cluster the left and right eigenvectors with the K-means algorithm to obtain the final cluster labels.
Furthermore, in the first step, the high-dimensional data set is partitioned into non-overlapping data subsets using spatial proximity, that is, it is divided into a plurality of non-overlapping w × w square spatial neighborhood subsets, where w ≥ 3.
Further, the second step constructs a structure dictionary and calculates the correlation between the structure dictionary and each data node, and the specific process is as follows:
(1) For the i-th data node $y_i$, $1 \le i \le n$, where $n$ is the number of samples, the spatial neighborhood subset containing node $y_i$ is extracted from the high-dimensional data and denoted $\Gamma_i$ [formula not reproduced];
(2) The structure dictionary learning model is expressed as follows [formula not reproduced]:
where $X = [x_1, \ldots, x_i, \ldots, x_n] \in \mathbb{R}^{m \times n}$ is the joint sparse representation coefficient matrix, $D = [d_1, \ldots, d_i, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is the structure dictionary, the $\ell_2/\ell_1$ norm denotes the sum of the $\ell_2$ norms of the rows of its argument, and the model includes a regularization parameter. The joint sparse representation coefficient matrix $X$ provides the correlation between the high-dimensional data nodes and the dictionary atoms.
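To illustrate the l2/l1 norm named in the dictionary learning model, here is a minimal Python sketch that sums the l2 norms of the rows of a small coefficient block. The function name `l21_norm` and the toy matrix are assumptions made for illustration; they do not come from the patent.

```python
import math


def l21_norm(X):
    """Return the l2/l1 norm of X: the sum over rows of each row's l2 norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in X)


# Hypothetical coefficient block: rows are dictionary atoms,
# columns are the data nodes of one neighborhood subset.
X_block = [
    [3.0, 4.0, 0.0],  # row norm 5
    [0.0, 0.0, 0.0],  # row norm 0: no node in the subset uses this atom
    [0.0, 0.0, 2.0],  # row norm 2
]
print(l21_norm(X_block))  # 7.0
```

Penalizing this quantity tends to zero out entire rows, which is one way to read the "joint" sparsity in the text: the nodes of a neighborhood subset are encouraged to share the same few dictionary atoms.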
Further, defining a bipartite graph of high-dimensional data joint clustering in the third step, wherein the specific process is as follows:
Define an undirected bipartite graph $G$ consisting of two disjoint vertex sets: the set of dictionary atoms and the set of high-dimensional data nodes, connected by the corresponding edge set $E$. The set $E$ holds all edge weights $E_{ij}$, where $E_{ij}$ is the weight of the edge between the i-th vertex and the j-th vertex of the bipartite graph, and edges $E_{ij}$ exist only between the two different vertex sets.
Further, the fourth step constructs the adjacency matrix of the bipartite graph by mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph, specifically the block matrix $\begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix}$, where $A = |X|$.
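The mapping from coefficients to a non-negative adjacency matrix can be sketched as follows. This is a minimal illustration that assumes the block layout [[0, A], [A^T, 0]] with dictionary atoms listed before data nodes; the function name `bipartite_adjacency` and the toy coefficients are hypothetical.

```python
def bipartite_adjacency(X):
    """Build A = |X| and the (m+n) x (m+n) block matrix [[0, A], [A^T, 0]].

    X has m rows (dictionary atoms) and n columns (data nodes).
    """
    m, n = len(X), len(X[0])
    A = [[abs(v) for v in row] for row in X]
    M = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = A[i][j]  # edge from atom i to data node j
            M[m + j][i] = A[i][j]  # same edge, seen from the data node
    return A, M


X = [[1.0, -1.0, 0.1, 0.0],
     [0.0, 0.2, -1.0, 1.0]]
A, M = bipartite_adjacency(X)
for row in M:
    print(row)
```

The two zero blocks on the diagonal encode that no edge joins two atoms or two data nodes, matching the bipartite graph definition.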
Further, a fifth step of constructing a bipartite graph segmentation and optimization model, which comprises the following specific processes:
(1) Consider a division into two clusters and assume a partition of the vertices of the bipartite graph into two subsets. The normalized cut is used to partition the bipartite graph and can be written as [formula not reproduced]:
where the terms of the formula denote the accumulated edge weights between the clusters and the accumulated edge weights within a cluster;
(2) Let $q$ be the indicator vector of the partition of the bipartite graph $G$, with entries equal to 1 for one cluster and -1 otherwise. The Rayleigh quotient of the vector $q$ is equivalent to the segmentation optimization model in step (1) [formula not reproduced].
That formula is equivalent to a second form [formula not reproduced],
in which all elements of the vector $e$ are equal to 1;
(3) The discrete partition vector $q$ in the segmentation optimization model can be relaxed to a continuous vector [formula not reproduced].
Solving the relaxed problem corresponds to solving the generalized eigenvalue problem $Lz = \lambda Dz$, where $L$ is the graph Laplacian, $D$ is the diagonal degree matrix and $z$ is the eigenvector.
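To make the normalized cut objective concrete, the sketch below evaluates it for two candidate partitions of a tiny bipartite graph. Since the patent's own formula is not reproduced in this text, the code assumes the standard two-way form cut/vol(V1) + cut/vol(V2), where vol is the total degree of a cluster; the function name and the toy weights are hypothetical.

```python
def bipartite_ncut(A, atom_labels, node_labels):
    """Two-way normalized cut of a bipartite graph with weights A (atoms x nodes).

    Labels are 0 or 1. Assumes the standard form cut/vol(V1) + cut/vol(V2).
    """
    cut = 0.0
    vol = [0.0, 0.0]
    for i, row in enumerate(A):
        for j, w in enumerate(row):
            vol[atom_labels[i]] += w  # weight counts toward atom i's degree
            vol[node_labels[j]] += w  # and toward data node j's degree
            if atom_labels[i] != node_labels[j]:
                cut += w              # edge crosses the partition
    return cut / vol[0] + cut / vol[1]


A = [[1.0, 1.0, 0.1, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
good = bipartite_ncut(A, [0, 1], [0, 0, 1, 1])  # follows the block structure
bad = bipartite_ncut(A, [0, 1], [0, 1, 0, 1])   # splits both blocks
print(good < bad)  # True
```

Dividing by the cluster volumes is what discourages the trivial solution of cutting off a single weakly connected vertex.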
Further, the sixth step of normalizing the adjacency matrix and calculating the left and right eigenvectors thereof comprises the following specific processes:
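(1) In the bipartite graph, the generalized eigenvalue problem takes the block form
$\begin{bmatrix} D_1 & -A \\ -A^{T} & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$
where $D_1(i,i) = \sum_j A_{ij}$ and $D_2(j,j) = \sum_i A_{ij}$ are diagonal matrices;
The above formula is equivalent to:
$D_1 z_1 - A z_2 = \lambda D_1 z_1$
$-A^{T} z_1 + D_2 z_2 = \lambda D_2 z_2$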
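The link between these two equations and a singular value decomposition can be checked numerically. The NumPy sketch below assumes the normalized matrix is D1^{-1/2} A D2^{-1/2} and that a singular pair (u, v) with singular value s maps back through z1 = D1^{-1/2} u, z2 = D2^{-1/2} v and λ = 1 - s; the helper name `normalized_svd` and the toy matrix are hypothetical.

```python
import numpy as np


def normalized_svd(A):
    """SVD of A_n = D1^{-1/2} A D2^{-1/2} for a non-negative matrix A."""
    d1 = A.sum(axis=1)                    # D1(i,i) = sum_j A_ij (row sums)
    d2 = A.sum(axis=0)                    # D2(j,j) = sum_i A_ij (column sums)
    A_n = A / np.sqrt(np.outer(d1, d2))   # elementwise A_ij / sqrt(d1_i * d2_j)
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    return d1, d2, U, s, Vt.T


A = np.array([[1.0, 1.0, 0.1, 0.0],
              [0.0, 0.2, 1.0, 1.0]])
d1, d2, U, s, V = normalized_svd(A)

# Map the second singular pair back to z1, z2 and test both block equations.
k = 1
z1 = U[:, k] / np.sqrt(d1)
z2 = V[:, k] / np.sqrt(d2)
lam = 1.0 - s[k]
res1 = d1 * z1 - A @ z2 - lam * d1 * z1
res2 = -A.T @ z1 + d2 * z2 - lam * d2 * z2
print(np.abs(res1).max(), np.abs(res2).max())  # both residuals are near zero
```

The sketch requires every row and column of A to have a positive sum; an atom or data node with no edges would need to be removed or regularized first.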
Further, the seventh step uses the K-means algorithm to cluster the vectors formed from the left and right eigenvectors, obtaining the final cluster labels.
Compared with the prior art, the invention has the following notable features: (1) it uses non-overlapping spatial neighborhood subsets; the nodes in a neighborhood subset usually lie in a low-dimensional subspace and are usually composed of dictionary atoms of the same class, which provides discriminability between classes of high-dimensional data; (2) it uses dictionary learning under a joint sparse representation constrained optimization framework to capture the inherent local sparsity and non-local self-similarity of high-dimensional data, reducing computational complexity and simplifying parameter setting; (3) bipartite graph segmentation of the high-dimensional data captures both the row and column information of the adjacency matrix and the correlation between the high-dimensional data nodes and the dictionary atoms.
Drawings
FIG. 1 is a flow chart of the high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation.
FIG. 2 is a schematic diagram of partitioning a high-dimensional data set into non-overlapping data subsets.
FIG. 3 is a bipartite graph between high-dimensional data nodes and dictionary atoms.
FIG. 4(a) is the ground-truth land-cover distribution map of the Indian Pines dataset.
FIG. 4(b) shows the clustering result on the Indian Pines dataset using the K-means method.
FIG. 4(c) shows the clustering result on the Indian Pines dataset using the CFSFDP method.
FIG. 4(d) shows the clustering result on the Indian Pines dataset using the high-dimensional data clustering method with spectral dictionary learning and spectral clustering.
FIG. 4(e) shows the clustering result on the Indian Pines dataset using the high-dimensional data clustering method with spatial dictionary learning and spectral clustering.
FIG. 4(f) shows the clustering result on the Indian Pines dataset using the high-dimensional data clustering method with joint sparse representation and spectral clustering.
FIG. 4(g) shows the clustering result on the Indian Pines dataset using the high-dimensional data joint clustering method with spectral dictionary learning and bipartite graph segmentation.
FIG. 4(h) shows the clustering result on the Indian Pines dataset using the high-dimensional data joint clustering method with spatial dictionary learning and bipartite graph segmentation.
FIG. 4(i) shows the clustering result on the Indian Pines dataset using the method of the present invention.
FIG. 5(a) is the ground-truth land-cover distribution map of the Pavia University dataset.
FIG. 5(b) shows the clustering result on the Pavia University dataset using the K-means method.
FIG. 5(c) shows the clustering result on the Pavia University dataset using the CFSFDP method.
FIG. 5(d) shows the clustering result on the Pavia University dataset using the high-dimensional data clustering method with spectral dictionary learning and spectral clustering.
FIG. 5(e) shows the clustering result on the Pavia University dataset using the high-dimensional data clustering method with spatial dictionary learning and spectral clustering.
FIG. 5(f) shows the clustering result on the Pavia University dataset using the high-dimensional data clustering method with joint sparse representation and spectral clustering.
FIG. 5(g) shows the clustering result on the Pavia University dataset using the high-dimensional data joint clustering method with spectral dictionary learning and bipartite graph segmentation.
FIG. 5(h) shows the clustering result on the Pavia University dataset using the high-dimensional data joint clustering method with spatial dictionary learning and bipartite graph segmentation.
FIG. 5(i) shows the clustering result on the Pavia University dataset using the method of the present invention.
Detailed Description
The invention provides a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation. It uses non-overlapping local neighborhood subsets and structure dictionary learning under a joint sparse representation constraint to mine both the local sparsity and the self-correlation of high-dimensional data, reducing computational complexity and simplifying parameter setting. Partitioning the high-dimensional data nodes through the bipartite graph captures the row and column information of the adjacency matrix and the correlation between the high-dimensional data nodes and the dictionary atoms. The specific steps of the invention are described in detail below with reference to FIG. 1:
First, the high-dimensional data set is partitioned into non-overlapping data subsets using spatial proximity. Taking the image shown in FIG. 4(a) as an example, a hyperspectral image $Y = [y_1, \ldots, y_i, \ldots, y_n] \in \mathbb{R}^{d \times n}$ is input, with $d = 200$ and $n = 21025$, and the hyperspectral pixels are divided into non-overlapping 5 × 5 square spatial neighborhood subsets. The division is shown in FIG. 2.
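The block partition in this step can be sketched in a few lines of Python. The example assumes a 145 × 145 pixel grid (the Indian Pines image size given in the Examples, which yields n = 21025) and 5 × 5 blocks; the function name `partition_blocks` and the handling of image borders are assumptions, since the source does not say what happens when the image size is not a multiple of w.

```python
def partition_blocks(height, width, w):
    """Assign each pixel of a height x width grid to a non-overlapping w x w block.

    Returns a dict mapping block id -> list of (row, col) pixel coordinates.
    Border blocks are simply smaller when the size is not a multiple of w
    (an assumption; the source does not specify border handling).
    """
    if w < 3:
        raise ValueError("the source requires w >= 3")
    blocks_per_row = -(-width // w)  # ceiling division
    blocks = {}
    for r in range(height):
        for c in range(width):
            block_id = (r // w) * blocks_per_row + (c // w)
            blocks.setdefault(block_id, []).append((r, c))
    return blocks


blocks = partition_blocks(145, 145, 5)
print(len(blocks))  # 841 blocks (29 x 29), each holding 25 pixels
```

Every pixel belongs to exactly one block, so the neighborhood subsets are disjoint and together cover all 21025 pixels.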
Second, a structure dictionary is constructed; the specific process is as follows:
(1) For the i-th pixel $y_i$ ($1 \le i \le 21025$), the 5 × 5 neighborhood subset of the hyperspectral image that contains pixel $y_i$ is extracted and denoted $\Gamma_i$ [formula not reproduced].
(2) The structure dictionary learning model is expressed as follows [formula not reproduced]:
where $X = [x_1, \ldots, x_i, \ldots, x_n] \in \mathbb{R}^{m \times n}$ is the joint sparse representation coefficient matrix, $Y$ is the hyperspectral image, $D = [d_1, \ldots, d_i, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is the structure dictionary, the $\ell_2/\ell_1$ norm denotes the sum of the $\ell_2$ norms of the rows of its argument, and the model includes a regularization parameter. The joint sparse representation coefficient matrix $X$ provides the correlation between the hyperspectral image pixels and the dictionary atoms.
Third, a bipartite graph for hyperspectral image clustering is defined; the specific process is as follows:
Define an undirected bipartite graph $G$ consisting of two disjoint vertex sets: the set of dictionary atoms and the set of hyperspectral image pixels, connected by the corresponding edge set $E$. The set $E$ holds all edge weights $E_{ij}$, where $E_{ij}$ is the weight of the edge between the i-th vertex and the j-th vertex of the bipartite graph, and edges $E_{ij}$ exist only between the two different vertex sets (vertices within the same set are not connected), as shown in FIG. 3.
Fourth, the adjacency matrix of the bipartite graph is constructed by mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph, specifically the block matrix $\begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix}$, where $A = |X|$.
Fifth, the bipartite graph segmentation optimization model is constructed; the specific process is as follows:
(1) For simplicity, consider a division into two clusters and assume a partition of the vertices of the bipartite graph into two subsets. To divide the samples into two clusters while balancing the size of each cluster, the normalized cut is used to partition the bipartite graph and can be written as [formula not reproduced]:
where the terms of the formula denote the accumulated edge weights between the clusters and the accumulated edge weights within a cluster.
(2) Let $q$ be the indicator vector of the partition of the bipartite graph $G$, with entries equal to 1 for one cluster and -1 otherwise. The Rayleigh quotient of the vector $q$ is equivalent to the segmentation optimization model in the previous step [formula not reproduced].
That formula is equivalent to a second form [formula not reproduced].
(3) The discrete partition vector $q$ in the segmentation optimization model of the previous step can be relaxed to a continuous vector [formula not reproduced].
Solving the relaxed problem corresponds to solving the generalized eigenvalue problem $Lz = \lambda Dz$, where $L$ is the graph Laplacian, $D$ is the diagonal degree matrix and $z$ is the eigenvector.
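Sixth, the adjacency matrix is normalized and its left and right eigenvectors are calculated, specifically:
(1) In the bipartite graph, the generalized eigenvalue problem takes the block form
$\begin{bmatrix} D_1 & -A \\ -A^{T} & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$
where $D_1(i,i) = \sum_j A_{ij}$ and $D_2(j,j) = \sum_i A_{ij}$ are diagonal matrices.
The above formula is equivalent to:
$D_1 z_1 - A z_2 = \lambda D_1 z_1$
$-A^{T} z_1 + D_2 z_2 = \lambda D_2 z_2$
Thus, the above equations are equivalent to the singular value decomposition of the normalized matrix $D_1^{-1/2} A D_2^{-1/2}$.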
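The reasoning behind this equivalence can be spelled out in a short derivation. It starts from the two block equations of the generalized eigenvalue problem, $D_1 z_1 - A z_2 = \lambda D_1 z_1$ and $-A^{T} z_1 + D_2 z_2 = \lambda D_2 z_2$; the substitution variables $u$ and $v$ are notation introduced here for illustration and are not taken from the patent.

```latex
\begin{aligned}
D_1 z_1 - A z_2 = \lambda D_1 z_1
  &\;\Longrightarrow\; A z_2 = (1-\lambda)\, D_1 z_1, \\
-A^{T} z_1 + D_2 z_2 = \lambda D_2 z_2
  &\;\Longrightarrow\; A^{T} z_1 = (1-\lambda)\, D_2 z_2. \\[4pt]
\text{With } u = D_1^{1/2} z_1,\quad v = D_2^{1/2} z_2: \qquad
D_1^{-1/2} A D_2^{-1/2}\, v &= (1-\lambda)\, u, \\
D_2^{-1/2} A^{T} D_1^{-1/2}\, u &= (1-\lambda)\, v.
\end{aligned}
```

So $u$ and $v$ are a left and right singular vector pair of $D_1^{-1/2} A D_2^{-1/2}$ with singular value $1-\lambda$, and the eigenvector blocks are recovered as $z_1 = D_1^{-1/2} u$ and $z_2 = D_2^{-1/2} v$. Small eigenvalues $\lambda$ therefore correspond to large singular values.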
Seventh, the left and right eigenvectors are clustered jointly with the K-means algorithm, i.e. K-means is applied to the vectors formed from the left and right eigenvectors, to obtain the final cluster labels.
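Steps four to seven can be strung together in a compact NumPy sketch. It is an illustration under stated assumptions, not the patent's implementation: the rows stacked for K-means are taken to be D1^{-1/2} U on top of D2^{-1/2} V, the leading singular pair is skipped, the number of vectors `n_vecs` is a free choice, and K-means is a plain Lloyd iteration with a deterministic farthest-point start. All function names and the toy coefficients are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def co_cluster(X, k, n_vecs=1):
    """Jointly cluster dictionary atoms (rows of X) and data nodes (columns of X)."""
    A = np.abs(X)                                     # non-negative edge weights
    d1, d2 = A.sum(axis=1), A.sum(axis=0)             # atom and node degrees
    A_n = A / np.sqrt(np.outer(d1, d2))               # D1^{-1/2} A D2^{-1/2}
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    Z1 = U[:, 1:1 + n_vecs] / np.sqrt(d1)[:, None]    # skip the leading pair
    Z2 = Vt.T[:, 1:1 + n_vecs] / np.sqrt(d2)[:, None]
    labels = kmeans(np.vstack([Z1, Z2]), k)           # atoms first, then nodes
    m = X.shape[0]
    return labels[:m], labels[m:]


X = np.array([[1.0, -1.0, 0.1, 0.0],
              [0.0, 0.2, -1.0, 1.0]])
atom_labels, node_labels = co_cluster(X, k=2)
print(atom_labels, node_labels)
```

In this toy case the first two data nodes land in the same cluster as the first atom and the last two with the second atom, which is the joint labeling of atoms and nodes that the method aims for.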
The method efficiently utilizes the joint sparsity in the data, integrates the joint representation characteristics of representation dictionary atoms and coefficients, overcomes the defect that the traditional sparsity clustering only utilizes the representation coefficients, improves the clustering precision, and enhances the robustness to noise. The method can be widely applied to the unsupervised classification of high-dimensional data in the fields of homeland resources, mineral survey and precision agriculture.
The invention is further described in detail below with reference to examples of hyperspectral image clustering and the accompanying drawings.
Examples
(1) Simulation conditions
The simulation experiment uses two real hyperspectral data sets: the Indian Pines dataset and the Pavia University dataset. The Indian Pines dataset is a hyperspectral remote sensing image acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in Indiana. The image contains 220 bands in total, with a spatial resolution of 20 m and an image size of 145 × 145. After removing 20 water-vapor-absorption and low signal-to-noise-ratio bands (band numbers 104-…), 200 bands remain. The region contains 16 known land-cover classes, of which 8 are selected for the experiment to keep the classes balanced. The Pavia University dataset was acquired by the ROSIS sensor over Pavia and includes 115 bands in total, with an image size of 610 × 340; after removing the noisy bands, the remaining 103 bands are used. To limit computational complexity, the invention selects a sub-image of size 200 × 100. All simulation experiments were run in MATLAB R2014a under the Windows 7 operating system.
The evaluation indexes used by the invention are clustering accuracy (ACC), Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), homogeneity, completeness, V-measure (the harmonic mean of homogeneity and completeness) and the Fowlkes-Mallows Index (FMI).
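Two of these indexes, ARI and FMI, can be computed directly from pair counts. The pure-Python sketch below uses the standard pair-counting definitions; it is offered as an illustration of what the indexes measure, not as the evaluation code used in the experiments, and the function names are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count sample pairs grouped together in both, in truth, and in prediction."""
    both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return both, in_true, in_pred


def adjusted_rand_index(true_labels, pred_labels):
    both, in_true, in_pred = pair_counts(true_labels, pred_labels)
    expected = in_true * in_pred / comb(len(true_labels), 2)
    max_index = 0.5 * (in_true + in_pred)
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows_index(true_labels, pred_labels):
    both, in_true, in_pred = pair_counts(true_labels, pred_labels)
    if in_true == 0 or in_pred == 0:
        return 0.0
    return both / sqrt(in_true * in_pred)


truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))    # 1.0
print(adjusted_rand_index(truth, [0, 1, 0, 1]))    # -0.5
print(fowlkes_mallows_index(truth, [1, 1, 0, 0]))  # 1.0
```

Both indexes ignore the names of the cluster labels, which matters for clustering because the label numbers produced by K-means are arbitrary.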
(2) Simulation content
The invention uses real hyperspectral data sets to test the clustering performance of the algorithm. To evaluate its performance, the proposed high-dimensional data joint clustering method with joint sparse representation and bipartite graph segmentation (BGP-JSDL) is compared with clustering algorithms in current international use. The compared methods are: K-means; CFSFDP; the high-dimensional data clustering method with spectral dictionary learning and spectral clustering (SC-SDL); the high-dimensional data clustering method with spatial dictionary learning and spectral clustering (SC-CDL); the high-dimensional data clustering method with joint sparse representation and spectral clustering (SC-JSDL); the high-dimensional data joint clustering method with spectral dictionary learning and bipartite graph segmentation (BGP-SDL); the high-dimensional data joint clustering method with spatial dictionary learning and bipartite graph segmentation (BGP-CDL); and the high-dimensional data joint clustering method with joint sparse representation and bipartite graph segmentation (BGP-JSDL).
(3) Analysis of simulation experiment results
Tables 1 and 2 compare the clustering accuracy and the other evaluation indexes of the different clustering algorithms on the two hyperspectral data sets.
TABLE 1 Quantitative evaluation of different clustering algorithms on the Indian Pines dataset (ACC, ARI, AMI, NMI, homogeneity, completeness, V-measure, FMI; %)
TABLE 2 Quantitative evaluation of different clustering algorithms on the Pavia University dataset (ACC, ARI, AMI, NMI, homogeneity, completeness, V-measure, FMI; %)
As can be seen from Table 1, on the Indian Pines dataset JSDL significantly improves clustering accuracy across the evaluation indexes compared with SDL and CDL, because it captures the inherent local sparsity of the hyperspectral image and the discriminability of the dictionary. High-dimensional data clustering based on bipartite graph segmentation captures the correlation between the hyperspectral image pixels and the dictionary atoms, and its clustering accuracy is significantly higher than that of the methods based on spectral clustering. As can be seen from Table 2, the same conclusions hold on the Pavia University dataset. The results of the method of the invention on the two data sets are shown in FIG. 4 and FIG. 5. The simulation results on the two real data sets demonstrate the effectiveness of the method.
Claims (8)
1. A high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation is characterized by comprising the following steps:
Step 1: partition the high-dimensional data set into non-overlapping data subsets using spatial proximity;
Step 2: construct a structure dictionary and calculate the correlation between the structure dictionary and each data node, i.e. learn the joint sparse representation coefficients and the structure dictionary within an optimization model with a joint sparse representation constraint that incorporates the context of the high-dimensional data;
Step 3: define a bipartite graph for joint clustering of the high-dimensional data, i.e. define an undirected bipartite graph comprising two disjoint vertex sets;
Step 4: construct the adjacency matrix of the bipartite graph, i.e. map the joint sparse representation coefficients to a non-negative adjacency matrix of the graph;
Step 5: construct the bipartite graph segmentation optimization model, i.e. build the optimization model from the adjacency matrix using the normalized cut;
Step 6: normalize the adjacency matrix and calculate its left and right eigenvectors, i.e. compute the normalized adjacency matrix and decompose it by singular value decomposition to obtain the left and right eigenvectors;
Step 7: jointly cluster the left and right eigenvectors with the K-means algorithm to obtain the final cluster labels.
2. The method of claim 1, wherein in the first step the high-dimensional data set is partitioned into non-overlapping data subsets using spatial proximity by dividing it into a plurality of non-overlapping w × w square spatial neighborhood subsets, where w ≥ 3.
3. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the second step constructs a structure dictionary and calculates the correlation between the structure dictionary and each data node by the following specific process:
(1) For the i-th data node $y_i$, $1 \le i \le n$, where $n$ is the number of samples, the spatial neighborhood subset containing node $y_i$ is extracted from the high-dimensional data and denoted $\Gamma_i$ [formula not reproduced], whose elements are the data nodes of the spatial neighborhood subset $\Gamma_i$, indexed by $l$;
(2) The structure dictionary learning model is expressed as follows [formula not reproduced]:
where $X = [x_1, \ldots, x_i, \ldots, x_n] \in \mathbb{R}^{m \times n}$ is the joint sparse representation coefficient matrix, $D = [d_1, \ldots, d_i, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is the structure dictionary, the $\ell_2/\ell_1$ norm denotes the sum of the $\ell_2$ norms of the rows of its argument, and the model includes a regularization parameter. The joint sparse representation coefficient matrix $X$ provides the correlation between the high-dimensional data nodes and the dictionary atoms.
4. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the third step defines a bipartite graph for joint clustering of the high-dimensional data as follows:
Define an undirected bipartite graph $G$ consisting of two disjoint vertex sets: the set of dictionary atoms and the set of high-dimensional data nodes, connected by the corresponding edge set $E$. The set $E$ holds all edge weights $E_{ij}$, where $E_{ij}$ is the weight of the edge between the i-th vertex and the j-th vertex of the bipartite graph, and edges $E_{ij}$ exist only between the two different vertex sets.
5. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the fourth step constructs the adjacency matrix of the bipartite graph by mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph, specifically the block matrix $\begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix}$, where $A = |X|$.
6. The method for jointly clustering high-dimensional data by joint sparse representation and bipartite graph segmentation according to claim 1, wherein the fifth step constructs the bipartite graph segmentation optimization model by the following specific process:
(1) for division into two clusters, a two-way partition of the bipartite graph is assumed, and the normalized cut is adopted to segment the bipartite graph, wherein the normalized cut can be written as:
wherein one term represents the accumulated edge weights between the clusters, and the other represents the accumulated edge weights within a cluster;
(2) let $q$ be the indicator vector of the segmented bipartite graph $G$: $q_i = 1$ if the $i$-th vertex belongs to the first cluster, otherwise $q_i = -1$; the Rayleigh quotient of the vector $q$ is equivalent to the segmentation optimization model in step (1), specifically:
the above formula is equivalent to:
(3) the discrete segmentation vector $q$ in the segmentation optimization model is relaxed to a continuous vector, specifically:
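The quantities named in steps (1) and (2) can be illustrated numerically. The sketch below uses the textbook two-way normalized cut (cut weight divided by the total degree of each side), which may differ in detail from the patent's formula; the function `normalized_cut`, the matrix `A`, and the partition `labels` are hypothetical.

```python
import numpy as np

def normalized_cut(W, labels):
    """Textbook two-way normalized cut for a symmetric weight matrix W."""
    labels = np.asarray(labels, dtype=bool)
    cut = W[labels][:, ~labels].sum()          # edge weight crossing the partition
    deg = W.sum(axis=1)                        # weighted degree of every vertex
    return cut / deg[labels].sum() + cut / deg[~labels].sum()

# Hypothetical bipartite graph: 2 dictionary atoms x 3 data nodes.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.5, 3.0]])
m, n = A.shape
W = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])

# Vertex order: atom 0, atom 1, node 0, node 1, node 2.
labels = np.array([True, False, True, True, False])
ncut_value = normalized_cut(W, labels)

# Discrete +1/-1 indicator vector and graph Laplacian L = D - W.
deg = W.sum(axis=1)
L = np.diag(deg) - W
q = np.where(labels, 1.0, -1.0)
rayleigh = (q @ L @ q) / (q @ (deg * q))

print(ncut_value, rayleigh)
```

For a $\pm 1$ indicator vector, $q^{T} L q$ equals four times the cut weight, which is what ties the Rayleigh quotient to the cut. In this example the two sides happen to have equal total degree, so the two printed values coincide; for unbalanced partitions they generally differ.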
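7. The method for jointly clustering high-dimensional data by joint sparse representation and bipartite graph segmentation according to claim 1, wherein the sixth step normalizes the adjacency matrix and calculates its left and right eigenvectors by:
(1) in the bipartite graph, the generalized eigenvalue problem takes the block form:
$$\begin{bmatrix} D_1 & -A \\ -A^{T} & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$
wherein $D_1(i,i) = \sum_j A_{ij}$ and $D_2(j,j) = \sum_i A_{ij}$ are diagonal matrices;
The above formula is equivalent to:
$$D_1 z_1 - A z_2 = \lambda D_1 z_1$$
$$-A^{T} z_1 + D_2 z_2 = \lambda D_2 z_2$$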
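One common way to solve the pair of equations above is through the singular value decomposition of the degree-normalized matrix $D_1^{-1/2} A D_2^{-1/2}$: substituting $u = D_1^{1/2} z_1$ and $v = D_2^{1/2} z_2$ turns them into singular-vector equations with singular value $1 - \lambda$. The sketch below follows that route; the function name and sample matrix are hypothetical, every row and column of $A$ is assumed to have a positive sum, and splitting by the sign of the second vector is a simplification chosen for the two-cluster case, not a step stated in the claims.

```python
import numpy as np

def bipartite_spectral_vectors(A):
    """Second left/right solution (z1, z2, lam) of the coupled equations for A >= 0."""
    d1 = A.sum(axis=1)                       # diagonal of D1 (row sums)
    d2 = A.sum(axis=0)                       # diagonal of D2 (column sums)
    An = A / np.sqrt(np.outer(d1, d2))       # D1^{-1/2} A D2^{-1/2}
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    # The first singular pair is the trivial solution; take the second one.
    z1 = U[:, 1] / np.sqrt(d1)
    z2 = Vt[1, :] / np.sqrt(d2)
    lam = 1.0 - s[1]
    return z1, z2, lam

# Hypothetical weights: atoms 0-1 mostly link to nodes 0-1, atom 2 to nodes 2-3.
A_demo = np.array([[1.0, 1.0, 0.1, 0.0],
                   [1.0, 0.9, 0.0, 0.1],
                   [0.0, 0.1, 1.0, 1.0]])
z1, z2, lam = bipartite_spectral_vectors(A_demo)

# The sign pattern of z1 and z2 gives a joint two-way split of atoms and data nodes.
print(np.sign(z1), np.sign(z2), lam)
```

Working with the small $m \times n$ normalized matrix avoids forming the full $(m+n) \times (m+n)$ Laplacian, and the left and right singular vectors partition dictionary atoms and data nodes at the same time.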
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910819539.2A CN110674848A (en) | 2019-08-31 | 2019-08-31 | High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674848A (en) | 2020-01-10 |
Family
ID=69076109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910819539.2A Withdrawn CN110674848A (en) | 2019-08-31 | 2019-08-31 | High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674848A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021164382A1 (en) * | 2020-02-17 | 2021-08-26 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for performing feature processing for user classification model |
CN112768029A (en) * | 2020-12-27 | 2021-05-07 | 上海市东方医院(同济大学附属东方医院) | Combined medication recommendation device, method and medium based on single cell sequencing |
CN112768029B (en) * | 2020-12-27 | 2023-10-13 | 上海市东方医院(同济大学附属东方医院) | Combined drug recommendation equipment, method and medium based on single cell sequencing |
CN115374191A (en) * | 2022-10-26 | 2022-11-22 | 国网湖北省电力有限公司信息通信公司 | Multi-source data-driven cluster method for heterogeneous equipment of data center |
CN115374191B (en) * | 2022-10-26 | 2023-01-31 | 国网湖北省电力有限公司信息通信公司 | Multi-source data-driven cluster method for heterogeneous equipment of data center |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110321963B (en) | Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features | |
CN111860612B (en) | Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method | |
Fu et al. | Hyperspectral anomaly detection via deep plug-and-play denoising CNN regularization | |
CN110084159B (en) | Hyperspectral image classification method based on combined multistage spatial spectrum information CNN | |
CN110399909B (en) | Hyperspectral image classification method based on label constraint elastic network graph model | |
CN108460342B (en) | Hyperspectral image classification method based on convolutional neural network and cyclic neural network | |
Du et al. | A spectral-spatial based local summation anomaly detection method for hyperspectral images | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
CN107451614B (en) | Hyperspectral classification method based on fusion of space coordinates and space spectrum features | |
CN107992891B (en) | Multispectral remote sensing image change detection method based on spectral vector analysis | |
CN108197650B (en) | Hyperspectral image extreme learning machine clustering method with local similarity maintained | |
CN109376753B (en) | Probability calculation method for three-dimensional spatial spectrum space dimension pixel generic | |
Liu et al. | Enhancing spectral unmixing by local neighborhood weights | |
Ortac et al. | Comparative study of hyperspectral image classification by multidimensional Convolutional Neural Network approaches to improve accuracy | |
CN103208011B (en) | Based on average drifting and the hyperspectral image space-spectral domain classification method organizing sparse coding | |
CN105989336B (en) | Scene recognition method based on deconvolution deep network learning with weight | |
CN112308152B (en) | Hyperspectral image ground object classification method based on spectrum segmentation and homogeneous region detection | |
CN110674848A (en) | High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation | |
CN107292258B (en) | High-spectral image low-rank representation clustering method based on bilateral weighted modulation and filtering | |
Ma et al. | Hyperspectral anomaly detection based on low-rank representation with data-driven projection and dictionary construction | |
Paul et al. | Dimensionality reduction using band correlation and variance measure from discrete wavelet transformed hyperspectral imagery | |
CN112381144B (en) | Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning | |
CN112052758B (en) | Hyperspectral image classification method based on attention mechanism and cyclic neural network | |
CN114937173A (en) | Hyperspectral image rapid classification method based on dynamic graph convolution network | |
Ren et al. | PolSAR feature extraction via tensor embedding framework for land cover classification | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200110 |