CN110674848A - High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation - Google Patents


Info

Publication number
CN110674848A
Authority
CN
China
Prior art keywords
clustering
bipartite graph
dimensional data
matrix
sparse representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910819539.2A
Other languages
Chinese (zh)
Inventor
肖亮
黄楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN201910819539.2A priority Critical patent/CN110674848A/en
Publication of CN110674848A publication Critical patent/CN110674848A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation, comprising the following steps: partitioning the high-dimensional data set into non-overlapping data subsets using spatial proximity; constructing a structure dictionary and calculating the correlation between the structure dictionary and each data node; defining a bipartite graph for joint clustering of the high-dimensional data; constructing the adjacency matrix of the bipartite graph; constructing a bipartite graph segmentation optimization model; normalizing the adjacency matrix and calculating its left and right eigenvectors; and jointly clustering the left and right eigenvectors with the K-means algorithm to obtain the final clustering labels. By using non-overlapping local neighborhood subsets and structure dictionary learning under a joint sparse representation constraint, the method mines the local sparsity and the self-correlation of the high-dimensional data at the same time. By using both the left and right eigenvectors of the adjacency matrix, it clusters the high-dimensional data nodes and the structure dictionary atoms simultaneously, which can effectively improve clustering accuracy and enhance robustness to noise.

Description

High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation
Technical Field
The invention belongs to the technical field of high-dimensional data processing, and particularly relates to a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation.
Background
With the rapid development of information technology, the data acquired across industries is growing exponentially, becoming ever larger in scale and higher in complexity. Much of this data is high-dimensional, with hundreds or thousands of dimensions or even more, and hyperspectral remote sensing data is a typical example. Hyperspectral remote sensing uses an imaging spectrometer to acquire continuous, narrow-band image data at nanometer-scale spectral resolution in hundreds of spectral bands, ranging from visible light to short-wave infrared and even mid-infrared and thermal infrared. The data contains rich spatial and spectral information, and superimposing the two yields a hyperspectral data cube, so hyperspectral remote sensing data integrates image and spectrum. It is widely used in military reconnaissance, environmental monitoring, geological exploration, crop assessment, disaster early warning and other fields. Because the hyperspectral image is an image cube integrating image and spectrum, the core of its quantitative analysis is spectral analysis. In hyperspectral applications, classification is one of the important tasks for understanding the data. Hyperspectral image classification methods are divided into supervised and unsupervised methods according to whether sample label information is used. Since large numbers of labeled training samples are generally difficult to obtain, unsupervised classification, or clustering, has wider application value in this field.
Meanwhile, unsupervised classification is also an important route to hyperspectral quantitative analysis. Its development trend is from unsupervised classification at the spectral-information level toward unsupervised classification using spatial-spectral context, with a structured sparse representation learning mechanism introduced to mine the spatial-spectral context structure of hyperspectral images in depth and obtain fine, sub-pixel-level classification results.
For unsupervised classification, existing high-dimensional data clustering algorithms mainly include center-based, distribution-based, density-based and connectivity-based clustering methods. However, these methods are simple and lack the ability to handle data with complex structure; when the sample space is non-convex, they easily fall into local optima and perform poorly on high-dimensional data. Subspace-based clustering algorithms, built on spectral graph theory, have attracted increasing academic attention in recent years. Compared with traditional clustering algorithms, they can cluster sample spaces of arbitrary shape and converge to the global optimum, can handle many practical problems effectively, and have great research value and application prospects. Spectral clustering, one kind of subspace clustering, is a graph segmentation technique that generally converts a high-dimensional data clustering problem into a hypergraph segmentation optimization problem. In other words, spectral clustering partitions a weighted graph into disjoint subsets such that the sum of the weights of the edges connecting the disjoint subsets is minimized. However, spectral clustering uses only the row information of the adjacency matrix and applies it to the intersected set, which tends to discard part of the information and degrades the overall accuracy of the method. Furthermore, spectral clustering generally treats the clustering of high-dimensional data in one direction only, i.e., it clusters either the dictionary atoms or the sparse representation coefficients.
Disclosure of Invention
The invention discloses a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation. By introducing a structured sparse representation learning mechanism, it mines the context structure information of high-dimensional data in depth and obtains fine, sub-pixel-level classification results.
The technical solution that achieves the purpose of the invention is a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation, comprising the following steps:
first step, dividing the high-dimensional data set into non-overlapping data subsets using spatial proximity;
second step, constructing a structure dictionary and calculating the correlation between the structure dictionary and each data node, namely learning the joint sparse representation coefficients and the structure dictionary in an optimization model with a joint sparse representation constraint, in combination with the context characteristics of the high-dimensional data;
third step, defining a bipartite graph for joint clustering of the high-dimensional data, namely defining an undirected bipartite graph comprising two disjoint vertex sets;
fourth step, constructing the adjacency matrix of the bipartite graph, namely mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph;
fifth step, constructing a bipartite graph segmentation optimization model, namely building the optimization model from the adjacency matrix using a normalized-cut formulation;
sixth step, normalizing the adjacency matrix and calculating its left and right eigenvectors, namely computing the normalized matrix of the adjacency matrix and decomposing it by singular value decomposition to obtain the left and right eigenvectors;
seventh step, jointly clustering the left and right eigenvectors with the K-means algorithm to obtain the final clustering labels.
Further, in the first step, dividing the high-dimensional data set into non-overlapping data subsets using spatial proximity means dividing it into a plurality of non-overlapping $w \times w$ square spatial neighborhood subsets, where $w \ge 3$.
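The block partition of the first step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `block_labels` and the handling of edge blocks (smaller blocks when the image size is not a multiple of w) are assumptions made here for the example.

```python
import numpy as np

def block_labels(height, width, w):
    """Assign each pixel of a height x width grid to a non-overlapping w x w block.

    Returns an integer array of shape (height, width); pixels that share a
    value belong to the same spatial neighborhood subset. Edge blocks are
    smaller when height or width is not a multiple of w (an assumption of
    this sketch; the source does not say how edges are handled).
    """
    if w < 3:
        raise ValueError("the method requires w >= 3")
    rows = np.arange(height) // w        # block row index of every pixel row
    cols = np.arange(width) // w         # block column index of every pixel column
    blocks_per_row = -(-width // w)      # ceil(width / w)
    return rows[:, None] * blocks_per_row + cols[None, :]

# Example: a 145 x 145 grid cut into 5 x 5 blocks.
labels = block_labels(145, 145, 5)
n_blocks = int(labels.max()) + 1
```

Pixels with the same label form one subset, so the index set of subset `b` is `np.flatnonzero(labels.ravel() == b)`.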
Further, in the second step the structure dictionary is constructed and the correlation between the structure dictionary and each data node is calculated as follows:
(1) for the $i$-th data node $y_i$, $1 \le i \le n$, where $n$ is the number of samples, the spatial neighborhood subset containing node $y_i$ is extracted from the high-dimensional data and denoted $\Gamma_i$, expressed as follows:
$$\Gamma_i = [\,y_{i,1}, \dots, y_{i,|\Gamma_i|}\,]$$
where $y_{i,|\Gamma_i|}$ denotes the $|\Gamma_i|$-th data node of the spatial neighborhood subset $\Gamma_i$;
(2) the structural dictionary learning model is represented as follows:
Figure BDA0002187147880000033
where $X = [x_1, \dots, x_i, \dots, x_n] \in \mathbb{R}^{m \times n}$ is the joint sparse representation coefficient matrix, $D = [d_1, \dots, d_i, \dots, d_m] \in \mathbb{R}^{d \times m}$ is the structure dictionary, the $\ell_2/\ell_1$ norm is the sum of the $\ell_2$ norms of the rows of the matrix it is applied to, and the sparsity term is weighted by a regularization parameter. The joint sparse representation coefficient $X$ provides the correlation between the high-dimensional data nodes and the dictionary atoms.
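The row-wise mixed norm described above is easy to compute directly. The sketch below shows it together with one plausible form of the learning objective. The objective itself is an assumption: the model formula is an image in the source, so the least-squares data term, the factor 0.5, the per-subset penalty and the names `joint_sparse_objective`, `subsets` and `lam` are illustrative only.

```python
import numpy as np

def l21_norm(X_block):
    """l2/l1 mixed norm: the sum of the l2 norms of the rows of X_block."""
    return float(np.sqrt((X_block ** 2).sum(axis=1)).sum())

def joint_sparse_objective(Y, D, X, subsets, lam):
    """Assumed objective: 0.5 * ||Y - D X||_F^2 + lam * sum_i l21(X[:, subset_i]).

    Y is d x n (data), D is d x m (dictionary), X is m x n (coefficients),
    subsets is a list of column-index arrays, one per neighborhood subset.
    """
    fit = 0.5 * np.linalg.norm(Y - D @ X, "fro") ** 2
    penalty = sum(l21_norm(X[:, idx]) for idx in subsets)
    return fit + lam * penalty

# Rows with norms 5, 0 and 5: only two atoms are active for this subset.
X_demo = np.array([[3.0, 4.0],
                   [0.0, 0.0],
                   [0.0, 5.0]])
norm_demo = l21_norm(X_demo)
obj_demo = joint_sparse_objective(X_demo, np.eye(3), X_demo, [np.array([0, 1])], lam=0.5)
```

Penalizing row norms rather than individual entries pushes all nodes of one neighborhood to share the same few dictionary atoms, which is the joint sparsity the text refers to.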
Further, the bipartite graph for joint clustering of the high-dimensional data is defined in the third step as follows:
an undirected bipartite graph
Figure BDA0002187147880000037
is defined, consisting of two disjoint sets of vertices, where
Figure BDA0002187147880000038
is the set of dictionary atoms and
Figure BDA00021871478800000316
is the set of high-dimensional data nodes. The two sets are connected by the edge set $E$, which holds all edge weights $E_{ij}$, where $E_{ij}$ is the weight of the edge between the $i$-th vertex and the $j$-th vertex of the bipartite graph; edges $E_{ij}$ exist only between the two different vertex sets.
Further, in the fourth step the adjacency matrix of the bipartite graph is constructed by mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph, specifically
Figure BDA0002187147880000039
where $A = |X|$ is the element-wise absolute value of $X$.
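A short sketch of this mapping, assuming the vertices are ordered with the m dictionary atoms first and the n data nodes second. The block layout and the names `bipartite_adjacency` and `W` are assumptions for illustration, since the source formula is an image.

```python
import numpy as np

def bipartite_adjacency(X):
    """Build A = |X| and the full symmetric adjacency matrix of the bipartite graph.

    X is the m x n coefficient matrix (atoms x data nodes). The full matrix has
    zero diagonal blocks because edges only join an atom to a data node.
    """
    A = np.abs(X)
    m, n = A.shape
    W = np.zeros((m + n, m + n))
    W[:m, m:] = A        # atom -> data node weights
    W[m:, :m] = A.T      # data node -> atom weights (undirected graph)
    return A, W

X_demo = np.array([[1.0, -2.0, 0.0],
                   [0.0, 3.0, -4.0]])
A_demo, W_demo = bipartite_adjacency(X_demo)
```

In practice only `A` needs to be stored; the full matrix is shown to make the bipartite structure explicit.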
Further, in the fifth step the bipartite graph segmentation optimization model is constructed as follows:
(1) consider a division into two clusters, and assume that
Figure BDA00021871478800000310
is a partition of the bipartite graph. A normalized cut is adopted to partition the bipartite graph, and it can be written as:
Figure BDA00021871478800000311
where
Figure BDA00021871478800000312
represents the accumulated edge weights between the clusters and
Figure BDA00021871478800000314
represents the accumulated edge weights within a cluster;
(2) let $q$ be the vector that partitions the bipartite graph $G$: if
Figure BDA00021871478800000315
then $q_i = 1$, otherwise $q_i = -1$. The Rayleigh quotient of the vector $q$ is equivalent to the segmentation optimization model in step (1), where
Figure BDA0002187147880000042
is the Laplacian matrix and
Figure BDA0002187147880000043
is a diagonal matrix. The Rayleigh quotient can equivalently be written in a form involving the vector $e$, all of whose elements are equal to 1;
(3) the discrete segmentation vector $q$ in the segmentation optimization model can be relaxed to a continuous vector, and solving the relaxed problem corresponds to solving a generalized eigenvalue problem, where $z$ is the eigenvector.
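To make the link between the normalized cut and the Rayleigh quotient concrete, the sketch below uses the standard textbook definitions rather than the patent's own formulas, which are images in the source. In particular, the degree-scaled indicator vector (instead of a plain ±1 vector) is the form for which the identity is known to hold exactly, and the function names are hypothetical.

```python
import numpy as np

def normalized_cut(W, in_first):
    """Standard two-way normalized cut: cut/vol(first) + cut/vol(second)."""
    in_first = np.asarray(in_first, dtype=bool)
    cut = W[in_first][:, ~in_first].sum()      # edge weight crossing the partition
    degree = W.sum(axis=1)
    return cut / degree[in_first].sum() + cut / degree[~in_first].sum()

def scaled_indicator(W, in_first):
    """Partition vector with entries +sqrt(vol2/vol1) and -sqrt(vol1/vol2)."""
    in_first = np.asarray(in_first, dtype=bool)
    degree = W.sum(axis=1)
    vol1, vol2 = degree[in_first].sum(), degree[~in_first].sum()
    return np.where(in_first, np.sqrt(vol2 / vol1), -np.sqrt(vol1 / vol2))

def rayleigh_quotient(W, q):
    """q^T (Deg - W) q / q^T Deg q, with Deg the diagonal degree matrix."""
    degree = W.sum(axis=1)
    laplacian = np.diag(degree) - W
    return float(q @ laplacian @ q) / float(q @ (degree * q))

# Two atoms and three data nodes; vertices are ordered atoms first.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
m, n = A.shape
W = np.zeros((m + n, m + n))
W[:m, m:] = A
W[m:, :m] = A.T
mask = np.array([True, False, True, True, False])  # atom 0, nodes 0 and 1
ncut = normalized_cut(W, mask)
rq = rayleigh_quotient(W, scaled_indicator(W, mask))
```

Because the two quantities agree for every two-way partition, minimizing the Rayleigh quotient over real-valued vectors is a natural relaxation of the discrete normalized-cut problem.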
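Further, in the sixth step the adjacency matrix is normalized and its left and right eigenvectors are calculated as follows:
(1) in the bipartite graph, the Laplacian matrix and the diagonal matrix have the block forms
$$\begin{bmatrix} D_1 & -A \\ -A^T & D_2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$$
where $D_1(i,i) = \sum_j A_{ij}$ and $D_2(j,j) = \sum_i A_{ij}$ are diagonal matrices;
(2) let $z = [z_1\ z_2]^T$; the generalized eigenvalue problem is then equivalent to
$$\begin{bmatrix} D_1 & -A \\ -A^T & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$
The above formula is equivalent to:
$$D_1 z_1 - A z_2 = \lambda D_1 z_1$$
$$-A^T z_1 + D_2 z_2 = \lambda D_2 z_2$$
Letting $u = D_1^{1/2} z_1$ and $v = D_2^{1/2} z_2$, the above formula is equivalent to:
$$D_1^{-1/2} A D_2^{-1/2} v = (1 - \lambda) u$$
$$D_2^{-1/2} A^T D_1^{-1/2} u = (1 - \lambda) v$$
Thus, the above equations amount to the singular value decomposition of the normalized matrix $D_1^{-1/2} A D_2^{-1/2}$.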
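The equivalence between the generalized eigenvalue problem and the singular value decomposition can be checked numerically. The sketch below is illustrative only: the names `normalized_svd`, `A_n`, `u` and `v` are chosen here, and the small `eps` guard against zero row or column sums is an added assumption.

```python
import numpy as np

def normalized_svd(A, eps=1e-12):
    """SVD of A_n = D1^{-1/2} A D2^{-1/2} for a non-negative m x n matrix A."""
    d1 = A.sum(axis=1)                               # D1(i,i) = sum_j A_ij
    d2 = A.sum(axis=0)                               # D2(j,j) = sum_i A_ij
    s1 = 1.0 / np.sqrt(np.maximum(d1, eps))
    s2 = 1.0 / np.sqrt(np.maximum(d2, eps))
    A_n = s1[:, None] * A * s2[None, :]
    U, sigma, Vt = np.linalg.svd(A_n, full_matrices=False)
    return d1, d2, U, sigma, Vt.T

rng = np.random.default_rng(0)
A = rng.random((4, 6)) + 0.1                         # strictly positive example
d1, d2, U, sigma, V = normalized_svd(A)

# Each singular triple (u, sigma_k, v) gives a generalized eigenpair with
# z1 = D1^{-1/2} u, z2 = D2^{-1/2} v and lambda = 1 - sigma_k.
k = 1
z1 = U[:, k] / np.sqrt(d1)
z2 = V[:, k] / np.sqrt(d2)
lam = 1.0 - sigma[k]
residual_top = d1 * z1 - A @ z2 - lam * d1 * z1
residual_bottom = -A.T @ z1 + d2 * z2 - lam * d2 * z2
```

Working with the m x n matrix instead of the (m + n) x (m + n) Laplacian is what makes the decomposition cheap when there are far fewer atoms than data nodes.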
Further, in the seventh step the K-means algorithm is used to cluster the vectors
Figure BDA0002187147880000052
to obtain the final clustering labels.
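The joint clustering step can be sketched end to end as below. Several choices are assumptions, because the clustered vectors are given only as an image in the source: the rows of the stacked matrix are the degree-scaled left and right singular vectors, the leading trivial pair is skipped, ceil(log2 k) vectors are kept (the usual spectral co-clustering recipe), and a small deterministic K-means replaces whatever implementation the authors used.

```python
import numpy as np

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def co_cluster(A, k):
    """Cluster atoms (rows of A) and data nodes (columns of A) together."""
    d1, d2 = A.sum(axis=1), A.sum(axis=0)
    A_n = A / np.sqrt(np.outer(d1, d2))              # D1^{-1/2} A D2^{-1/2}
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    n_vec = max(1, int(np.ceil(np.log2(k))))
    Z = np.vstack([U[:, 1:1 + n_vec] / np.sqrt(d1)[:, None],
                   Vt.T[:, 1:1 + n_vec] / np.sqrt(d2)[:, None]])
    labels = kmeans(Z, k)
    return labels[:A.shape[0]], labels[A.shape[0]:]

# Atoms 0 and 1 mainly represent nodes 0-2; atom 2 mainly represents nodes 3-4.
A = np.array([[5.0, 4.0, 5.0, 0.1, 0.1],
              [4.0, 5.0, 4.0, 0.1, 0.1],
              [0.1, 0.1, 0.1, 5.0, 4.0]])
atom_labels, node_labels = co_cluster(A, 2)
```

Because atoms and data nodes are embedded in the same space, each data node receives the same label as the atoms that represent it, which is the joint clustering the method describes.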
Compared with the prior art, the invention has the following notable features: (1) it adopts non-overlapping spatial neighborhood subsets, whose nodes usually lie in a low-dimensional subspace and are usually composed of dictionary atoms of the same class, which yields discriminability between classes of high-dimensional data; (2) dictionary learning under the joint sparse representation constrained optimization framework captures the inherent local sparsity and non-local self-similarity of high-dimensional data, and reduces the computational complexity and the parameter-setting effort; (3) bipartite graph segmentation of the high-dimensional data captures the row and column information of the adjacency matrix and the correlation between high-dimensional data nodes and dictionary atoms.
Drawings
FIG. 1 is a flow chart of the high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation.
FIG. 2 is a schematic diagram of dividing a high-dimensional data set into non-overlapping data subsets.
FIG. 3 is the bipartite graph between high-dimensional data nodes and dictionary atoms.
FIG. 4(a) is the ground-truth land-cover map of the Indian Pines dataset.
FIG. 4(b) is the clustering result on the Indian Pines dataset using the K-means method.
FIG. 4(c) is the clustering result on the Indian Pines dataset using the CFSFDP method.
FIG. 4(d) is the clustering result on the Indian Pines dataset using the high-dimensional data clustering method of spectral dictionary learning and spectral clustering.
FIG. 4(e) is the clustering result on the Indian Pines dataset using the high-dimensional data clustering method of spatial dictionary learning and spectral clustering.
FIG. 4(f) is the clustering result on the Indian Pines dataset using the high-dimensional data clustering method of joint sparse representation and spectral clustering.
FIG. 4(g) is the clustering result on the Indian Pines dataset using the high-dimensional data joint clustering method of spectral dictionary learning and bipartite graph segmentation.
FIG. 4(h) is the clustering result on the Indian Pines dataset using the high-dimensional data joint clustering method of spatial dictionary learning and bipartite graph segmentation.
FIG. 4(i) is the clustering result on the Indian Pines dataset using the method of the present invention.
FIG. 5(a) is the ground-truth land-cover map of the Pavia University dataset.
FIG. 5(b) is the clustering result on the Pavia University dataset using the K-means method.
FIG. 5(c) is the clustering result on the Pavia University dataset using the CFSFDP method.
FIG. 5(d) is the clustering result on the Pavia University dataset using the high-dimensional data clustering method of spectral dictionary learning and spectral clustering.
FIG. 5(e) is the clustering result on the Pavia University dataset using the high-dimensional data clustering method of spatial dictionary learning and spectral clustering.
FIG. 5(f) is the clustering result on the Pavia University dataset using the high-dimensional data clustering method of joint sparse representation and spectral clustering.
FIG. 5(g) is the clustering result on the Pavia University dataset using the high-dimensional data joint clustering method of spectral dictionary learning and bipartite graph segmentation.
FIG. 5(h) is the clustering result on the Pavia University dataset using the high-dimensional data joint clustering method of spatial dictionary learning and bipartite graph segmentation.
FIG. 5(i) is the clustering result on the Pavia University dataset using the method of the present invention.
Detailed Description
The invention provides a high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation. It uses non-overlapping local neighborhood subsets and structure dictionary learning under a joint sparse representation constraint to mine the local sparsity and the autocorrelation of high-dimensional data at the same time, reducing the computational complexity and the parameter-setting effort. By partitioning the high-dimensional data nodes with a bipartite graph, it captures the row and column information of the adjacency matrix and the correlation between the high-dimensional data nodes and the dictionary atoms. The specific steps of the invention are described in detail with reference to FIG. 1:
First, the high-dimensional data set is divided into non-overlapping data subsets using spatial proximity. Taking the image shown in FIG. 4(a) as an example, a hyperspectral image $Y = [y_1, \dots, y_i, \dots, y_n] \in \mathbb{R}^{d \times n}$ is input, with $d = 200$ and $n = 21025$, and the hyperspectral pixels are divided into a plurality of non-overlapping $5 \times 5$ square spatial neighborhood subsets. The division is shown in FIG. 2.
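As a quick check of these sizes, assuming the 145 × 145 Indian Pines image described in the simulation conditions and a block width of 5, the pixel count and the number of neighborhood subsets work out as:

$$n = 145 \times 145 = 21025, \qquad \frac{145}{5} = 29, \qquad 29 \times 29 = 841 \text{ subsets}, \qquad 841 \times 25 = 21025.$$

Every pixel therefore belongs to exactly one $5 \times 5$ subset, with no partial blocks at the image border.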
Second, the structure dictionary is constructed as follows:
(1) For the $i$-th pixel $y_i$ ($1 \le i \le 21025$), the $5 \times 5$ neighborhood subset containing pixel $y_i$ is extracted from the hyperspectral image and denoted $\Gamma_i$, expressed as follows:
$$\Gamma_i = [\,y_{i,1}, \dots, y_{i,25}\,]$$
where $y_{i,25}$ denotes the $|\Gamma_i|$-th hyperspectral pixel of the spatial neighborhood subset $\Gamma_i$.
(2) The structural dictionary learning model is represented as follows:
Figure BDA0002187147880000071
where $X = [x_1, \dots, x_i, \dots, x_n] \in \mathbb{R}^{m \times n}$ is the joint sparse representation coefficient matrix, $Y$ is the hyperspectral image, $D = [d_1, \dots, d_i, \dots, d_m] \in \mathbb{R}^{d \times m}$ is the structure dictionary, the $\ell_2/\ell_1$ norm is the sum of the $\ell_2$ norms of the rows of the matrix it is applied to, and the sparsity term is weighted by a regularization parameter. The joint sparse representation coefficient $X$ provides the correlation between the hyperspectral image pixels and the dictionary atoms.
Third, the bipartite graph for hyperspectral image clustering is defined as follows:
an undirected bipartite graph
Figure BDA0002187147880000075
is defined. It consists of two disjoint sets of vertices: the set of dictionary atoms and the set of hyperspectral image pixels
Figure BDA0002187147880000077
The two sets are connected by the edge set $E$, which holds all edge weights $E_{ij}$, where $E_{ij}$ is the weight of the edge between the $i$-th vertex and the $j$-th vertex of the bipartite graph; edges $E_{ij}$ exist only between the two different vertex sets (vertices within the same set are not connected), as shown in FIG. 3.
Fourth, the adjacency matrix of the bipartite graph is constructed by mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph, specifically
Figure BDA0002187147880000078
where $A = |X|$ is the element-wise absolute value of $X$.
Fifth, the bipartite graph segmentation optimization model is constructed as follows:
(1) For simplicity, consider a division into two clusters, and assume that
Figure BDA0002187147880000079
is a partition of the bipartite graph. To divide the samples into two clusters while balancing the size of each cluster, a normalized cut is adopted to partition the bipartite graph, and it can be written as:
Figure BDA00021871478800000710
where
Figure BDA00021871478800000711
and
Figure BDA00021871478800000712
represent the accumulated edge weights between the clusters, and
Figure BDA00021871478800000713
represents the accumulated edge weights within a cluster.
(2) Let $q$ be the vector that partitions the bipartite graph $G$: if
Figure BDA00021871478800000714
then $q_i = 1$, otherwise $q_i = -1$. The Rayleigh quotient of the vector $q$ is equivalent to the segmentation optimization model in the previous step, specifically:
Figure BDA0002187147880000081
where
Figure BDA0002187147880000082
is the Laplacian matrix and
Figure BDA0002187147880000083
is a diagonal matrix.
The above formula is equivalent to:
Figure BDA0002187147880000084
where
Figure BDA0002187147880000085
and all elements of the vector $e$ are equal to 1.
(3) The discrete segmentation vector $q$ in the segmentation optimization model of the previous step can be relaxed to a continuous vector, specifically:
Figure BDA0002187147880000086
Solving the above problem corresponds to solving the generalized eigenvalue problem
Figure BDA0002187147880000087
where $z$ is the eigenvector.
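Sixth, the adjacency matrix is normalized and its left and right eigenvectors are calculated, specifically:
(1) In the bipartite graph, the Laplacian matrix and the diagonal matrix have the block forms
$$\begin{bmatrix} D_1 & -A \\ -A^T & D_2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$$
where $D_1(i,i) = \sum_j A_{ij}$ and $D_2(j,j) = \sum_i A_{ij}$ are diagonal matrices.
(2) Let $z = [z_1\ z_2]^T$; the generalized eigenvalue problem is then equivalent to
$$\begin{bmatrix} D_1 & -A \\ -A^T & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$
The above formula is equivalent to:
$$D_1 z_1 - A z_2 = \lambda D_1 z_1$$
$$-A^T z_1 + D_2 z_2 = \lambda D_2 z_2$$
Letting $u = D_1^{1/2} z_1$ and $v = D_2^{1/2} z_2$, the above formula is equivalent to:
$$D_1^{-1/2} A D_2^{-1/2} v = (1 - \lambda) u$$
$$D_2^{-1/2} A^T D_1^{-1/2} u = (1 - \lambda) v$$
Thus, the above equations amount to the singular value decomposition of the normalized matrix $D_1^{-1/2} A D_2^{-1/2}$.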
Seventh, the left and right eigenvectors are jointly clustered with the K-means algorithm, i.e., the K-means algorithm is applied to the vectors
Figure BDA0002187147880000091
to obtain the final clustering labels.
The method efficiently utilizes the joint sparsity in the data, integrates the joint representation characteristics of representation dictionary atoms and coefficients, overcomes the defect that the traditional sparsity clustering only utilizes the representation coefficients, improves the clustering precision, and enhances the robustness to noise. The method can be widely applied to the unsupervised classification of high-dimensional data in the fields of homeland resources, mineral survey and precision agriculture.
The invention is further described in detail below with reference to examples of hyperspectral image clustering and the accompanying drawings.
Examples
(1) Simulation conditions
The simulation experiments use two real hyperspectral data sets: the Indian Pines dataset and the Pavia University dataset. The Indian Pines dataset is a hyperspectral remote sensing image acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in Indiana. The image contains 220 bands in total, with a spatial resolution of 20 m and an image size of 145 × 145. After removing 20 water vapor absorption and low signal-to-noise ratio bands (band numbers 104-. The region contains 16 known land-cover classes in total, and 8 of them are selected for the experiments to keep the classes balanced. The Pavia University dataset was acquired by the ROSIS sensor over Pavia and contains 115 bands in total, with an image size of 610 × 340; after removing the noisy bands, the remaining 103 bands are used. In view of computational complexity, a sub-image of size 200 × 100 is selected. All simulation experiments were run in MATLAB R2014a under the Windows 7 operating system.
The evaluation indexes used are clustering accuracy (ACC), Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), homogeneity, completeness, V-measure (the harmonic mean of homogeneity and completeness), and the Fowlkes-Mallows Index (FMI).
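Cluster ids are arbitrary, so clustering accuracy is commonly computed after finding the best one-to-one mapping from predicted cluster ids to true class ids. The sketch below uses that common definition, which is an assumption here because the source does not spell it out. It searches all mappings by brute force, which is only practical for a handful of classes; the Hungarian algorithm is the usual choice at larger scale.

```python
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """ACC: fraction of samples labeled correctly under the best one-to-one
    mapping from predicted cluster ids to true class ids (brute-force search)."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))          # candidate relabeling
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)

# Clusters 0 and 1 are swapped relative to the classes; one sample is wrong.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
```

The other indexes listed above (ARI, AMI, NMI, homogeneity, completeness, V-measure, FMI) are invariant to label permutations and need no such matching step.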
(2) Simulation content
The invention uses real hyperspectral data sets to test the clustering performance of the algorithm. To test its performance, the proposed high-dimensional data joint clustering method of joint sparse representation and bipartite graph segmentation (BGP-JSDL) is compared with currently popular clustering algorithms. The compared methods are: K-means; CFSFDP; the high-dimensional data clustering method of spectral dictionary learning and spectral clustering (SC-SDL); the high-dimensional data clustering method of spatial dictionary learning and spectral clustering (SC-CDL); the high-dimensional data clustering method of joint sparse representation and spectral clustering (SC-JSDL); the high-dimensional data joint clustering method of spectral dictionary learning and bipartite graph segmentation (BGP-SDL); the high-dimensional data joint clustering method of spatial dictionary learning and bipartite graph segmentation (BGP-CDL); and the high-dimensional data joint clustering method of joint sparse representation and bipartite graph segmentation (BGP-JSDL).
(3) Analysis of simulation experiment results
Tables 1 and 2 compare the clustering accuracy and the other evaluation indexes for the two hyperspectral data sets under the different clustering algorithms.
TABLE 1 Quantitative evaluation of different clustering algorithms on the Indian Pines dataset (ACC, ARI, AMI, NMI, homogeneity, completeness, V-measure, FMI (%))
TABLE 2 Quantitative evaluation of different clustering algorithms on the Pavia University dataset (ACC, ARI, AMI, NMI, homogeneity, completeness, V-measure, FMI (%))
Figure BDA0002187147880000102
As can be seen from Table 1, on the Indian Pines dataset JSDL captures the inherent local sparsity of the hyperspectral image and the discriminability of the dictionary, and significantly improves clustering accuracy on the different evaluation indexes compared with SDL and CDL. High-dimensional data clustering based on bipartite graph segmentation captures the correlation between hyperspectral image pixels and dictionary atoms, and significantly improves clustering accuracy compared with the spectral-clustering-based methods. As can be seen from Table 2, the same conclusions hold on the Pavia University dataset. The results of the method of the invention on the two data sets are shown in FIG. 4 and FIG. 5. The simulation results on the two real data sets show the effectiveness of the method.

Claims (8)

1. A high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation is characterized by comprising the following steps:
first step, dividing the high-dimensional data set into non-overlapping data subsets using spatial proximity;
second step, constructing a structure dictionary and calculating the correlation between the structure dictionary and each data node, namely learning the joint sparse representation coefficients and the structure dictionary in an optimization model with a joint sparse representation constraint, in combination with the context characteristics of the high-dimensional data;
third step, defining a bipartite graph for joint clustering of the high-dimensional data, namely defining an undirected bipartite graph comprising two disjoint vertex sets;
fourth step, constructing the adjacency matrix of the bipartite graph, namely mapping the joint sparse representation coefficients to a non-negative adjacency matrix of the graph;
fifth step, constructing a bipartite graph segmentation optimization model, namely building the optimization model from the adjacency matrix using a normalized-cut formulation;
sixth step, normalizing the adjacency matrix and calculating its left and right eigenvectors, namely computing the normalized matrix of the adjacency matrix and decomposing it by singular value decomposition to obtain the left and right eigenvectors;
seventh step, jointly clustering the left and right eigenvectors with the K-means algorithm to obtain the final clustering labels.
2. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein in the first step, dividing the high-dimensional data set into non-overlapping data subsets by using spatial proximity means dividing the high-dimensional data set into non-overlapping w × w square spatial neighborhood subsets, where w ≥ 3.
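As a non-limiting illustration of the w × w partition in claim 2, the following sketch splits a data cube into non-overlapping square blocks. The function name `partition_blocks`, the (H, W, B) array layout, and the cropping of incomplete border blocks are assumptions of this sketch; the claim does not state how borders are handled.

```python
import numpy as np

def partition_blocks(cube, w=3):
    """Split an (H, W, B) data cube into non-overlapping w x w spatial blocks.

    Returns an array of shape (num_blocks, w*w, B): the w*w spectra of each
    block. Border pixels that do not fill a whole block are cropped, which is
    an assumption of this sketch and not something the claim specifies.
    """
    if w < 3:
        raise ValueError("the claim requires w >= 3")
    H, W, B = cube.shape
    h_blocks, w_blocks = H // w, W // w
    cropped = cube[: h_blocks * w, : w_blocks * w, :]
    # (h_blocks, w, w_blocks, w, B) -> (h_blocks, w_blocks, w, w, B)
    blocks = cropped.reshape(h_blocks, w, w_blocks, w, B).swapaxes(1, 2)
    return blocks.reshape(h_blocks * w_blocks, w * w, B)

# Toy cube: 6 x 7 pixels with 2 bands; the last pixel column is cropped.
cube = np.arange(6 * 7 * 2, dtype=float).reshape(6, 7, 2)
blocks = partition_blocks(cube, w=3)
print(blocks.shape)  # (4, 9, 2)
```

Each row of `blocks` then plays the role of one spatial neighborhood subset whose pixels are coded jointly in the second step.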
3. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the second step constructs the structure dictionary and calculates the correlation between the structure dictionary and each data node by the following specific process:
(1) for the i-th data node y_i, 1 ≤ i ≤ n, where n is the number of samples, the spatial neighborhood subset of node y_i taken from the high-dimensional data is denoted Γ_i and is expressed as follows:
Figure FDA0002187147870000011
wherein
Figure FDA0002187147870000012
represents the l data nodes of the spatial neighborhood subset Γ_i;
(2) the structure dictionary learning model is expressed as follows:
Figure FDA0002187147870000013
wherein X = [x_1, …, x_i, …, x_n] ∈ R^{m×n} is the joint sparse representation coefficient matrix, D = [d_1, …, d_i, …, d_m] ∈ R^{d×m} is the structure dictionary, and the model uses the l2/l1 norm, which represents the sum of the l2 norms of the rows of
Figure FDA0002187147870000022
together with a regularization parameter; the joint sparse representation coefficient X provides the correlation between the high-dimensional data nodes and the dictionary atoms.
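As a non-limiting illustration of the l2/l1 norm named in claim 3, the sketch below computes the sum of the row-wise l2 norms of a coefficient block. The function name `l21_norm` and the toy matrix are hypothetical. A row that is entirely zero contributes nothing, which is why penalizing this norm encourages the nodes of one neighborhood to share the same few dictionary atoms.

```python
import numpy as np

def l21_norm(X):
    """l2/l1 norm: the sum of the l2 norms of the rows of X."""
    return float(np.sum(np.linalg.norm(X, axis=1)))

# Rows index dictionary atoms, columns index the nodes of one neighborhood.
X_block = np.array([[3.0, 4.0],    # atom used by both nodes, row norm 5
                    [0.0, 0.0],    # atom unused by the block, row norm 0
                    [5.0, 12.0]])  # atom used by both nodes, row norm 13
print(l21_norm(X_block))  # 18.0
```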
4. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the third step defines the bipartite graph for high-dimensional data joint clustering as follows:
an undirected bipartite graph
Figure FDA0002187147870000024
is defined, which consists of two disjoint vertex sets, where
Figure FDA0002187147870000025
is the set of dictionary atoms and
Figure FDA0002187147870000026
is the set of high-dimensional data nodes; the two vertex sets are connected by the corresponding edge set E, which holds all edge weights E_ij, where E_ij is the weight of the edge between the i-th vertex and the j-th vertex of the bipartite graph, and an edge E_ij exists only between the two heterogeneous vertex sets.
5. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the fourth step constructs the adjacency matrix of the bipartite graph by mapping the joint sparse representation coefficients to the non-negative adjacency matrix A of the graph, where A = |X|.
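As a non-limiting illustration of the mapping A = |X| in claim 5, the sketch below takes element-wise absolute values of the sparse codes and, for clarity only, also assembles the full symmetric adjacency matrix of the bipartite graph. The function name `bipartite_adjacency`, the block layout of `W`, and the toy values are assumptions of this sketch.

```python
import numpy as np

def bipartite_adjacency(X):
    """Map sparse codes X (m atoms x n nodes) to non-negative edge weights.

    A = |X| is the m x n atom-to-node weight block from the claim. W is the
    full symmetric (m+n) x (m+n) adjacency matrix, built here only for
    illustration: its diagonal blocks are zero because edges exist only
    between dictionary atoms and data nodes.
    """
    A = np.abs(X)
    m, n = A.shape
    W = np.block([[np.zeros((m, m)), A],
                  [A.T, np.zeros((n, n))]])
    return A, W

X = np.array([[0.8, -0.5, 0.0],
              [0.0, 0.1, -0.9]])
A, W = bipartite_adjacency(X)
```

Taking absolute values keeps the magnitude of each coefficient as the strength of the atom-node link while discarding its sign, so every edge weight is non-negative as graph cuts require.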
6. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the fifth step constructs the bipartite graph segmentation optimization model by the following specific process:
(1) for a division into two clusters, assume that
Figure FDA0002187147870000028
is a partition of the bipartite graph; the normalized cut is adopted to segment the bipartite graph, and it can be written as:
Figure FDA0002187147870000029
wherein
Figure FDA00021871478700000210
and
Figure FDA00021871478700000212
represents the accumulated edge weights between the clusters, and
Figure FDA00021871478700000213
represents the accumulated edge weights within a cluster;
(2) let q be the segmentation vector of the bipartite graph G: if
Figure FDA00021871478700000214
then q_1 = 1, otherwise q_2 = −1; the Rayleigh quotient of the vector q is equivalent to the segmentation optimization model in step (1), specifically:
Figure FDA00021871478700000215
wherein
Figure FDA0002187147870000031
is the Laplacian matrix and
Figure FDA0002187147870000032
is a diagonal matrix;
the above formula is equivalent to:
Figure FDA0002187147870000033
wherein
Figure FDA0002187147870000034
and all elements of the vector e are equal to 1;
(3) the discrete segmentation vector q in the segmentation optimization model is relaxed to a continuous vector, specifically:
Figure FDA0002187147870000035
solving the above problem corresponds to solving the generalized eigenvalue problem
Figure FDA0002187147870000036
where z is the eigenvector.
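The formula images of claim 6 are not reproduced in this text. For orientation only, the standard two-way normalized cut and its continuous relaxation from the spectral graph partitioning literature can be written as below. The symbols W (full adjacency matrix), V_1, V_2 (the two clusters), weight(·) and L = D − W are the textbook ones and are an assumption here, not a transcription of the patent figures.

```latex
\begin{aligned}
\mathrm{Ncut}(V_1, V_2) &= \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_1)}
                         + \frac{\mathrm{cut}(V_1, V_2)}{\mathrm{weight}(V_2)}, \\
\mathrm{cut}(V_1, V_2) &= \sum_{i \in V_1,\; j \in V_2} W_{ij}, \qquad
\mathrm{weight}(V_k) = \sum_{i \in V_k} \sum_{j} W_{ij}, \\
\min_{z \neq 0} \frac{z^{T} L z}{z^{T} D z}
  \;\; \text{s.t.} \;\; z^{T} D e = 0
  \quad &\Longrightarrow \quad L z = \lambda D z, \qquad L = D - W .
\end{aligned}
```

In this textbook form the relaxed minimizer is the generalized eigenvector belonging to the second smallest eigenvalue, since the smallest eigenvalue 0 belongs to the constant vector e, which the constraint excludes.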
7. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 1, wherein the sixth step normalizes the adjacency matrix and calculates its left and right eigenvectors by the following specific process:
(1) for the bipartite graph,
Figure FDA0002187147870000037
where D_1 and D_2 are diagonal matrices with D_1(i,i) = Σ_j A_ij and D_2(j,j) = Σ_i A_ij;
(2) let z = [z_1 z_2]^T; the generalized eigenvalue problem
Figure FDA0002187147870000038
is equivalent to
Figure FDA0002187147870000039
which is equivalent to:
D_1 z_1 − A z_2 = λ D_1 z_1
−A^T z_1 + D_2 z_2 = λ D_2 z_2
letting
Figure FDA00021871478700000310
the above equations are equivalent to:
Figure FDA00021871478700000311
Figure FDA00021871478700000312
thus, the above equations are equivalent to the singular value decomposition of the following normalized matrix:
Figure FDA00021871478700000313
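As a non-limiting illustration of the sixth step, the sketch below normalizes the adjacency matrix and decomposes it by singular value decomposition. It assumes the standard normalization A_n = D_1^{-1/2} A D_2^{-1/2}, which follows from the two coupled equations of claim 7 under the substitution u = D_1^{1/2} z_1, v = D_2^{1/2} z_2 (giving A_n v = (1 − λ) u and A_n^T u = (1 − λ) v); the formula images themselves are not reproduced here, so this form is an assumption. The function name `normalized_svd` and the toy matrix are hypothetical.

```python
import numpy as np

def normalized_svd(A):
    """Return (U, s, Vt, d1, d2) for A_n = D1^{-1/2} A D2^{-1/2}.

    d1 and d2 hold the diagonals of D1 (row sums) and D2 (column sums).
    Assumes every atom and every data node has at least one nonzero edge.
    """
    d1 = A.sum(axis=1)
    d2 = A.sum(axis=0)
    if np.any(d1 <= 0) or np.any(d2 <= 0):
        raise ValueError("isolated vertex: zero row or column in A")
    A_n = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    return U, s, Vt, d1, d2

A = np.array([[5.0, 4.0, 0.1],
              [0.2, 0.1, 6.0]])
U, s, Vt, d1, d2 = normalized_svd(A)
```

Under this normalization the largest singular value is 1, with singular vectors proportional to the square roots of the degrees; that pair corresponds to λ = 0 and carries no cluster information, so the informative vectors start from the second singular pair.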
8. The high-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation according to claim 7, wherein in the seventh step the K-means algorithm is used to cluster the vector
Figure FDA0002187147870000041
to obtain the final clustering labels.
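As a non-limiting illustration of the seventh step, the sketch below stacks the rescaled left and right singular vectors and clusters the rows with K-means, so that dictionary atoms and data nodes receive labels jointly. The stacked embedding [D_1^{-1/2} U; D_2^{-1/2} V], the use of ceil(log2 k) vectors after the trivial first pair, the deterministic farthest-first initialization, and all function names are assumptions borrowed from the standard bipartite spectral co-clustering recipe; the vector in the claim's formula image is not reproduced here.

```python
import numpy as np

def farthest_first_kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def co_cluster(A, k):
    """Jointly label the m dictionary atoms and n data nodes of A (m x n)."""
    d1, d2 = A.sum(axis=1), A.sum(axis=0)
    A_n = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    n_vec = max(1, int(np.ceil(np.log2(k))))
    # Skip the first (trivial) singular pair; rescale by D1^{-1/2}, D2^{-1/2}.
    Zu = U[:, 1:1 + n_vec] / np.sqrt(d1)[:, None]
    Zv = Vt[1:1 + n_vec, :].T / np.sqrt(d2)[:, None]
    labels = farthest_first_kmeans(np.vstack([Zu, Zv]), k)
    m = A.shape[0]
    return labels[:m], labels[m:]

# Toy weights: atoms 0-1 link strongly to nodes 0-2, atoms 2-3 to nodes 3-5.
A = np.array([[5.0, 4.0, 6.0, 0.1, 0.1, 0.1],
              [4.0, 6.0, 5.0, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 5.0, 6.0, 4.0],
              [0.1, 0.1, 0.1, 6.0, 4.0, 5.0]])
atom_labels, node_labels = co_cluster(A, k=2)
```

Because atoms and nodes are embedded in the same space, each data node ends up in the same cluster as the atoms that represent it most strongly; the labels of the data nodes are the clustering result of interest.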
CN201910819539.2A 2019-08-31 2019-08-31 High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation Withdrawn CN110674848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910819539.2A CN110674848A (en) 2019-08-31 2019-08-31 High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910819539.2A CN110674848A (en) 2019-08-31 2019-08-31 High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation

Publications (1)

Publication Number Publication Date
CN110674848A true CN110674848A (en) 2020-01-10

Family

ID=69076109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910819539.2A Withdrawn CN110674848A (en) 2019-08-31 2019-08-31 High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation

Country Status (1)

Country Link
CN (1) CN110674848A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164382A1 (en) * 2020-02-17 2021-08-26 支付宝(杭州)信息技术有限公司 Method and apparatus for performing feature processing for user classification model
CN112768029A (en) * 2020-12-27 2021-05-07 上海市东方医院(同济大学附属东方医院) Combined medication recommendation device, method and medium based on single cell sequencing
CN112768029B (en) * 2020-12-27 2023-10-13 上海市东方医院(同济大学附属东方医院) Combined drug recommendation equipment, method and medium based on single cell sequencing
CN115374191A (en) * 2022-10-26 2022-11-22 国网湖北省电力有限公司信息通信公司 Multi-source data-driven cluster method for heterogeneous equipment of data center
CN115374191B (en) * 2022-10-26 2023-01-31 国网湖北省电力有限公司信息通信公司 Multi-source data-driven cluster method for heterogeneous equipment of data center

Similar Documents

Publication Publication Date Title
CN110321963B (en) Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN111860612B (en) Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method
Fu et al. Hyperspectral anomaly detection via deep plug-and-play denoising CNN regularization
CN110084159B (en) Hyperspectral image classification method based on combined multistage spatial spectrum information CNN
CN110399909B (en) Hyperspectral image classification method based on label constraint elastic network graph model
CN108460342B (en) Hyperspectral image classification method based on convolutional neural network and cyclic neural network
Du et al. A spectral-spatial based local summation anomaly detection method for hyperspectral images
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN107451614B (en) Hyperspectral classification method based on fusion of space coordinates and space spectrum features
CN107992891B (en) Multispectral remote sensing image change detection method based on spectral vector analysis
CN108197650B (en) Hyperspectral image extreme learning machine clustering method with local similarity maintained
CN109376753B (en) Probability calculation method for three-dimensional spatial spectrum space dimension pixel generic
Liu et al. Enhancing spectral unmixing by local neighborhood weights
Ortac et al. Comparative study of hyperspectral image classification by multidimensional Convolutional Neural Network approaches to improve accuracy
CN103208011B (en) Based on average drifting and the hyperspectral image space-spectral domain classification method organizing sparse coding
CN105989336B (en) Scene recognition method based on deconvolution deep network learning with weight
CN112308152B (en) Hyperspectral image ground object classification method based on spectrum segmentation and homogeneous region detection
CN110674848A (en) High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation
CN107292258B (en) High-spectral image low-rank representation clustering method based on bilateral weighted modulation and filtering
Ma et al. Hyperspectral anomaly detection based on low-rank representation with data-driven projection and dictionary construction
Paul et al. Dimensionality reduction using band correlation and variance measure from discrete wavelet transformed hyperspectral imagery
CN112381144B (en) Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning
CN112052758B (en) Hyperspectral image classification method based on attention mechanism and cyclic neural network
CN114937173A (en) Hyperspectral image rapid classification method based on dynamic graph convolution network
Ren et al. PolSAR feature extraction via tensor embedding framework for land cover classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200110