CN109002854A - Based on hidden expression and adaptive multiple view Subspace clustering method - Google Patents
Based on hidden expression and adaptive multiple view Subspace clustering method
- Publication number
- CN109002854A (application CN201810801776.1A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- view
- clustering
- objective function
- representation
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention proposes a multi-view subspace clustering method based on hidden representation and self-adaptation, which mainly addresses the low clustering accuracy of existing multi-view clustering methods. The implementation steps are: (1) obtain the multi-view data matrix of the original data set; (2) compute the Laplacian matrices of the multi-view data matrix; (3) construct the objective function of hidden-representation-based and adaptive multi-view subspace clustering; (4) optimize the objective function; (5) initialize the variables in the optimized objective function; (6) perform alternating iteration on the variables in the optimized objective function; (7) compute the value of the multi-view self-representation coefficient matrix in the optimized objective function; (8) cluster the original data set. By using the hidden representation and self-adaptation, the invention makes full use of the information of multiple views, effectively improves the accuracy of multi-view clustering, and can be used in fields such as image segmentation, business analysis and biological classification.
Description
Technical Field
The invention belongs to the technical field of computer vision and pattern recognition and relates to a multi-view subspace clustering method, in particular to a multi-view subspace clustering method based on implicit representation and self-adaptation, which can be used for image segmentation, business analysis, biological classification and the like.
Background
In recent years, with the rapid development of computer information technology, emerging technologies have changed human society and brought with them an explosive growth of data; the acquisition and analysis of data has therefore become more and more important. Data mining is the process of finding hidden information in a large amount of data and extracting knowledge from it. Clustering is an important data mining method: it divides a set of physical or abstract objects into several clusters of similar objects, so that objects in the same cluster have high similarity while objects in different clusters have low similarity.
A conventional data set is represented by a single feature, called a single view data set. However, the information in the original data set contained in the single-view data set is not complete, and in order to solve this problem, the prior art enables the data set to be represented by a plurality of features, which is called a multi-view data set. For example, the same picture can be described by using features such as SIFT and HOG; for the same news report, different languages can be used for expression; for web page data, the text and link information may be presented as two different views. If the traditional clustering method is adopted to cluster the data set, the clustering effect is not ideal due to the fact that the information of a plurality of views cannot be fully utilized. Thus, multi-view clustering is proposed. Multi-view clustering tends to yield more accurate clustering results by exploiting the consistency and diversity of multiple views.
Multi-view clustering algorithms can be divided into K-means-based and spectral-clustering-based algorithms. K-means-based multi-view clustering algorithms produce unstable clustering results because the initial points are chosen randomly and the clustering result depends strongly on this choice.
The multi-view clustering algorithm based on spectral clustering can keep the local geometric structure among the samples of the data set, and can often obtain a relatively stable clustering result, so that a plurality of multi-view clustering algorithms based on spectral clustering appear.
Because the samples in a data set are distributed over specific low-dimensional subspaces, many subspace clustering (SC) algorithms, which belong to the family of spectral clustering algorithms, have emerged in recent years. These algorithms exploit the property that any sample can be represented as a linear combination of the other samples lying in its subspace, and therefore decompose the data matrix into the product of the data matrix and a view self-expression coefficient matrix. The clustering result is then obtained from the view self-expression coefficient matrix. Because the view self-expression coefficient matrix has good interpretability and a clear physical meaning, subspace clustering has become a basic tool for data clustering and is widely used in both single-view and multi-view clustering.
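The self-expression property described above can be illustrated with a minimal sketch. The ridge-regularized least-squares solution below only illustrates the decomposition X ≈ XZ; it is not the objective actually used by the invention, which adds low-rank, noise and graph-regularization terms as described later.

```python
import numpy as np

def self_expression(X, reg=1e-2):
    """Toy self-expression: find Z minimizing ||X - XZ||_F^2 + reg*||Z||_F^2,
    so that each column (sample) of X is approximated by a linear combination
    of the samples. Columns of X are sample points."""
    N = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + reg * np.eye(N), G)
    # Commonly the diagonal is then suppressed so that a sample does not represent itself.
    np.fill_diagonal(Z, 0.0)
    return Z
```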
For example, Hongchang Gao, Feiping Nie, Xuelong Li and Heng Huang published an article entitled "Multi-View Subspace Clustering" at the 2015 IEEE Conference on Computer Vision and Pattern Recognition, which discloses a multi-view subspace clustering method. The method performs matrix decomposition on each view matrix of a multi-view data set to obtain a view self-representation coefficient matrix corresponding to each view, constructs a similarity matrix of the original data set from these view self-representation coefficient matrices, and obtains the clustering result of the original data set by spectral clustering. To reduce the influence of noise in the original data set on clustering accuracy, the difference between each view data matrix and the product of that view data matrix with its corresponding self-representation coefficient matrix is treated as a noise data matrix, which is then constrained. However, this method can only remove noise of a specific type and is not robust to general noise. Therefore, Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu et al. published an article entitled "Latent Multi-view Subspace Clustering" at the 2017 IEEE Conference on Computer Vision and Pattern Recognition, which discloses a hidden (latent) multi-view subspace clustering method. To reduce the influence of noise contained in the original data set on the accuracy of its clustering result, this method assumes that all views of the original data set are derived from the same representation, called the multi-view hidden representation: each view matrix is obtained by multiplying the multi-view hidden representation by a base matrix corresponding to that view, and the multi-view hidden representation is the consistent representation of the original data set obtained after removing noise from the multi-view matrices. The method learns the multi-view hidden representation and its self-representation simultaneously, thereby obtaining a self-representation coefficient matrix of the multi-view data set, and a consistent clustering result is obtained by spectral clustering. However, the method treats the amount of information contained in each view as the same, whereas in fact the amount of information in each view of the original data set differs; it ignores this fact, and it also does not require the multi-view hidden representation to preserve the local geometric structure within each view, which further affects the accuracy of the clustering result of the original data set.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a multi-view subspace clustering method based on implicit representation and self-adaptation, which is used for improving the accuracy of multi-view data set clustering.
The technical idea of the invention is as follows: the method comprises the steps of learning multi-view hidden representation of a multi-view data set in a self-adaptive mode, enabling the multi-view hidden representation to keep a local geometric structure in each view matrix by using graph regularization while obtaining the multi-view hidden representation, carrying out matrix decomposition on the multi-view hidden representation to obtain a multi-view self-representation coefficient matrix, and obtaining a clustering result by using spectral clustering. The method comprises the following implementation steps:
(1) obtaining a multi-view data matrix of an original data set
Different types of feature data are extracted from the images contained in the original data set; feature data of the same type form one view matrix, and the view matrices together form the multi-view data matrix {X^(v)} of the original data set, where X^(v) denotes the v-th view matrix, v = 1, 2, …, m, m denotes the number of view matrices, and m ≥ 2;
(2) computing the Laplacian matrices {L^(v)} of the multi-view data matrix {X^(v)};
(3) Constructing an objective function J of multi-view subspace clustering based on implicit representation and self-adaptation:
(3a) decompose {X^(v)} into a multi-view hidden representation matrix H and multi-view base matrices {P^(v)}, with the constraint on the base matrix P^(v) of X^(v) that P^(v)P^(v)T = I, and take the difference between X^(v) and the product P^(v)H as an error reconstruction term, where I denotes the identity matrix and (·)^T denotes the transpose of a matrix;
(3b) compute the measure ||X^(v) − P^(v)H||_F^2 of this reconstruction error and assign it an adaptive weight, with γ as the weight parameter of the adaptive weight, where ||·||_F^2 denotes the squared Frobenius norm of a matrix, γ denotes an adjusting parameter, and γ ≥ 0;
(3c) decompose the multi-view hidden representation matrix H into the product of H and a multi-view self-representation coefficient matrix Z, take the difference between H and HZ as the error reconstruction term E_r, E_r = H − HZ, compute the measure ||E_r||_{2,1} of E_r, and set the weight of ||E_r||_{2,1} to λ1, where ||·||_{2,1} denotes the 2,1-norm of a matrix;
(3d) construct the low-rank constraint term ||Z||_* of the multi-view self-representation coefficient matrix Z and set the weight of ||Z||_* to λ2, where ||·||_* denotes the nuclear norm of a matrix;
(3e) use the Laplacian matrices {L^(v)} to construct a similarity (graph regularization) constraint term on the multi-view data matrix and set its weight to λ3, where tr(·) denotes the trace of a matrix;
(3f) add the error reconstruction terms ||X^(v) − P^(v)H||_F^2, ||E_r||_{2,1}, the low-rank term ||Z||_* and the similarity constraint term with their weights to obtain the objective function J of hidden-representation-based and adaptive multi-view subspace clustering;
(4) optimizing an objective function J:
The objective function J is optimized by the alternating direction method of multipliers: take A as an auxiliary matrix variable for Z with the constraint A = Z, introduce a Lagrangian multiplier for the reconstruction constraint of each view, take Q1 as the Lagrangian multiplier of E_r = H − HZ and Q2 as the Lagrangian multiplier of A = Z, and obtain the optimized objective function J', in which ⟨·,·⟩ denotes the matrix inner product and μ denotes the regularization (penalty) coefficient;
(5) initializing variables in the optimized objective function J':
All elements of Z, E_r, Q1, Q2 and A in J' are initialized to 0, all elements of H are initialized to random numbers in (0, 1), and μ is initialized to 0.001;
(6) and performing alternate iteration on the variables in the optimized objective function J':
The variables H, Z, E_r, A, Q1, Q2 and μ in J' are iterated alternately to obtain the iterative update expression of each variable, H_S, Z_S, Er_S, A_S, Q1_S, Q2_S and μ_S;
(7) Calculating the value of the variable Z in the optimized objective function J':
(7a) setting the maximum iteration times of the optimized objective function J';
(7b) iteratively update the corresponding variables using the iterative update expressions of the variables in J', and stop the iteration when the number of iterations equals the set maximum number of iterations, obtaining the updated multi-view self-representation coefficient matrix;
(8) Clustering the original data set:
(8a) calculating a similarity matrix S of the original data set;
(8b) calculating the clustering result of the original data set:
(8b1) diagonalize the vector t obtained by summing each row of the similarity matrix S to obtain the degree matrix D of S, and compute the Laplacian matrix L;
(8b2) perform eigenvalue decomposition of the Laplacian matrix L to obtain the set E of its eigenvalues and the matrix T consisting of the eigenvectors corresponding to the eigenvalues in E;
(8b3) sort the eigenvalues in E in ascending order to obtain the eigenvalue set E', take the first K eigenvalues of E' to form the set E_K, select from T the eigenvectors corresponding to the eigenvalues in E_K to form the eigenvector matrix T', and then use the normalized result of each row of T' as a sample data point, where K denotes the number of classes of the sample data points, 2 ≤ K < N, and N denotes the number of sample data points in the original data set;
(8b4) randomly select K sample data points in T' and take each of them as the initial clustering center of one class, obtaining the clustering center set R consisting of the K clustering centers;
(8b5) compute the Euclidean distance from each sample data point in T' to each clustering center in R, assign each sample data point to the class of the clustering center nearest to it, and compute the mean of the sample data points belonging to the k-th class as the clustering center of the k-th class, k = 1, 2, …, K, obtaining the clustering centers of the K classes and thereby updating R;
(8b6) repeat step (8b5) until the clustering center set R no longer changes, obtaining the clustering result of the original data set.
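The way steps (1)-(8) fit together can be sketched as follows. Every function name in this sketch is a placeholder: view_laplacian and spectral_clustering correspond to the helpers sketched in the detailed description below, solve_admm is a caller-supplied stand-in for the ADMM procedure of steps (3)-(7), and the symmetrized form of S is an assumption, since the formula is not shown in the extracted text.

```python
import numpy as np

def multi_view_subspace_clustering(views, K, solve_admm, sigma=1.0):
    """End-to-end flow of steps (1)-(8). `solve_admm` must return the updated
    multi-view self-representation coefficient matrix."""
    # Step 2: per-view graph Laplacians (see the view_laplacian sketch below)
    laplacians = [view_laplacian(Xv, sigma) for Xv in views]
    # Steps 3-7: learn the self-representation coefficient matrix by ADMM
    Z_hat = solve_admm(views, laplacians)
    # Step 8a: similarity matrix from the learned coefficients (assumed symmetrized form)
    S = 0.5 * (np.abs(Z_hat) + np.abs(Z_hat).T)
    # Step 8b: spectral clustering of S (see the spectral_clustering sketch below)
    return spectral_clustering(S, K)
```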
Compared with the prior art, the invention has the following advantages:
When constructing the objective function, the invention performs matrix decomposition on the multi-view data matrix to obtain the multi-view hidden representation. A self-adaptive method is adopted: the importance of each view to the multi-view hidden representation matrix is measured by a single parameter, and the weights of the error reconstruction terms that express the multi-view data matrix through the multi-view base matrices and the hidden representation matrix are learned automatically. At the same time, similarity constraint terms on the multi-view data matrix are constructed so that the multi-view hidden representation preserves the local geometric structure within each view. The information of every view is thus fully used, a more accurate similarity matrix of the original data set can be obtained, and, compared with the prior art, the accuracy of multi-view clustering is effectively improved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIGS. 2-4 are comparison graphs of the simulation results of the clustering accuracy of the BBCSport data set, the MSRC-v1 data set and the Caltech101-7 data set according to the present invention and the existing hidden multi-view subspace clustering method, respectively;
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, a multi-view subspace clustering method based on implicit representation and adaptation includes the following steps:
step 1) obtaining a multi-view data matrix of an original data set
Because each image has multiple types of features, different types of feature data can be extracted from each image. Different types of feature data are extracted from the images contained in the original data set; feature data of the same type form one view matrix, and the view matrices together form the multi-view data matrix {X^(v)} of the original data set, where X^(v) denotes the v-th view matrix, v = 1, 2, …, m, m denotes the number of view matrices, and m ≥ 2.
Step 2) computing the Laplacian matrices {L^(v)} of the multi-view data matrix {X^(v)}, implemented as follows:
(2a) take each column of the v-th view matrix X^(v) as a sample point and compute the Euclidean distance between any two sample points, i, j = 1, 2, …, N, where N denotes the number of sample points in the original data set;
(2b) compute the correlation matrices {W^(v)} of the multi-view data matrix, where W^(v) denotes the correlation matrix of the v-th view matrix, W^(v)_{ij} denotes the element in row i and column j of W^(v), and σ denotes the bandwidth parameter of the Gaussian kernel;
(2c) sum the rows of each correlation matrix W^(v) to obtain a vector r, diagonalize r to obtain the diagonal multi-view degree matrices, and compute the Laplacian matrices {L^(v)} of the multi-view data matrix.
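A minimal sketch of step 2 for one view is given below. The Gaussian-kernel affinity exp(−d²/(2σ²)) and the unnormalized Laplacian L = D − W are assumptions made for illustration; the exact formulas are not visible in the extracted patent text.

```python
import numpy as np

def view_laplacian(X, sigma=1.0):
    """Graph Laplacian of one view matrix X whose columns are sample points (step 2)."""
    # (2a) squared pairwise Euclidean distances between columns of X
    sq = np.sum(X ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    # (2b) Gaussian-kernel correlation matrix with bandwidth sigma (assumed form)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # (2c) degree matrix from the row sums, then L = D - W (assumed unnormalized form)
    D = np.diag(W.sum(axis=1))
    return D - W

# Per-view Laplacians for the multi-view data matrix {X^(v)}:
# laplacians = [view_laplacian(Xv, sigma) for Xv in views]
```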
Step 3) constructing the objective function J of hidden-representation-based and adaptive multi-view subspace clustering:
(3a) the invention assumes that the multi-view data matrix {X^(v)} is obtained by multiplying a multi-view hidden representation matrix H by multi-view base matrices {P^(v)}; {X^(v)} is therefore decomposed into H and {P^(v)}, with the constraint on the base matrix P^(v) of the v-th view matrix X^(v) that P^(v)P^(v)T = I. To measure the inconsistency between X^(v) and P^(v)H, the difference between X^(v) and the product P^(v)H is taken as an error reconstruction term, where I denotes the identity matrix and (·)^T denotes the transpose of a matrix;
(3b) to measure the size of the inconsistency between X^(v) and P^(v)H, the measure ||X^(v) − P^(v)H||_F^2 is computed and assigned an adaptive weight, with γ as the weight parameter of the adaptive weight, where ||·||_F^2 denotes the squared Frobenius norm of a matrix, γ denotes an adjusting parameter, and γ ≥ 0. Because the amount of information contained in a view matrix X^(v) cannot be known in advance, the adaptive weight is learned automatically, and the role of γ is to regulate the distribution of the adaptive weights;
(3c) since H is the multi-view hidden representation matrix, i.e. a comprehensive representation of {X^(v)}, H is decomposed into the product of H and a multi-view self-representation coefficient matrix Z, and the difference between H and HZ is taken as the error reconstruction term E_r, E_r = H − HZ. To measure the magnitude of the inconsistency between H and HZ, the measure ||E_r||_{2,1} is computed and its weight is set to λ1, where ||·||_{2,1} denotes the 2,1-norm of a matrix; the 2,1-norm of E_r is used so that the multi-view hidden representation can remove general noise, enhancing the robustness of the objective function to general noise;
(3d) in order to obtain a unique Z that has the low-rank structure expected of the similarity matrix of the original data set, the low-rank constraint term ||Z||_* of Z is constructed and the weight of ||Z||_* is set to λ2, where ||·||_* denotes the nuclear norm of a matrix;
(3e) in order for H to preserve the local geometric structure of the samples in each X^(v), the Laplacian matrices {L^(v)} are used to construct a similarity (graph regularization) constraint term on the multi-view data matrix, and the weight of this term is set to λ3, where tr(·) denotes the trace of a matrix;
(3f) the error reconstruction terms ||X^(v) − P^(v)H||_F^2, ||E_r||_{2,1}, the low-rank term ||Z||_* and the similarity constraint term are added with their weights to obtain the objective function J of hidden-representation-based and adaptive multi-view subspace clustering.
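The displayed formula for J is not preserved in the extracted text. Based on the terms defined in (3a)-(3e), a plausible reconstruction is the following, where α^(v) stands for the adaptive weight of the v-th view; the exact parametrization of α^(v) in terms of γ is not recoverable here and is left abstract.

```latex
\min_{H,\{P^{(v)}\},Z,E_r}\;
\sum_{v=1}^{m}\alpha^{(v)}\,\bigl\|X^{(v)}-P^{(v)}H\bigr\|_F^{2}
+\lambda_1\,\bigl\|E_r\bigr\|_{2,1}
+\lambda_2\,\bigl\|Z\bigr\|_{*}
+\lambda_3\sum_{v=1}^{m}\operatorname{tr}\!\bigl(H\,L^{(v)}H^{T}\bigr)
\quad\text{s.t.}\quad P^{(v)}P^{(v)T}=I,\;\;E_r=H-HZ.
```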
step 4), optimizing an objective function J:
Because the objective function J contains constraints on its variables, closed-form expressions for the solutions of the variables in J cannot be obtained directly, so J is optimized as follows:
The objective function J is optimized by the alternating direction method of multipliers: take A as an auxiliary matrix variable for Z with the constraint A = Z, introduce a Lagrangian multiplier for the reconstruction constraint of each view, take Q1 as the Lagrangian multiplier of E_r = H − HZ and Q2 as the Lagrangian multiplier of A = Z, and obtain the optimized objective function J',
in which ⟨·,·⟩ denotes the matrix inner product and μ denotes the regularization (penalty) coefficient.
Step 5) initializing variables in the optimized objective function J':
All elements of Z, E_r, Q1, Q2 and A in J' are initialized to 0, all elements of H are initialized to random numbers in (0, 1), and μ is initialized to 0.001; the variables in J' are initialized so that the optimization algorithm of J' can run iteratively.
Step 6) carrying out alternate iteration on the variables in the optimized objective function J':
The variables H, Z, E_r, A, Q1, Q2 and μ in J' are iterated alternately to obtain the iterative update expression of each variable, H_S, Z_S, Er_S, A_S, Q1_S, Q2_S and μ_S, implemented as follows:
(I) iteratively update the multi-view hidden representation matrix H using lyap(P, Q, -T'), where lyap(·) denotes the solution of the Sylvester equation;
(II) iteratively update the multi-view self-representation coefficient matrix Z using (H^T H + I)^{-1}(A + H^T H − H^T E_r + (Q2 + H^T Q1)/μ);
(III) iteratively update the multi-view base matrices {P^(v)} using the sets of matrices formed by the left singular vectors and by the right singular vectors of the corresponding intermediate matrix;
(IV) iteratively update the per-view error reconstruction terms using their corresponding expressions;
(V) iteratively update the j-th column of the error reconstruction term E_r using its corresponding expression, where B_{:,j} denotes the j-th column of a matrix B;
(VI) iteratively update the auxiliary matrix variable A, where U and V denote the left and right singular vectors of the corresponding intermediate matrix, Σ denotes the diagonal matrix of its singular values, and S_δ(x) = max(x − δ, 0) + min(x + δ, 0) denotes the shrinkage operator, with max(·,·) the maximum and min(·,·) the minimum of two numbers;
(VII) iteratively update the adaptive weights of step (3b) using their corresponding expression;
(VIII) iteratively update Q1 using Q1 + μ(H − HZ − E_r), iteratively update the view-reconstruction multiplier using its corresponding expression, iteratively update Q2 using Q2 = Q2 + μ(A − Z), and repeatedly update μ using ρμ, where ρ denotes the adjusting parameter of μ and ρ ≥ 1.
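A sketch of one pass of these updates is given below, restricted to the updates that are stated explicitly in the text (the Z update, the shrinkage/singular-value-thresholding step for A, and the Q1, Q2, μ updates). The argument Z − Q2/μ of the thresholding step and the threshold λ2/μ are assumptions; the H, P^(v), per-view error, E_r and adaptive-weight updates involve their own sub-problems (Sylvester equation, SVD, 2,1-norm proximal step) and are not reproduced here.

```python
import numpy as np

def shrink(x, delta):
    # Shrinkage operator S_delta(x) = max(x - delta, 0) + min(x + delta, 0), element-wise
    return np.maximum(x - delta, 0.0) + np.minimum(x + delta, 0.0)

def svt(M, delta):
    # Singular value thresholding: applies the shrinkage operator to the singular values of M
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(s, delta)) @ Vt

def admm_sweep(H, Z, Er, A, Q1, Q2, mu, lam2, rho=1.1):
    """One pass of the explicitly stated ADMM updates; the other sub-problems are omitted."""
    N = Z.shape[0]
    G = H.T @ H
    # (II) Z update: (H^T H + I)^{-1} (A + H^T H - H^T Er + (Q2 + H^T Q1)/mu)
    Z = np.linalg.solve(G + np.eye(N), A + G - H.T @ Er + (Q2 + H.T @ Q1) / mu)
    # (VI) A update via singular value thresholding (argument and threshold assumed)
    A = svt(Z - Q2 / mu, lam2 / mu)
    # (VIII) multiplier and penalty updates
    Q1 = Q1 + mu * (H - H @ Z - Er)
    Q2 = Q2 + mu * (A - Z)
    mu = rho * mu
    return Z, A, Q1, Q2, mu
```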
Step 7) calculating the value of the variable Z in the optimized objective function J':
(7a) setting the maximum iteration times of the optimized objective function J';
(7b) iteratively update the corresponding variables using the iterative update expressions of the variables in J', and stop when the number of iterations equals the set maximum number of iterations. Because each variable in J' can be computed from the other variables in J', the variables in J' can be updated alternately until the set maximum number of iterations of J' is reached; since Z reflects the similarity between the data in the original data set, the updated multi-view self-representation coefficient matrix is obtained at that point.
Step 8) clustering the original data set:
(8a) Because the updated multi-view self-representation coefficient matrix embodies the similarity relationship between the samples in the original data set, it is used to compute the similarity matrix S of the original data set,
where |·| denotes the matrix formed by taking the absolute value of each element of a matrix;
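The formula itself is not preserved in the extracted text. A standard symmetrized choice that is consistent with the absolute-value and transpose operations named here and in claim 4 is the following, given purely as an assumption, with Ẑ standing for the updated multi-view self-representation coefficient matrix:

```latex
S=\tfrac{1}{2}\Bigl(\bigl|\hat{Z}\bigr|+\bigl|\hat{Z}\bigr|^{T}\Bigr).
```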
(8b) calculating the clustering result of the original data set:
(8b1) diagonalize the vector t obtained by summing each row of the similarity matrix S to obtain the degree matrix D of S, and compute the Laplacian matrix L of S;
(8b2) perform eigenvalue decomposition of the Laplacian matrix L to obtain the set E of its eigenvalues and the matrix T consisting of the eigenvectors corresponding to the eigenvalues in E;
(8b3) sort the eigenvalues in E in ascending order to obtain the sorted eigenvalue set E', take the first K eigenvalues of E' to form the set E_K, take from T the eigenvectors corresponding to the eigenvalues in E_K to form the eigenvector matrix T', normalize each row of T', and take each row of T' as a sample data point;
(8b4) randomly select K sample data points in T' and take each of them as the initial clustering center of one class, obtaining the clustering center set R consisting of the K clustering centers;
(8b5) compute the Euclidean distance from each sample data point in T' to each clustering center in R, assign each sample data point to the class of the clustering center nearest to it, and compute the mean of the sample data points belonging to the k-th class as the clustering center of the k-th class, k = 1, 2, …, K, obtaining the clustering centers of the K classes and thereby updating R;
(8b6) repeat step (8b5) until the clustering center set R no longer changes, obtaining the clustering result of the original data set.
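Steps (8b1)-(8b6) amount to standard spectral clustering of S followed by K-means; a compact sketch is shown below. The unnormalized Laplacian L = D − S is assumed, since the extracted text does not show whether a normalized Laplacian is used.

```python
import numpy as np

def spectral_clustering(S, K, n_iter=100, seed=0):
    """Sketch of steps (8b1)-(8b6): spectral embedding of S, then K-means."""
    rng = np.random.default_rng(seed)
    t = S.sum(axis=1)                        # (8b1) row sums -> vector t
    L = np.diag(t) - S                       # degree matrix D minus S (assumed form)
    evals, evecs = np.linalg.eigh(L)         # (8b2) eigendecomposition, ascending order
    T = evecs[:, :K]                         # (8b3) eigenvectors of the K smallest eigenvalues
    T = T / (np.linalg.norm(T, axis=1, keepdims=True) + 1e-12)   # row normalization
    centers = T[rng.choice(len(T), K, replace=False)]            # (8b4) random initial centers
    for _ in range(n_iter):                  # (8b5)-(8b6) K-means until centers stop changing
        dist = np.linalg.norm(T[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([T[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```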
The technical effects of the present invention will be further explained below by combining with simulation experiments.
1. Simulation conditions and contents:
simulation conditions are as follows:
In the simulation experiments, the computer configuration was an Intel(R) Core(TM) i7-7700 3.60 GHz CPU with 32 GB of memory running the Windows 7 operating system, and MATLAB R2016b was used as the simulation software.
The simulation experiment adopted a BBCSport data set, a MSRC-v1 data set and a Caltech101-7 data set respectively.
Simulation content:
simulation 1
The accuracy of clustering is compared and simulated under a BBCSport data set by using the method and the existing hidden multi-view subspace clustering method, and the result is shown in figure 2.
Simulation 2
The accuracy of clustering is compared and simulated by utilizing the method and the existing hidden multi-view subspace clustering method under the MSRC-v1 data set, and the result is shown in figure 3.
Simulation 3
The accuracy of clustering is compared and simulated by utilizing the method and the existing hidden multi-view subspace clustering method under the Caltech101-7 data set, and the result is shown in figure 4.
2. And (3) simulation result analysis:
Referring to fig. 2, on the BBCSport data set, when the number of test samples is 60, 90, 120, 150, 180 and 210, the accuracy of clustering the multi-view data set with the present invention is significantly higher than that of the prior art; the improvement of the present invention over the prior art is smallest, 6.0%, when the number of test samples is 180. Referring to fig. 3, on the MSRC-v1 data set, when the number of test samples is 28, 56, 84, 112, 140, 168 and 196, the accuracy of clustering the multi-view data set with the present invention is higher than that of the prior art; the improvement is smallest, 2.0%, when the number of test samples is 28. Referring to fig. 4, on the Caltech101-7 data set, when the number of test samples is 35, 70, 105, 140, 175 and 210, the accuracy of clustering the multi-view data set with the present invention is significantly higher than that of the prior art; the improvement is smallest, 2.3%, when the number of test samples is 70.
The simulation results of figs. 2-4 show that, for different numbers of test data on different data sets, the accuracy of clustering the multi-view data set with the present invention is clearly higher than that obtained with the prior art. This is because, when performing multi-view clustering, the present invention adaptively obtains a common hidden representation of the multi-view data and uses graph regularization to make the multi-view hidden representation preserve the local geometric structure of the samples in the original data set, so that the similarity matrix of the original data set has a more accurate structure; compared with the prior art, the accuracy of clustering the multi-view data set is therefore effectively improved.
Claims (4)
1. A multi-view subspace clustering method based on implicit representation and self-adaptation is characterized by comprising the following implementation steps:
(1) obtaining a multi-view data matrix of an original data set
Different types of feature data are extracted from the images contained in the original data set; feature data of the same type form one view matrix, and the view matrices together form the multi-view data matrix {X^(v)} of the original data set, where X^(v) denotes the v-th view matrix, v = 1, 2, …, m, m denotes the number of view matrices, and m ≥ 2;
(2) computing the Laplacian matrices {L^(v)} of the multi-view data matrix {X^(v)};
(3) Constructing an objective function J of multi-view subspace clustering based on implicit representation and self-adaptation:
(3a) decompose {X^(v)} into a multi-view hidden representation matrix H and multi-view base matrices {P^(v)}, with the constraint on the base matrix P^(v) of X^(v) that P^(v)P^(v)T = I, and take the difference between X^(v) and the product P^(v)H as an error reconstruction term, where I denotes the identity matrix and (·)^T denotes the transpose of a matrix;
(3b) compute the measure ||X^(v) − P^(v)H||_F^2 of this reconstruction error and assign it an adaptive weight, with γ as the weight parameter of the adaptive weight, where ||·||_F^2 denotes the squared Frobenius norm of a matrix, γ denotes an adjusting parameter, and γ ≥ 0;
(3c) decompose the multi-view hidden representation matrix H into the product of H and a multi-view self-representation coefficient matrix Z, take the difference between H and HZ as the error reconstruction term E_r, E_r = H − HZ, compute the measure ||E_r||_{2,1} of E_r, and set the weight of ||E_r||_{2,1} to λ1, where ||·||_{2,1} denotes the 2,1-norm of a matrix;
(3d) construct the low-rank constraint term ||Z||_* of the multi-view self-representation coefficient matrix Z and set the weight of ||Z||_* to λ2, where ||·||_* denotes the nuclear norm of a matrix;
(3e) use the Laplacian matrices {L^(v)} to construct a similarity (graph regularization) constraint term on the multi-view data matrix and set its weight to λ3, where tr(·) denotes the trace of a matrix;
(3f) add the error reconstruction terms ||X^(v) − P^(v)H||_F^2, ||E_r||_{2,1}, the low-rank term ||Z||_* and the similarity constraint term with their weights to obtain the objective function J of hidden-representation-based and adaptive multi-view subspace clustering;
(4) optimizing an objective function J:
The objective function J is optimized by the alternating direction method of multipliers: take A as an auxiliary matrix variable for Z with the constraint A = Z, introduce a Lagrangian multiplier for the reconstruction constraint of each view, take Q1 as the Lagrangian multiplier of E_r = H − HZ and Q2 as the Lagrangian multiplier of A = Z, and obtain the optimized objective function J', in which ⟨·,·⟩ denotes the matrix inner product and μ denotes the regularization (penalty) coefficient;
(5) initializing variables in the optimized objective function J':
All elements of Z, E_r, Q1, Q2 and A in J' are initialized to 0, all elements of H are initialized to random numbers in (0, 1), and μ is initialized to 0.001;
(6) and performing alternate iteration on the variables in the optimized objective function J':
The variables H, Z, E_r, A, Q1, Q2 and μ in J' are iterated alternately to obtain the iterative update expression of each variable, H_S, Z_S, Er_S, A_S, Q1_S, Q2_S and μ_S;
(7) Calculating the value of the variable Z in the optimized objective function J':
(7a) setting the maximum iteration times of the optimized objective function J';
(7b) iteratively update the corresponding variables using the iterative update expressions of the variables in J', and stop the iteration when the number of iterations equals the set maximum number of iterations, obtaining the updated multi-view self-representation coefficient matrix;
(8) Clustering the original data set:
(8a) calculating a similarity matrix S of the original data set;
(8b) calculating the clustering result of the original data set:
(8b1) diagonalize the vector t obtained by summing each row of the similarity matrix S to obtain the degree matrix D of S, and compute the Laplacian matrix L;
(8b2) perform eigenvalue decomposition of the Laplacian matrix L to obtain the set E of its eigenvalues and the matrix T consisting of the eigenvectors corresponding to the eigenvalues in E;
(8b3) sort the eigenvalues in E in ascending order to obtain the eigenvalue set E', take the first K eigenvalues of E' to form the set E_K, select from T the eigenvectors corresponding to the eigenvalues in E_K to form the eigenvector matrix T', and then use the normalized result of each row of T' as a sample data point, where K denotes the number of classes of the sample data points, 2 ≤ K < N, and N denotes the number of sample data points in the original data set;
(8b4) randomly select K sample data points in T' and take each of them as the initial clustering center of one class, obtaining the clustering center set R consisting of the K clustering centers;
(8b5) compute the Euclidean distance from each sample data point in T' to each clustering center in R, assign each sample data point to the class of the clustering center nearest to it, and compute the mean of the sample data points belonging to the k-th class as the clustering center of the k-th class, k = 1, 2, …, K, obtaining the clustering centers of the K classes and thereby updating R;
(8b6) repeat step (8b5) until the clustering center set R no longer changes, obtaining the clustering result of the original data set.
2. The hidden-representation-based and adaptive multi-view subspace clustering method according to claim 1, wherein the computation of the Laplacian matrices {L^(v)} of the multi-view data matrix {X^(v)} in step (2) is implemented as follows:
(2a) take each column of the v-th view matrix X^(v) as a sample point and compute the Euclidean distance between any two sample points, i, j = 1, 2, …, N, where N denotes the number of sample data points in the original data set, v = 1, 2, …, m, m denotes the number of view matrices, and m ≥ 2;
(2b) compute the correlation matrices {W^(v)} of the multi-view data matrix, where W^(v) denotes the correlation matrix of the v-th view matrix, W^(v)_{ij} denotes the element in row i and column j of W^(v), and σ denotes the bandwidth parameter of the Gaussian kernel;
(2c) sum the rows of each correlation matrix W^(v) to obtain a vector r, diagonalize r to obtain the diagonal multi-view degree matrices, and compute the Laplacian matrices {L^(v)} of the multi-view data matrix.
3. The hidden representation and adaptive-based multi-view subspace clustering method according to claim 1, wherein the step (6) of performing the alternate iteration on the variables in the optimized objective function J' comprises the following steps:
(I) iteratively update the multi-view hidden representation matrix H using lyap(P, Q, -T'), where lyap(·) denotes the solution of the Sylvester equation, {X^(v)} denotes the multi-view data matrix, {P^(v)} denotes the multi-view base matrices, the per-view reconstruction terms and E_r denote error reconstruction terms, Z denotes the multi-view self-representation coefficient matrix, {L^(v)} denotes the Laplacian matrices, λ3 denotes the weight of the similarity constraint term, Q1 and Q2 denote Lagrangian multipliers, I denotes the identity matrix, μ denotes the regularization coefficient, and (·)^T denotes the transpose of a matrix;
(II) iteratively update the multi-view self-representation coefficient matrix Z using (H^T H + I)^{-1}(A + H^T H − H^T E_r + (Q2 + H^T Q1)/μ), where A denotes an auxiliary matrix variable for Z;
(III) iteratively update the multi-view base matrices {P^(v)} using the sets of matrices formed by the left singular vectors and by the right singular vectors of the corresponding intermediate matrix, together with the corresponding Lagrangian multiplier;
(IV) iteratively update the per-view error reconstruction terms using their corresponding expressions, in which the adaptive weight parameter and the adjusting parameter γ appear;
(V) iteratively update the j-th column of the error reconstruction term E_r using its corresponding expression, where B_{:,j} denotes the j-th column of a matrix B, λ1 denotes the weight of the measure ||E_r||_{2,1} of the error reconstruction term E_r, and ||·||_{2,1} denotes the 2,1-norm of a matrix;
(VI) iteratively update the auxiliary matrix variable A, where U and V denote the left and right singular vectors of the corresponding intermediate matrix, Σ denotes the diagonal matrix of its singular values, S_δ(x) = max(x − δ, 0) + min(x + δ, 0) denotes the shrinkage operator, with max(·,·) the maximum and min(·,·) the minimum of two numbers, λ2 denotes the weight of the low-rank constraint term ||Z||_* of Z, and ||·||_* denotes the nuclear norm of a matrix;
(VII) iteratively update the adaptive weights using their corresponding expression;
(VIII) iteratively update Q1 using Q1 + μ(H − HZ − E_r), iteratively update the view-reconstruction multiplier using its corresponding expression, iteratively update Q2 using Q2 = Q2 + μ(A − Z), and repeatedly update μ using ρμ, where ρ denotes the adjusting parameter of μ and ρ ≥ 1.
4. The hidden representation and adaptive-based multi-view subspace clustering method according to claim 1, wherein the similarity matrix S of the original data set is calculated in step (8a) according to the following formula:
where |·| denotes the matrix formed by taking the absolute value of each element of a matrix, (·)^T denotes the transpose of a matrix, and the matrix in the formula is the updated multi-view self-representation coefficient matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810801776.1A CN109002854A (en) | 2018-07-20 | 2018-07-20 | Based on hidden expression and adaptive multiple view Subspace clustering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810801776.1A CN109002854A (en) | 2018-07-20 | 2018-07-20 | Based on hidden expression and adaptive multiple view Subspace clustering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109002854A true CN109002854A (en) | 2018-12-14 |
Family
ID=64596652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810801776.1A Pending CN109002854A (en) | 2018-07-20 | 2018-07-20 | Based on hidden expression and adaptive multiple view Subspace clustering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109002854A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784374A (en) * | 2018-12-21 | 2019-05-21 | 西北工业大学 | Multi-angle of view clustering method based on adaptive neighbor point |
CN109978006A (en) * | 2019-02-25 | 2019-07-05 | 北京邮电大学 | Clustering method and device |
CN109993214A (en) * | 2019-03-08 | 2019-07-09 | 华南理工大学 | Multiple view clustering method based on Laplace regularization and order constraint |
CN110135520A (en) * | 2019-05-27 | 2019-08-16 | 哈尔滨工业大学(深圳) | Incomplete multi-angle of view clustering method, device, system and storage medium based on figure completion and adaptive visual angle weight distribution |
CN110543916A (en) * | 2019-09-06 | 2019-12-06 | 天津大学 | Method and system for classifying missing multi-view data |
CN111401468A (en) * | 2020-03-26 | 2020-07-10 | 上海海事大学 | Weight self-updating multi-view spectral clustering method based on shared neighbor |
CN111461178A (en) * | 2020-03-11 | 2020-07-28 | 深圳大学 | Data processing method, system and device |
CN112035626A (en) * | 2020-07-06 | 2020-12-04 | 北海淇诚信息科技有限公司 | Rapid identification method and device for large-scale intentions and electronic equipment |
CN112148911A (en) * | 2020-08-19 | 2020-12-29 | 江苏大学 | Image clustering method of multi-view intrinsic low-rank structure |
CN113139556A (en) * | 2021-04-22 | 2021-07-20 | 扬州大学 | Manifold multi-view image clustering method and system based on self-adaptive composition |
CN113159213A (en) * | 2021-04-30 | 2021-07-23 | 中国工商银行股份有限公司 | Service distribution method, device and equipment |
CN113239983A (en) * | 2021-04-25 | 2021-08-10 | 浙江师范大学 | Missing multi-view subspace clustering method and system based on high-order association preservation |
CN113269203A (en) * | 2021-05-17 | 2021-08-17 | 电子科技大学 | Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition |
CN113569973A (en) * | 2021-08-04 | 2021-10-29 | 咪咕文化科技有限公司 | Multi-view clustering method and device, electronic equipment and computer readable storage medium |
WO2022267954A1 (en) * | 2021-06-24 | 2022-12-29 | 浙江师范大学 | Spectral clustering method and system based on unified anchor and subspace learning |
WO2022267956A1 (en) * | 2021-06-24 | 2022-12-29 | 浙江师范大学 | Multi-view clustering method and system based on matrix decomposition and multi-partition alignment |
-
2018
- 2018-07-20 CN CN201810801776.1A patent/CN109002854A/en active Pending
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784374A (en) * | 2018-12-21 | 2019-05-21 | 西北工业大学 | Multi-angle of view clustering method based on adaptive neighbor point |
CN109978006B (en) * | 2019-02-25 | 2021-02-19 | 北京邮电大学 | Face image clustering method and device |
CN109978006A (en) * | 2019-02-25 | 2019-07-05 | 北京邮电大学 | Clustering method and device |
CN109993214A (en) * | 2019-03-08 | 2019-07-09 | 华南理工大学 | Multiple view clustering method based on Laplace regularization and order constraint |
CN109993214B (en) * | 2019-03-08 | 2021-06-08 | 华南理工大学 | Multi-view clustering method based on Laplace regularization and rank constraint |
CN110135520A (en) * | 2019-05-27 | 2019-08-16 | 哈尔滨工业大学(深圳) | Incomplete multi-angle of view clustering method, device, system and storage medium based on figure completion and adaptive visual angle weight distribution |
CN110543916A (en) * | 2019-09-06 | 2019-12-06 | 天津大学 | Method and system for classifying missing multi-view data |
CN111461178A (en) * | 2020-03-11 | 2020-07-28 | 深圳大学 | Data processing method, system and device |
CN111461178B (en) * | 2020-03-11 | 2023-03-28 | 深圳大学 | Data processing method, system and device |
CN111401468A (en) * | 2020-03-26 | 2020-07-10 | 上海海事大学 | Weight self-updating multi-view spectral clustering method based on shared neighbor |
CN111401468B (en) * | 2020-03-26 | 2023-03-24 | 上海海事大学 | Weight self-updating multi-view spectral clustering method based on shared neighbor |
CN112035626A (en) * | 2020-07-06 | 2020-12-04 | 北海淇诚信息科技有限公司 | Rapid identification method and device for large-scale intentions and electronic equipment |
CN112148911A (en) * | 2020-08-19 | 2020-12-29 | 江苏大学 | Image clustering method of multi-view intrinsic low-rank structure |
CN112148911B (en) * | 2020-08-19 | 2024-03-19 | 江苏大学 | Image clustering method of multi-view intrinsic low-rank structure |
CN113139556A (en) * | 2021-04-22 | 2021-07-20 | 扬州大学 | Manifold multi-view image clustering method and system based on self-adaptive composition |
CN113139556B (en) * | 2021-04-22 | 2023-06-23 | 扬州大学 | Manifold multi-view image clustering method and system based on self-adaptive composition |
CN113239983A (en) * | 2021-04-25 | 2021-08-10 | 浙江师范大学 | Missing multi-view subspace clustering method and system based on high-order association preservation |
CN113159213A (en) * | 2021-04-30 | 2021-07-23 | 中国工商银行股份有限公司 | Service distribution method, device and equipment |
CN113269203B (en) * | 2021-05-17 | 2022-03-25 | 电子科技大学 | Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition |
CN113269203A (en) * | 2021-05-17 | 2021-08-17 | 电子科技大学 | Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition |
WO2022267956A1 (en) * | 2021-06-24 | 2022-12-29 | 浙江师范大学 | Multi-view clustering method and system based on matrix decomposition and multi-partition alignment |
WO2022267954A1 (en) * | 2021-06-24 | 2022-12-29 | 浙江师范大学 | Spectral clustering method and system based on unified anchor and subspace learning |
CN113569973A (en) * | 2021-08-04 | 2021-10-29 | 咪咕文化科技有限公司 | Multi-view clustering method and device, electronic equipment and computer readable storage medium |
CN113569973B (en) * | 2021-08-04 | 2024-04-19 | 咪咕文化科技有限公司 | Multi-view clustering method, device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109002854A (en) | Based on hidden expression and adaptive multiple view Subspace clustering method | |
Seddik et al. | Random matrix theory proves that deep learning representations of gan-data behave as gaussian mixtures | |
CN108776812A (en) | Multiple view clustering method based on Non-negative Matrix Factorization and various-consistency | |
Fowlkes et al. | Spectral grouping using the nystrom method | |
CN107292341B (en) | self-adaptive multi-view clustering method based on pair-wise collaborative regularization and NMF | |
Titsias et al. | Spike and slab variational inference for multi-task and multiple kernel learning | |
CN109522956B (en) | Low-rank discriminant feature subspace learning method | |
CN108171279B (en) | Multi-view video adaptive product Grassmann manifold subspace clustering method | |
CN109063757A (en) | It is diagonally indicated based on block and the multifarious multiple view Subspace clustering method of view | |
CN107563442B (en) | Hyperspectral image classification method based on sparse low-rank regular graph tensor embedding | |
CN105740912B (en) | The recognition methods and system of low-rank image characteristics extraction based on nuclear norm regularization | |
CN110222213B (en) | Image classification method based on heterogeneous tensor decomposition | |
CN108415883B (en) | Convex non-negative matrix factorization method based on subspace clustering | |
CN108021930B (en) | Self-adaptive multi-view image classification method and system | |
CN106650744B (en) | The image object of local shape migration guidance is divided into segmentation method | |
CN112488205A (en) | Neural network image classification and identification method based on optimized KPCA algorithm | |
CN109543723B (en) | Robust image clustering method | |
CN112990265A (en) | Post-fusion multi-view clustering machine learning method and system based on bipartite graph | |
CN111324791B (en) | Multi-view data subspace clustering method | |
CN112861929B (en) | Image classification method based on semi-supervised weighted migration discriminant analysis | |
CN106886793B (en) | Hyperspectral image waveband selection method based on discrimination information and manifold information | |
CN111340106A (en) | Unsupervised multi-view feature selection method based on graph learning and view weight learning | |
CN109657611A (en) | A kind of adaptive figure regularization non-negative matrix factorization method for recognition of face | |
CN111611323A (en) | Data fusion-oriented iterative structured multi-view subspace clustering method, device and readable storage medium | |
Gogebakan | A novel approach for Gaussian mixture model clustering based on soft computing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181214 |