CN115795333A - Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning - Google Patents

Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning Download PDF

Info

Publication number
CN115795333A
CN115795333A CN202211558793.XA CN202211558793A CN115795333A CN 115795333 A CN115795333 A CN 115795333A CN 202211558793 A CN202211558793 A CN 202211558793A CN 115795333 A CN115795333 A CN 115795333A
Authority
CN
China
Prior art keywords
matrix
representation
clustering
view
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211558793.XA
Other languages
Chinese (zh)
Inventor
杜世强
张凯武
石玉清
刘宝锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Minzu University
Original Assignee
Northwest Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Minzu University filed Critical Northwest Minzu University
Priority to CN202211558793.XA priority Critical patent/CN115795333A/en
Publication of CN115795333A publication Critical patent/CN115795333A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, which comprises the following steps of: s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation into an LRR model to learn a graph with a global data structure and a local data structure; s2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism; and S3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm. The invention can realize better clustering effect and obtain the picture with better quality, which is obviously superior to the prior method.

Description

Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning.
Background
Today, the amount of data is increasing, and how to extract useful information from a large amount of data is a focus of attention. Clustering is one of the most important and basic tools for multivariate data analysis, and has been widely used in image processing, recommendation systems, bioinformatics, and other research fields. Clustering divides a data set into different classes or clusters according to a certain specific standard (such as distance), so that the similarity of data objects in the same cluster is as large as possible, and the difference of data objects which are not in the same cluster is also as large as possible, so as to mine information such as inherent hidden structures in data. Therefore, the data can be divided according to the similarity so as to obtain a more accurate clustering result. If interpreted from a machine learning perspective, clustering is an unsupervised learning method that can perform clustering and the like on data with unknown tag information, thereby extracting useful information.
Most of the existing clustering methods are proposed based on single-view data, but in the practical application of big data analysis, data come from different sources, and the data are called multi-view data. The expected effect cannot be obtained from only a single view description data, so that the problem of multi-view data clustering becomes the focus of research of scholars. Over the past few decades, many advanced multi-view clustering algorithms have been studied. With the continuous development of multi-view clustering research, more and more multi-view clustering algorithms are proposed, wherein the multi-view clustering algorithm based on graphs and the multi-view clustering algorithm based on subspace learning gradually become the focus of attention due to the excellent performance of the multi-view clustering algorithms. The goal of the graph-based approach is to learn a consistent similarity matrix between the multi-view data, and then use the similarity matrix to obtain the final clustering result through a spectral clustering algorithm. The graph learning based method has made a good progress, but researchers find that the original data may not have obvious clustering features in the original space, but it may have a significant effect of projecting it to another space, so that the subspace learning based method is continuously emerging. The subspace learning-based algorithm targets different views of the multi-view data, and aims to learn a uniform representation from multiple subspaces or potential spaces, so that high-dimensional data can be processed more easily during clustering.
However, in practical applications, due to the difficulty, high cost and large data volume of data acquisition, it is almost impossible to have a complete multi-view data set. In general, the loss of some instance objects in a view in a multiview dataset is often referred to as incomplete multiview data. Due to the serious lack of information between views in incomplete data, the consistency and complementarity between views cannot be fully utilized as usual, which results in that the above-mentioned conventional multi-view method is no longer suitable for incomplete data and does not obtain satisfactory results in incomplete data. Therefore, in order to solve the problem of incomplete data clustering, incomplete multiview clustering has been proposed in recent years. These methods generally fall into two categories: incomplete multi-view clustering based on matrix factorization and incomplete multi-view clustering based on graphs. The incomplete multi-view clustering algorithm based on matrix decomposition aims to directly obtain low-dimensional consistent representation of all views by utilizing a matrix decomposition technology. Compared with a matrix decomposition-based method, the incomplete multi-view clustering method based on the graph can better describe the relationship between data points and more intuitively explore the original geometric structure of data.
Although there are many ways to deal with the incomplete clustering problem, there are still many problems to be solved. For example, these methods have the following limitations: the existing algorithm is focused on learning consistency representation, but does not consider local data structure, or fully utilizes the local structure of the data, but cannot learn global consistency representation; the filling method introduces noise. Specifically, removing bad parts not only does not improve performance, but also affects the original complete data; many methods can only handle two incomplete data and cannot handle incomplete multiple view data.
Disclosure of Invention
The invention aims to provide an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, and aims to solve the problems in the prior art in the background technology.
The invention is realized in this way, a low rank constraint adaptive graph learning-based incomplete multi-view clustering method, which comprises the following steps:
s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation in an LRR model to learn a graph with a global data structure and a local data structure;
s2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism;
and S3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm.
Preferably, in step S1, the distance regularization term and the non-negatively constrained LRR model based on the low rank representation are introduced as:
Figure BDA0003984219760000031
wherein the content of the first and second substances,
Figure BDA0003984219760000032
is a data matrix having n samples, wherein each sample is represented by a column vector;
Z (v) ∈R n×n is a representation matrix to be obtained, each element z ij Representing samples x in a joint representation j Relative to x i Is represented by (a);
each row of the matrix P can be considered as a new representation of the corresponding original sample, the new representation P being divided into several clusters by the K-means algorithm;
E (v) is the reconstruction error, L is the laplace matrix, and the calculation formula is: l is (v) =D (v) -W (v) ,W (v) Is defined as a similarity matrix
Figure BDA0003984219760000033
D is the diagonal matrix and its ith diagonal element is defined as:
Figure BDA0003984219760000034
||·|| * 、||·|| 1 and | · | non-counting 2 Nuclear norm, l, of the representation matrix 1 Norm sum l 2 Norm, tr (·) denotes the trace of the matrix;
diag(Z (v) ) =0 representing the matrix Z (v) All diagonal elements of (a) are 0 and 1, which represents column vectors of which all elements are 1, and I is a unit matrix; lambda [ alpha ] 1 And λ 2 Is a penalty parameter, uniformly represents the dimension of the matrix by using an index matrix G,
Figure BDA0003984219760000041
wherein the index matrix G is defined as:
Figure BDA0003984219760000042
preferably, in step S2, the mathematical model for learning the consistent representation of all views is:
Figure BDA0003984219760000043
wherein λ is 3 A penalty parameter. P * Is a target clustering index matrix to be learned;
ω v Ω(P (v) ,P * ) Is to measure P of each view * And P (v) The regularization term of consistency therebetween is defined as follows:
Figure BDA0003984219760000044
in the mathematical model for learning a consistent representation of all views, linear kernels are used
Figure BDA0003984219760000045
Weight ω v The importance of view v is measured by:
Figure BDA0003984219760000046
the constants in the model are omitted, and the final objective function is:
Figure BDA0003984219760000047
preferably, the step S3 specifically includes: introduction of several approximations Z (v) The problem is separable, and the lagrangian function of the final objective function obtained by the method is as follows:
Figure BDA0003984219760000051
wherein, J (v) And V (v) Is an approximation matrix Z (v) The variable of (a) is selected,
Figure BDA0003984219760000052
representation matrix V (v) A laplacian matrix of;
matrix of
Figure BDA0003984219760000053
Sum vector
Figure BDA0003984219760000054
Representing the Lagrange multiplier, mu representing a penalty parameter, | · | | non-calculation F A Frobenius norm representing a matrix;
and respectively and alternately updating different variables through fixed variables, learning the consistent representation of all views, and obtaining a final clustering result by using a K-means algorithm on the consistent representation.
Preferably, in step S3, Z is updated (v) While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:
Figure BDA0003984219760000055
solving for Z (v) Is set to 0, variable Z (v) The optimal solution of (c) can be as follows:
Figure BDA0003984219760000056
wherein
Figure BDA0003984219760000057
Preferably, in step S3, J is updated (v) While, fixing other variables, calculating variable J (v) The sub-problem of (a) can be simplified as:
Figure BDA0003984219760000058
updating J using Singular Value Threshold (SVT) shrink operator (v)
Figure BDA0003984219760000061
Wherein Θ Representing the SVT shrink operator.
Preferably, in step S3, V is updated (v) And
Figure BDA0003984219760000062
while, fixing other variables, calculating variable V (v) The sub-problem of (2) translates to solving the following problem:
Figure BDA0003984219760000063
wherein N is (v) And M (v) Defining as N for auxiliary variables (v) =G (v) P (v)
Figure BDA0003984219760000064
For the
Figure BDA0003984219760000065
Variable V (v) The sub-problem of (a) translates into the following equivalence optimization problem:
Figure BDA0003984219760000066
wherein the content of the first and second substances,
Figure BDA0003984219760000067
Figure BDA0003984219760000068
and
Figure BDA0003984219760000069
representation matrix V (v) The ith and jth row vectors of (a);
by optimizing the problem pairs V equivalently (v) Derivative of (D) to obtain V (v) The optimal solution of (2):
Figure BDA00039842197600000610
then increase V (v) =max(V (v) 0) ensuring the matrix V (v) Are all greater than 0, and are,
Figure BDA00039842197600000611
the update optimization of (c) is as follows:
Figure BDA00039842197600000612
preferably, in step S3, E is updated (v) While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:
Figure BDA0003984219760000071
for the sparse constraint optimization problem, a closed-form solution is obtained:
Figure BDA0003984219760000072
where θ denotes the shrink operator.
Preferably, in step S3, P is updated (v) While fixing other variables, the defined Lagrangian function can be transformed to solve the following problem:
Figure BDA0003984219760000073
solving the problem by eigenvalue decomposition, where P (v) By corresponding to the matrix
Figure BDA0003984219760000074
The top k feature vectors of the top k minimum feature values of (1);
in step S3, P is updated * While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:
Figure BDA0003984219760000075
solving the problem by eigenvalue decomposition, where P * By corresponding to the matrix
Figure BDA0003984219760000076
The top k eigenvectors of the top k minimum eigenvalues of (1).
Preferably, in step S3, the update is performed
Figure BDA0003984219760000077
And μ, the other variables are fixed, and the four variables are furtherThe new method comprises the following steps:
Figure BDA0003984219760000078
Figure BDA0003984219760000079
Figure BDA00039842197600000710
μ=min(ρμ,μ 0 )
where p and μ 0 Is a constant.
Compared with the defects and shortcomings of the prior art, the invention has the following beneficial effects:
(1) The invention introduces a distance regularization term for simultaneously capturing the global and local structure of the data, and combines the distance regularization term with low-rank representation. Therefore, incomplete data information can be better utilized, and the global and local structures of the data can be known at the same time, so that a graph with better quality is obtained, and a better clustering effect is realized;
(2) In order to avoid the influence of bad views on the quality of the final fusion consistency graph, the invention provides a new weighting mechanism which can adaptively learn the proper weights of different views, so that the invention is more favorable for exploring the compact representation of incomplete data, reducing the influence of the bad views and further improving the clustering performance;
(3) The model is optimized through an efficient iterative updating algorithm based on a multiplier alternating direction method (ADMM), and a large number of experimental results on six types of incomplete multi-view data sets show that the method is obviously superior to the existing method.
Drawings
FIG. 1 is a flow diagram of an incomplete multi-view clustering method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention discloses an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, wherein a flow frame diagram of the method is shown in figure 1, and the method comprises the following steps:
s1, introducing a distance regularization term and a non-negative constraint based on low rank representation to learn a graph with global and local data structures in an LRR model for a data matrix of each view
1. Adaptive graph regularization low rank representation learning
Filling missing parts with the average values of the corresponding samples may introduce noise, especially in the case of large-scale data loss. Therefore, the graph-based method is adopted to perform representation learning by using information of the existing sample. However, the existing graph-based learning method cannot obtain the inherent local structure of data, so that the relationship between original data instances cannot be fully utilized, and therefore, distance constraint and non-negativity constraint are introduced to ensure the locality and sparsity. To process incomplete multi-view data, an incomplete instance is removed from the dataset and defined as the dataset
Figure BDA0003984219760000091
Since the missing samples are different in each view in the multi-view dataset, learning the representation of the corresponding available samples in each view, introducing distance constraints in the LRR model to ensure locality, and using non-negative constraints to avoid unwanted solutions, the mathematical model is described as:
Figure BDA0003984219760000092
wherein Z (v) ∈R n×n Is a representation matrix, laplace matrix, to be obtained
Figure BDA0003984219760000093
Figure BDA0003984219760000094
Is defined as a similarity matrix
Figure BDA0003984219760000095
This results in a representation Z due to the deletion of incomplete instances in the dataset (v) Will not be consistent, so the index matrix G is used to uniformly represent the dimensions of the matrix,
Figure BDA0003984219760000096
wherein the index matrix G is defined as:
Figure BDA0003984219760000097
in practical applications, data may be corrupted by noise to different degrees, so the following error terms can be introduced to simulate noise, and equation (1) can be easily converted into:
Figure BDA0003984219760000098
wherein E (v) Is the reconstruction error, λ 1 And λ 2 Is a penalty parameter.
S2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism
The multi-view clustering algorithm based on spectral clustering learns the clustering index matrix of each sample and fuses the clustering index matrix into an optimal clustering index matrix,
Figure BDA0003984219760000101
can be equivalent to
Figure BDA0003984219760000102
Regularization weights for target clustering index matrix PIs the sum of all similarity weights of multiple graphs, but when the data is incomplete, the similarity graph is lost, so that in the incomplete case, the direct use of this method in the multi-view data may result in performance degradation. Therefore, the cluster index matrix for each view is first computed, and then the cluster index matrices for all views are used to learn the consistency representation:
Figure BDA0003984219760000103
wherein ω is v Ω(P (v) ,P * ) Is used to measure P and P of each view (v) The regularization term of consistency therebetween is defined as follows:
Figure BDA0003984219760000104
for the purpose of simplifying the problem, linear kernels are used here
Figure BDA0003984219760000105
Thus, P and P can be obtained. Weight ω v The importance of view v is represented by:
Figure BDA0003984219760000106
therefore, equation (6) can be changed to:
Figure BDA0003984219760000107
since the constant term can be omitted, the final objective function is as follows:
Figure BDA0003984219760000108
s3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm
Since it is difficult to directly calculate the objective function, an alternating direction multiplier (ADMM) is used to calculate a locally optimal solution of the objective function. Introduction of several approximations Z (v) The problem is separable, the lagrange function of the objective function is as follows:
Figure BDA0003984219760000111
(1) Updating Z (v) Fixing other variables, the problem translates to solving the following problem:
Figure BDA0003984219760000112
then solve for Z (v) Is set to 0, variable Z (v) The optimal solution of (c) can be as follows:
Figure BDA0003984219760000113
wherein
Figure BDA0003984219760000114
(2) Update J (v) Fixing other variables, calculating variable J (v) The sub-problem of (a) can be simplified as:
Figure BDA0003984219760000115
updating J using Singular Value Threshold (SVT) shrink operator (v)
Figure BDA0003984219760000116
Where Θ represents the SVT shrink operator.
(3) Update V (v) And
Figure BDA0003984219760000121
fixing other variables, the problem translates to solving the following problem:
Figure BDA0003984219760000122
using an auxiliary variable N (v) =G (v) P (v)
Figure BDA0003984219760000123
For
Figure BDA0003984219760000124
The above formula can be translated into the following optimization problem:
Figure BDA0003984219760000125
wherein
Figure BDA0003984219760000126
By taking the above formula to V (v) Can be given as the derivative of V (v) The optimal solution of (a):
Figure BDA0003984219760000127
then increase V (v) =max(V (v) 0) ensuring the matrix V (v) Are all greater than 0, and are,
Figure BDA0003984219760000128
the update optimization of (c) is as follows:
Figure BDA0003984219760000129
(4) Update E (v) Fixing other variables, translates to solving the following problems:
Figure BDA00039842197600001210
for sparse constraint optimization problems, its closed-form solution can be easily obtained:
Figure BDA00039842197600001211
where θ represents the shrink operator.
(5) Updating P (v) Fixing other variables, translates to solving the following problems:
Figure BDA00039842197600001212
solving the problem by eigenvalue decomposition, where P (v) By corresponding to the matrix
Figure BDA0003984219760000131
The top k eigenvectors of the top k minimum eigenvalues of (1).
(6) Updating P * Fixing other variables, translates to solving the following problems:
Figure BDA0003984219760000132
solving the problem by eigenvalue decomposition, where P * By corresponding to the matrix
Figure BDA0003984219760000133
The top k minimum eigenvalue of (2).
(7) Updating
Figure BDA0003984219760000134
And μ, the other variables are fixed, and the four variables are updated as follows:
Figure BDA0003984219760000135
Figure BDA0003984219760000136
Figure BDA0003984219760000137
μ=min(ρμ,μ 0 ) (25)
the whole process of solving equation (8) is shown in algorithm 1:
Figure BDA0003984219760000138
Figure BDA0003984219760000141
in order to verify the correctness of the clustering result, the clustering experiment is respectively carried out on a 3source text data set, a BBCport text data set, a 100Leaves image data set, a Webkb data set, an ORL face data set and an Mfeat data set, 30%, 50%, 70% and 90% of views in the sample data set are randomly selected as matching samples for the three data sets of 100Leaves, mfeat and ORL, and one of the views is randomly deleted from the rest samples. The Accuracy rates (Accuracy) corresponding to the clustering results under different pairing rates on the 100Leaves data set are respectively 60.76%, 67.75%, 76.54% and 82.80%; normalized mutual information (Normalized mutual information) is: 78.78%, 82.32%, 87.43% and 90.66%; the purities (Purity) were respectively: 63.28%, 70.00%, 78.55% and 84.20%. The Accuracy rates (Accuracy) corresponding to the clustering results under different pairing rates on the Mfeat data set are respectively 80.87%, 87.95%, 90.30% and 93.22%; normalized mutual information (Normalized mutual information) is: 75.31%, 82.12%, 85.06% and 87.51%; the purities (Purity) were respectively: 80.87%, 87.95%, 90.30% and 93.22%. The Accuracy (Accuracy) corresponding to the clustering result under different pairing rates on the ORL data set is respectively 75.40%, 75.90%, 76.60% and 79.50%; normalized mutual information (Normalized mutual information) is: 86.79%, 87.21%, 87.67% and 88.94%; the purities (Purity) were respectively: 78.03%, 79.10%, 79.65% and 81.58%. On the BBCport, 3sources and Webkb data sets, random deletion is performed on the whole data set, so that all samples lose views, and an incomplete multi-view data set is constructed, wherein the deletion rates are 10%, 30% and 50% respectively. The Accuracy (Accuracy) corresponding to the clustering result under different deficiency rates on the BBCport data set is respectively 80.17%, 83.27% and 77.58%; normalized mutual information (Normalized mutual information) is respectively: 76.96%, 73.56% and 65.71%; the Purity (Purity) was: 91.37%, 87.58% and 85.34%. The Accuracy (Accuracy) corresponding to the clustering result under different deficiency rates on the 3sources data set is 82.84%, 81.45% and 76.54% respectively; normalized mutual information (Normalized mutual information) is respectively: 70.09%, 65.32% and 82.32%; the Purity (Purity) was: 85.91%, 81.45% and 63.28%. The Accuracy rates (accuray) corresponding to the clustering results under different deletion rates on the Webkb data set are respectively 89.34%, 88.86% and 80.02%; normalized mutual information (Normalized mutual information) is: 43.96%, 39.47% and 10.90%; the purities (Purity) were respectively: 89.34%, 88.86% and 80.02%. Compared with 7 latest multi-view clustering methods BSV, MIC, DAIMC, UEAF, IMSCAGL, GIMC-FLSD and IMSR, the method provided by the invention obtains the highest clustering result under 6 experimental databases and 3 common clustering evaluation criteria.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (10)

1. An incomplete multi-view clustering method based on low-rank constraint adaptive graph learning is characterized by comprising the following steps of:
s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation into an LRR model to learn a graph with a global data structure and a local data structure;
s2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism;
and S3, optimizing the objective function of the consistency representation by using an efficient iterative updating algorithm based on a multiplier alternating direction method (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm.
2. The incomplete multi-view clustering method of claim 1, wherein in step S1, the distance regularization term and the low rank representation based non-negatively constrained LRR model are introduced as:
Figure FDA0003984219750000011
s.t.X (v) =X (v) Z (v) ,diag(Z (v) )=0,Z (v) ≥0,Z (v)T 1=1,P T P=I
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003984219750000012
is a data matrix having n samples, wherein each sample is represented by a column vector;
Z (v) ∈R n×n is a representation matrix to be obtained, each element z ij Representing samples x in a joint representation j Relative to x i Is represented by (a);
each row of the matrix P can be considered as a new representation of the corresponding original sample, the new representation P being divided into several clusters by the K-means algorithm;
E (v) is the reconstruction error, L is the laplace matrix, and the calculation formula is: l is (v) =D (v) -W (v) ,W (v) Is defined as a similarity matrix
Figure FDA0003984219750000013
D is the diagonal matrix and its ith diagonal element is defined as:
Figure FDA0003984219750000021
||·|| * 、||·|| 1 and | · | non-conducting phosphor 2 Nuclear norm, l, of the representation matrix 1 Norm sum l 2 Norm, tr (·) denotes the trace of the matrix;
diag(Z (v) ) =0 representing the matrix Z (v) All diagonal elements of (1) are 0 and 1, which represents column vectors with all elements being 1, and I is a unit matrix; lambda [ alpha ] 1 And λ 2 Is a penalty parameter, uniformly represents the dimension of the matrix by using an index matrix G,
Figure FDA0003984219750000022
wherein the index matrix G is defined as:
Figure FDA0003984219750000023
3. the incomplete multi-view clustering method according to claim 2, wherein in step S2, the mathematical model for learning the consistent representation of all views is:
Figure FDA0003984219750000024
s.t.X (v) =X (v) Z (v) +E (v) ,diag(Z (v) )=0,Z (v) ≥0,Z (v)T 1=1,P T P=I,P *T P * =I
wherein λ is 3 A penalty parameter. P is * Is a target clustering index matrix to be learned;
ω v Ω(P (v) ,P * ) Is to measure P of each view * And P (v) The consistency regularization term between is defined as follows:
Figure FDA0003984219750000025
in the mathematical model for learning a consistent representation of all views, a linear kernel K is used P(v) =P (v) P (v)T ,
Figure FDA0003984219750000026
Weight ω v The importance of view v is measured by:
Figure FDA0003984219750000027
the constants in the model are omitted, and the final objective function is:
Figure FDA0003984219750000031
s.t.X (v) =X (v) Z (v) +E (v) ,diag(Z (v) )=0,Z (v) ≥0,Z (v)T 1=1,P T P=I,P *T P * =I。
4. the incomplete multi-view clustering method of claim 3, wherein the step S3 is specifically: introduction of several approximations Z (v) The problem is separable, the lagrange function of the final objective function obtained in claim 3 is:
Figure FDA0003984219750000032
wherein, J (v) And V (v) Is an approximation matrix Z (v) The variable of (a) is selected,
Figure FDA0003984219750000033
representation matrix V (v) A laplacian matrix of;
matrix array
Figure FDA0003984219750000034
Sum vector
Figure FDA0003984219750000035
Representing the Lagrange multiplier, mu representing a penalty parameter, | · | | non-calculation F A Frobenius norm representing the matrix;
and respectively and alternately updating different variables through fixed variables, learning the consistent representation of all views, and obtaining a final clustering result on the consistent representation by using a K-means algorithm.
5. The incomplete multi-view clustering method of claim 4, wherein in step S3, Z is updated (v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:
Figure FDA0003984219750000036
solving for Z (v) Is set to 0, variable Z (v) The optimal solution of (c) can be as follows:
Figure FDA0003984219750000041
wherein
Figure FDA0003984219750000042
6. The incomplete multi-view clustering method of claim 4, wherein in step S3, J is updated (v) While, fixing other variables, calculating variable J (v) The sub-problem of (c) can be simplified as:
Figure FDA0003984219750000043
updating J using Singular Value Threshold (SVT) shrink operator (v)
Figure FDA0003984219750000044
Wherein Θ Representing the SVT shrink operator.
7. The incomplete multi-view clustering method of claim 4, wherein in step S3, V is updated (v) And
Figure FDA0003984219750000045
while, fixing other variables, calculating variable V (v) The sub-problem of (2) translates to solving the following problem:
Figure FDA0003984219750000046
wherein, N (v) And M (v) For auxiliary variables to be defined as N (v) =G (v) P (v)
Figure FDA0003984219750000047
For the
Figure FDA0003984219750000048
Variable V (v) The sub-problem of (a) translates into the following equivalence optimization problem:
Figure FDA0003984219750000049
wherein the content of the first and second substances,
Figure FDA0003984219750000051
Figure FDA0003984219750000052
and
Figure FDA0003984219750000053
representation matrix V (v) The ith and jth row vectors of (a);
by optimizing problem pairs V equivalently (v) Derivative of (D) to obtain V (v) The optimal solution of (2):
Figure FDA0003984219750000054
then increase V (v) =max(V (v) 0) ensuring the matrix V (v) Are all greater than 0, and are,
Figure FDA0003984219750000055
the update optimization of (c) is as follows:
Figure FDA0003984219750000056
8. the incomplete multi-view clustering method of claim 4, wherein in step S3, E is updated (v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:
Figure FDA0003984219750000057
for the sparse constraint optimization problem, a closed-form solution is obtained:
Figure FDA0003984219750000058
where θ represents the shrink operator.
9. The incomplete multiview clustering method of claim 4, wherein in step S3, P is updated (v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:
Figure FDA0003984219750000059
solving the problem by eigenvalue decomposition, where P (v) By corresponding to the matrix
Figure FDA00039842197500000510
The front k eigenvectors of the front k minimum eigenvalues;
in step S3, P is updated * While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:
Figure FDA0003984219750000061
solving the problem by eigenvalue decomposition, where P * By corresponding to the matrix
Figure FDA0003984219750000062
The top k eigenvectors of the top k minimum eigenvalues of (1).
10. The incomplete multi-view clustering method of claim 4, wherein in step S3, the update is made
Figure FDA0003984219750000063
And μ, the other variables are fixed, and the four variables are updated as follows:
Figure FDA0003984219750000064
Figure FDA0003984219750000065
Figure FDA0003984219750000066
μ=min(ρμ,μ 0 )
where ρ and μ 0 Is a constant.
CN202211558793.XA 2022-12-06 2022-12-06 Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning Pending CN115795333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211558793.XA CN115795333A (en) 2022-12-06 2022-12-06 Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211558793.XA CN115795333A (en) 2022-12-06 2022-12-06 Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning

Publications (1)

Publication Number Publication Date
CN115795333A true CN115795333A (en) 2023-03-14

Family

ID=85418800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211558793.XA Pending CN115795333A (en) 2022-12-06 2022-12-06 Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning

Country Status (1)

Country Link
CN (1) CN115795333A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117557821A (en) * 2024-01-11 2024-02-13 兰州大学 Semi-supervised subspace clustering method and device based on soft MFA

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117557821A (en) * 2024-01-11 2024-02-13 兰州大学 Semi-supervised subspace clustering method and device based on soft MFA

Similar Documents

Publication Publication Date Title
Xie et al. Hyper-Laplacian regularized multilinear multiview self-representations for clustering and semisupervised learning
CN107292341B (en) self-adaptive multi-view clustering method based on pair-wise collaborative regularization and NMF
Wen et al. Consensus guided incomplete multi-view spectral clustering
CN112116017A (en) Data dimension reduction method based on kernel maintenance
Xing et al. Graph regularized nonnegative matrix factorization with label discrimination for data clustering
CN110348287A (en) A kind of unsupervised feature selection approach and device based on dictionary and sample similar diagram
Xing et al. Discriminative semi-supervised non-negative matrix factorization for data clustering
CN112784782A (en) Three-dimensional object identification method based on multi-view double-attention network
CN115795333A (en) Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning
Liu et al. Multi-view subspace clustering based on tensor schatten-p norm
Hao et al. Tensor-based multi-view clustering with consistency exploration and diversity regularization
CN111680579A (en) Remote sensing image classification method for adaptive weight multi-view metric learning
CN114898167A (en) Multi-view subspace clustering method and system based on inter-view difference detection
Wang et al. Projected fuzzy C-means with probabilistic neighbors
CN108121964B (en) Matrix-based joint sparse local preserving projection face recognition method
Zhang et al. Center consistency guided multi-view embedding anchor learning for large-scale graph clustering
You et al. Robust structure low-rank representation in latent space
CN111310807B (en) Feature subspace and affinity matrix joint learning method based on heterogeneous feature joint self-expression
Sun et al. Learning isometry-invariant representations for point cloud analysis
CN115392350A (en) Incomplete multi-view clustering method and system based on co-regularization spectral clustering
CN114005044A (en) Hyperspectral image anomaly detection method based on superpixels and progressive low-rank representation
Lu et al. Multi-view subspace clustering with consistent and view-specific latent factors and coefficient matrices
CN111461257B (en) Sharing-difference representation and clustering method for multi-view video in manifold space
Huang et al. Dimensionality reduction of tensors based on manifold-regularized tucker decomposition and its iterative solution
Zhang et al. Efficient subspace clustering based on self-representation and grouping effect

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination