CN115795333A

CN115795333A - Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning

Info

Publication number: CN115795333A
Application number: CN202211558793.XA
Authority: CN
Inventors: 杜世强; 张凯武; 石玉清; 刘宝锴
Original assignee: Northwest Minzu University
Current assignee: Northwest Minzu University
Priority date: 2022-12-06
Filing date: 2022-12-06
Publication date: 2023-03-14

Abstract

The invention discloses an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, which comprises the following steps of: s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation into an LRR model to learn a graph with a global data structure and a local data structure; s2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism; and S3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm. The invention can realize better clustering effect and obtain the picture with better quality, which is obviously superior to the prior method.

Description

Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning

Technical Field

The invention belongs to the technical field of data analysis, and particularly relates to an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning.

Background

Today, the amount of data is increasing, and how to extract useful information from a large amount of data is a focus of attention. Clustering is one of the most important and basic tools for multivariate data analysis, and has been widely used in image processing, recommendation systems, bioinformatics, and other research fields. Clustering divides a data set into different classes or clusters according to a certain specific standard (such as distance), so that the similarity of data objects in the same cluster is as large as possible, and the difference of data objects which are not in the same cluster is also as large as possible, so as to mine information such as inherent hidden structures in data. Therefore, the data can be divided according to the similarity so as to obtain a more accurate clustering result. If interpreted from a machine learning perspective, clustering is an unsupervised learning method that can perform clustering and the like on data with unknown tag information, thereby extracting useful information.

Most of the existing clustering methods are proposed based on single-view data, but in the practical application of big data analysis, data come from different sources, and the data are called multi-view data. The expected effect cannot be obtained from only a single view description data, so that the problem of multi-view data clustering becomes the focus of research of scholars. Over the past few decades, many advanced multi-view clustering algorithms have been studied. With the continuous development of multi-view clustering research, more and more multi-view clustering algorithms are proposed, wherein the multi-view clustering algorithm based on graphs and the multi-view clustering algorithm based on subspace learning gradually become the focus of attention due to the excellent performance of the multi-view clustering algorithms. The goal of the graph-based approach is to learn a consistent similarity matrix between the multi-view data, and then use the similarity matrix to obtain the final clustering result through a spectral clustering algorithm. The graph learning based method has made a good progress, but researchers find that the original data may not have obvious clustering features in the original space, but it may have a significant effect of projecting it to another space, so that the subspace learning based method is continuously emerging. The subspace learning-based algorithm targets different views of the multi-view data, and aims to learn a uniform representation from multiple subspaces or potential spaces, so that high-dimensional data can be processed more easily during clustering.

However, in practical applications, due to the difficulty, high cost and large data volume of data acquisition, it is almost impossible to have a complete multi-view data set. In general, the loss of some instance objects in a view in a multiview dataset is often referred to as incomplete multiview data. Due to the serious lack of information between views in incomplete data, the consistency and complementarity between views cannot be fully utilized as usual, which results in that the above-mentioned conventional multi-view method is no longer suitable for incomplete data and does not obtain satisfactory results in incomplete data. Therefore, in order to solve the problem of incomplete data clustering, incomplete multiview clustering has been proposed in recent years. These methods generally fall into two categories: incomplete multi-view clustering based on matrix factorization and incomplete multi-view clustering based on graphs. The incomplete multi-view clustering algorithm based on matrix decomposition aims to directly obtain low-dimensional consistent representation of all views by utilizing a matrix decomposition technology. Compared with a matrix decomposition-based method, the incomplete multi-view clustering method based on the graph can better describe the relationship between data points and more intuitively explore the original geometric structure of data.

Although there are many ways to deal with the incomplete clustering problem, there are still many problems to be solved. For example, these methods have the following limitations: the existing algorithm is focused on learning consistency representation, but does not consider local data structure, or fully utilizes the local structure of the data, but cannot learn global consistency representation; the filling method introduces noise. Specifically, removing bad parts not only does not improve performance, but also affects the original complete data; many methods can only handle two incomplete data and cannot handle incomplete multiple view data.

Disclosure of Invention

The invention aims to provide an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, and aims to solve the problems in the prior art in the background technology.

The invention is realized in this way, a low rank constraint adaptive graph learning-based incomplete multi-view clustering method, which comprises the following steps:

s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation in an LRR model to learn a graph with a global data structure and a local data structure;

s2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism;

and S3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm.

Preferably, in step S1, the distance regularization term and the non-negatively constrained LRR model based on the low rank representation are introduced as:

wherein the content of the first and second substances,

is a data matrix having n samples, wherein each sample is represented by a column vector;

Z ^(v) ∈R ^n×n is a representation matrix to be obtained, each element z _ij Representing samples x in a joint representation _j Relative to x _i Is represented by (a);

each row of the matrix P can be considered as a new representation of the corresponding original sample, the new representation P being divided into several clusters by the K-means algorithm;

E ^(v) is the reconstruction error, L is the laplace matrix, and the calculation formula is: l is ^(v) ＝D ^(v) -W ^(v) ，W ^(v) Is defined as a similarity matrix

D is the diagonal matrix and its ith diagonal element is defined as:

||·|| _* 、||·|| ₁ and | · | non-counting ₂ Nuclear norm, l, of the representation matrix ₁ Norm sum l ₂ Norm, tr (·) denotes the trace of the matrix;

diag(Z ^(v) ) =0 representing the matrix Z ^(v) All diagonal elements of (a) are 0 and 1, which represents column vectors of which all elements are 1, and I is a unit matrix; lambda [ alpha ] ₁ And λ ₂ Is a penalty parameter, uniformly represents the dimension of the matrix by using an index matrix G,

wherein the index matrix G is defined as:

preferably, in step S2, the mathematical model for learning the consistent representation of all views is:

wherein λ is ₃ A penalty parameter. P ^* Is a target clustering index matrix to be learned;

ω _v Ω(P ^(v) ,P ^* ) Is to measure P of each view ^* And P ^(v) The regularization term of consistency therebetween is defined as follows:

in the mathematical model for learning a consistent representation of all views, linear kernels are used

Weight ω _v The importance of view v is measured by:

the constants in the model are omitted, and the final objective function is:

preferably, the step S3 specifically includes: introduction of several approximations Z ^(v) The problem is separable, and the lagrangian function of the final objective function obtained by the method is as follows:

wherein, J ^(v) And V ^(v) Is an approximation matrix Z ^(v) The variable of (a) is selected,

representation matrix V ^(v) A laplacian matrix of;

matrix of

Sum vector

Representing the Lagrange multiplier, mu representing a penalty parameter, | · | | non-calculation _F A Frobenius norm representing a matrix;

and respectively and alternately updating different variables through fixed variables, learning the consistent representation of all views, and obtaining a final clustering result by using a K-means algorithm on the consistent representation.

Preferably, in step S3, Z is updated ^(v) While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:

solving for Z ^(v) Is set to 0, variable Z ^(v) The optimal solution of (c) can be as follows:

wherein

Preferably, in step S3, J is updated ^(v) While, fixing other variables, calculating variable J ^(v) The sub-problem of (a) can be simplified as:

updating J using Singular Value Threshold (SVT) shrink operator ^(v) ：

Wherein ^Θ Representing the SVT shrink operator.

Preferably, in step S3, V is updated ^(v) And

while, fixing other variables, calculating variable V ^(v) The sub-problem of (2) translates to solving the following problem:

wherein N is ^(v) And M ^(v) Defining as N for auxiliary variables ^(v) ＝G ^(v) P ^(v) ，

For the

Variable V ^(v) The sub-problem of (a) translates into the following equivalence optimization problem:

wherein the content of the first and second substances,

and

representation matrix V ^(v) The ith and jth row vectors of (a);

by optimizing the problem pairs V equivalently ^(v) Derivative of (D) to obtain V ^(v) The optimal solution of (2):

then increase V ^(v) ＝max(V ^(v) 0) ensuring the matrix V ^(v) Are all greater than 0, and are,

the update optimization of (c) is as follows:

preferably, in step S3, E is updated ^(v) While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:

for the sparse constraint optimization problem, a closed-form solution is obtained:

where θ denotes the shrink operator.

Preferably, in step S3, P is updated ^(v) While fixing other variables, the defined Lagrangian function can be transformed to solve the following problem:

solving the problem by eigenvalue decomposition, where P ^(v) By corresponding to the matrix

The top k feature vectors of the top k minimum feature values of (1);

in step S3, P is updated ^* While fixing other variables, the lagrangian function defined above can be converted to solve the following problem:

solving the problem by eigenvalue decomposition, where P ^* By corresponding to the matrix

The top k eigenvectors of the top k minimum eigenvalues of (1).

Preferably, in step S3, the update is performed

And μ, the other variables are fixed, and the four variables are furtherThe new method comprises the following steps:

μ＝min(ρμ,μ ₀ )

where p and μ ₀ Is a constant.

Compared with the defects and shortcomings of the prior art, the invention has the following beneficial effects:

(1) The invention introduces a distance regularization term for simultaneously capturing the global and local structure of the data, and combines the distance regularization term with low-rank representation. Therefore, incomplete data information can be better utilized, and the global and local structures of the data can be known at the same time, so that a graph with better quality is obtained, and a better clustering effect is realized;

(2) In order to avoid the influence of bad views on the quality of the final fusion consistency graph, the invention provides a new weighting mechanism which can adaptively learn the proper weights of different views, so that the invention is more favorable for exploring the compact representation of incomplete data, reducing the influence of the bad views and further improving the clustering performance;

(3) The model is optimized through an efficient iterative updating algorithm based on a multiplier alternating direction method (ADMM), and a large number of experimental results on six types of incomplete multi-view data sets show that the method is obviously superior to the existing method.

Drawings

FIG. 1 is a flow diagram of an incomplete multi-view clustering method according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the invention discloses an incomplete multi-view clustering method based on low-rank constraint adaptive graph learning, wherein a flow frame diagram of the method is shown in figure 1, and the method comprises the following steps:

s1, introducing a distance regularization term and a non-negative constraint based on low rank representation to learn a graph with global and local data structures in an LRR model for a data matrix of each view

1. Adaptive graph regularization low rank representation learning

Filling missing parts with the average values of the corresponding samples may introduce noise, especially in the case of large-scale data loss. Therefore, the graph-based method is adopted to perform representation learning by using information of the existing sample. However, the existing graph-based learning method cannot obtain the inherent local structure of data, so that the relationship between original data instances cannot be fully utilized, and therefore, distance constraint and non-negativity constraint are introduced to ensure the locality and sparsity. To process incomplete multi-view data, an incomplete instance is removed from the dataset and defined as the dataset

Since the missing samples are different in each view in the multi-view dataset, learning the representation of the corresponding available samples in each view, introducing distance constraints in the LRR model to ensure locality, and using non-negative constraints to avoid unwanted solutions, the mathematical model is described as:

wherein Z ^(v) ∈R ^n×n Is a representation matrix, laplace matrix, to be obtained

Is defined as a similarity matrix

This results in a representation Z due to the deletion of incomplete instances in the dataset ^(v) Will not be consistent, so the index matrix G is used to uniformly represent the dimensions of the matrix,

wherein the index matrix G is defined as:

in practical applications, data may be corrupted by noise to different degrees, so the following error terms can be introduced to simulate noise, and equation (1) can be easily converted into:

wherein E ^(v) Is the reconstruction error, λ ₁ And λ ₂ Is a penalty parameter.

S2, obtaining a clustering index matrix of each view based on a multi-view clustering algorithm of spectral clustering, and learning consistency representation of all views by using the clustering index matrix through a weighting fusion mechanism

The multi-view clustering algorithm based on spectral clustering learns the clustering index matrix of each sample and fuses the clustering index matrix into an optimal clustering index matrix,

can be equivalent to

Regularization weights for target clustering index matrix PIs the sum of all similarity weights of multiple graphs, but when the data is incomplete, the similarity graph is lost, so that in the incomplete case, the direct use of this method in the multi-view data may result in performance degradation. Therefore, the cluster index matrix for each view is first computed, and then the cluster index matrices for all views are used to learn the consistency representation:

wherein ω is _v Ω(P ^(v) ,P ^* ) Is used to measure P and P of each view ^(v) The regularization term of consistency therebetween is defined as follows:

for the purpose of simplifying the problem, linear kernels are used here

Thus, P and P can be obtained. Weight ω _v The importance of view v is represented by:

therefore, equation (6) can be changed to:

since the constant term can be omitted, the final objective function is as follows:

s3, optimizing the objective function of the consistency representation by using an efficient iterative update algorithm based on a multiplier Alternating Direction (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm

Since it is difficult to directly calculate the objective function, an alternating direction multiplier (ADMM) is used to calculate a locally optimal solution of the objective function. Introduction of several approximations Z ^(v) The problem is separable, the lagrange function of the objective function is as follows:

(1) Updating Z ^(v) Fixing other variables, the problem translates to solving the following problem:

then solve for Z ^(v) Is set to 0, variable Z ^(v) The optimal solution of (c) can be as follows:

wherein

(2) Update J ^(v) Fixing other variables, calculating variable J ^(v) The sub-problem of (a) can be simplified as:

updating J using Singular Value Threshold (SVT) shrink operator ^(v) ：

Where Θ represents the SVT shrink operator.

(3) Update V ^(v) And

fixing other variables, the problem translates to solving the following problem:

using an auxiliary variable N ^(v) ＝G ^(v) P ^(v) ，

For

The above formula can be translated into the following optimization problem:

wherein

By taking the above formula to V ^(v) Can be given as the derivative of V ^(v) The optimal solution of (a):

the update optimization of (c) is as follows:

(4) Update E ^(v) Fixing other variables, translates to solving the following problems:

for sparse constraint optimization problems, its closed-form solution can be easily obtained:

where θ represents the shrink operator.

(5) Updating P ^(v) Fixing other variables, translates to solving the following problems:

The top k eigenvectors of the top k minimum eigenvalues of (1).

(6) Updating P ^* Fixing other variables, translates to solving the following problems:

The top k minimum eigenvalue of (2).

(7) Updating

And μ, the other variables are fixed, and the four variables are updated as follows:

μ＝min(ρμ,μ ₀ ) (25)

the whole process of solving equation (8) is shown in algorithm 1:

in order to verify the correctness of the clustering result, the clustering experiment is respectively carried out on a 3source text data set, a BBCport text data set, a 100Leaves image data set, a Webkb data set, an ORL face data set and an Mfeat data set, 30%, 50%, 70% and 90% of views in the sample data set are randomly selected as matching samples for the three data sets of 100Leaves, mfeat and ORL, and one of the views is randomly deleted from the rest samples. The Accuracy rates (Accuracy) corresponding to the clustering results under different pairing rates on the 100Leaves data set are respectively 60.76%, 67.75%, 76.54% and 82.80%; normalized mutual information (Normalized mutual information) is: 78.78%, 82.32%, 87.43% and 90.66%; the purities (Purity) were respectively: 63.28%, 70.00%, 78.55% and 84.20%. The Accuracy rates (Accuracy) corresponding to the clustering results under different pairing rates on the Mfeat data set are respectively 80.87%, 87.95%, 90.30% and 93.22%; normalized mutual information (Normalized mutual information) is: 75.31%, 82.12%, 85.06% and 87.51%; the purities (Purity) were respectively: 80.87%, 87.95%, 90.30% and 93.22%. The Accuracy (Accuracy) corresponding to the clustering result under different pairing rates on the ORL data set is respectively 75.40%, 75.90%, 76.60% and 79.50%; normalized mutual information (Normalized mutual information) is: 86.79%, 87.21%, 87.67% and 88.94%; the purities (Purity) were respectively: 78.03%, 79.10%, 79.65% and 81.58%. On the BBCport, 3sources and Webkb data sets, random deletion is performed on the whole data set, so that all samples lose views, and an incomplete multi-view data set is constructed, wherein the deletion rates are 10%, 30% and 50% respectively. The Accuracy (Accuracy) corresponding to the clustering result under different deficiency rates on the BBCport data set is respectively 80.17%, 83.27% and 77.58%; normalized mutual information (Normalized mutual information) is respectively: 76.96%, 73.56% and 65.71%; the Purity (Purity) was: 91.37%, 87.58% and 85.34%. The Accuracy (Accuracy) corresponding to the clustering result under different deficiency rates on the 3sources data set is 82.84%, 81.45% and 76.54% respectively; normalized mutual information (Normalized mutual information) is respectively: 70.09%, 65.32% and 82.32%; the Purity (Purity) was: 85.91%, 81.45% and 63.28%. The Accuracy rates (accuray) corresponding to the clustering results under different deletion rates on the Webkb data set are respectively 89.34%, 88.86% and 80.02%; normalized mutual information (Normalized mutual information) is: 43.96%, 39.47% and 10.90%; the purities (Purity) were respectively: 89.34%, 88.86% and 80.02%. Compared with 7 latest multi-view clustering methods BSV, MIC, DAIMC, UEAF, IMSCAGL, GIMC-FLSD and IMSR, the method provided by the invention obtains the highest clustering result under 6 experimental databases and 3 common clustering evaluation criteria.

The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims

1. An incomplete multi-view clustering method based on low-rank constraint adaptive graph learning is characterized by comprising the following steps of:

s1, aiming at a data matrix of each view, introducing a distance regularization term and a non-negative constraint based on low-rank representation into an LRR model to learn a graph with a global data structure and a local data structure;

and S3, optimizing the objective function of the consistency representation by using an efficient iterative updating algorithm based on a multiplier alternating direction method (ADMM), and obtaining a final clustering result on the consistency representation by using a K-means algorithm.

2. The incomplete multi-view clustering method of claim 1, wherein in step S1, the distance regularization term and the low rank representation based non-negatively constrained LRR model are introduced as:

s.t.X ^(v) ＝X ^(v) Z ^(v) ,diag(Z ^(v) )＝0,Z ^(v) ≥0,Z ^(v)T 1＝1,P ^T P＝I

wherein, the first and the second end of the pipe are connected with each other,

D is the diagonal matrix and its ith diagonal element is defined as:

||·|| _* 、||·|| ₁ and | · | non-conducting phosphor ₂ Nuclear norm, l, of the representation matrix ₁ Norm sum l ₂ Norm, tr (·) denotes the trace of the matrix;

diag(Z ^(v) ) =0 representing the matrix Z ^(v) All diagonal elements of (1) are 0 and 1, which represents column vectors with all elements being 1, and I is a unit matrix; lambda [ alpha ] ₁ And λ ₂ Is a penalty parameter, uniformly represents the dimension of the matrix by using an index matrix G,

wherein the index matrix G is defined as:

3. the incomplete multi-view clustering method according to claim 2, wherein in step S2, the mathematical model for learning the consistent representation of all views is:

s.t.X ^(v) ＝X ^(v) Z ^(v) +E ^(v) ,diag(Z ^(v) )＝0,Z ^(v) ≥0,Z ^(v)T 1＝1,P ^T P＝I,P ^*T P ^* ＝I

wherein λ is ₃ A penalty parameter. P is ^* Is a target clustering index matrix to be learned;

ω _v Ω(P ^(v) ,P ^* ) Is to measure P of each view ^* And P ^(v) The consistency regularization term between is defined as follows:

in the mathematical model for learning a consistent representation of all views, a linear kernel K is used _P(v) ＝P ^(v) P ^(v)T ,

Weight ω _v The importance of view v is measured by:

the constants in the model are omitted, and the final objective function is:

s.t.X ^(v) ＝X ^(v) Z ^(v) +E ^(v) ,diag(Z ^(v) )＝0,Z ^(v) ≥0,Z ^(v)T 1＝1,P ^T P＝I,P ^*T P ^* ＝I。

4. the incomplete multi-view clustering method of claim 3, wherein the step S3 is specifically: introduction of several approximations Z ^(v) The problem is separable, the lagrange function of the final objective function obtained in claim 3 is:

representation matrix V ^(v) A laplacian matrix of;

matrix array

Sum vector

Representing the Lagrange multiplier, mu representing a penalty parameter, | · | | non-calculation _F A Frobenius norm representing the matrix;

and respectively and alternately updating different variables through fixed variables, learning the consistent representation of all views, and obtaining a final clustering result on the consistent representation by using a K-means algorithm.

5. The incomplete multi-view clustering method of claim 4, wherein in step S3, Z is updated ^(v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:

wherein

6. The incomplete multi-view clustering method of claim 4, wherein in step S3, J is updated ^(v) While, fixing other variables, calculating variable J ^(v) The sub-problem of (c) can be simplified as:

updating J using Singular Value Threshold (SVT) shrink operator ^(v) ：

Wherein ^Θ Representing the SVT shrink operator.

7. The incomplete multi-view clustering method of claim 4, wherein in step S3, V is updated ^(v) And

wherein, N ^(v) And M ^(v) For auxiliary variables to be defined as N ^(v) ＝G ^(v) P ^(v) ，

For the

wherein the content of the first and second substances,

and

representation matrix V ^(v) The ith and jth row vectors of (a);

by optimizing problem pairs V equivalently ^(v) Derivative of (D) to obtain V ^(v) The optimal solution of (2):

the update optimization of (c) is as follows:

8. the incomplete multi-view clustering method of claim 4, wherein in step S3, E is updated ^(v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:

where θ represents the shrink operator.

9. The incomplete multiview clustering method of claim 4, wherein in step S3, P is updated ^(v) While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:

The front k eigenvectors of the front k minimum eigenvalues;

in step S3, P is updated ^* While fixing other variables, the lagrangian function defined in claim 4 can be converted to solve the following problem:

The top k eigenvectors of the top k minimum eigenvalues of (1).

10. The incomplete multi-view clustering method of claim 4, wherein in step S3, the update is made

μ＝min(ρμ,μ ₀ )

where ρ and μ ₀ Is a constant.