CN115311483A - Incomplete multi-view clustering method and system based on local structure and balance perception - Google Patents


Publication number: CN115311483A
Authority: CN (China)
Prior art keywords: view, matrix, clustering, missing, data
Legal status: Pending
Application number: CN202210979979.6A
Other languages: Chinese (zh)
Inventors: 文杰 (Wen Jie), 刘成亮 (Liu Chengliang), 刘毅成 (Liu Yicheng), 邓世杰 (Deng Shijie)
Current assignee: Shenzhen Graduate School, Harbin Institute of Technology
Original assignee: Shenzhen Graduate School, Harbin Institute of Technology
Application filed by Shenzhen Graduate School, Harbin Institute of Technology
Priority to CN202210979979.6A
Publication of CN115311483A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation

Abstract

The invention discloses an incomplete multi-view clustering method and system based on local structure and balance perception. For the clustering task on incomplete multi-view data, the method designs an incomplete multi-view consistent clustering representation learning model with probabilistic properties, based on local structure and balance perception; preprocesses the incomplete multi-view data, given a prior view-missing position index matrix; solves for the variables contained in the model from the preprocessed data by alternating iterative optimization; and obtains the clustering results of all samples from the optimal shared consistent representation matrix produced by the optimization. The resulting model is interpretable and efficient, and it yields stable clustering results.

Description

Incomplete multi-view clustering method and system based on local structure and balance perception
Technical Field
The application relates to the technical fields of machine learning and pattern recognition, and in particular to an incomplete multi-view clustering method and system based on local structure and balance perception.
Background
In recent years, large amounts of multi-view data, collected by different sensors or in different ways, have emerged in many fields and industries, and the need for Multi-view Clustering (MVC) has arisen in various applications. For example, to predict the likely progression of Alzheimer's disease, a multi-view clustering model based on consistent representation learning has been proposed in which each sample is represented by two types of brain magnetic resonance imaging data; in addition, multi-view clustering models based on Non-negative Matrix Factorization (NMF) have achieved good results in web item recommendation. In general, conventional MVC methods rely on the complete-view assumption that every view of every sample is fully observed. When such methods are applied to incomplete multi-view data, samples with missing views must be removed in advance, and only the samples with complete views can be clustered. In many practical applications, however, such as recommendation systems and Alzheimer's disease diagnosis, the collected data are often incomplete, with some views missing. Incomplete multi-view clustering is therefore of considerable research significance and has broader applicability and practical value than traditional multi-view clustering.
In recent years, researchers in China and abroad have proposed a number of incomplete multi-view clustering models. For example, one method introduces canonical correlation analysis into incomplete two-view clustering, using the complete kernel matrix of a complete view to fill in the missing information of the other, incomplete kernel matrix. An incomplete multiple kernel k-means method with kernel completion (IMKKM-MKC) has also been proposed to handle incomplete multi-view clustering under arbitrary view-missing patterns. Besides these kernel-completion methods, graph-completion-based incomplete multi-view clustering methods design models from the perspective of graph learning, recovering the missing information of several incomplete graph matrices and generating a consistent representation across views. The Unified Embedding Alignment Framework (UEAF), based on matrix factorization, designs a joint optimization model that simultaneously recovers missing view information and learns a consistent representation. These methods generally address incomplete multi-view clustering by restoring the missing information.
Although missing-information completion methods can, to some extent, solve incomplete multi-view clustering when part of the view information is missing, they still have the following drawbacks: 1) Almost all of them split clustering into two separate, unrelated stages: graph or representation learning is performed first, and k-means or spectral clustering is then run to obtain the final result. On the one hand, these methods cannot guarantee that the learned graph or representation is cluster-friendly enough to achieve optimal clustering performance; on the other hand, because they all use k-means to produce the final result, and different runs of k-means can yield different results, they cannot obtain a stable and unique clustering solution. 2) They generally have high computational complexity and memory consumption, which makes them unsuitable for clustering large-scale incomplete multi-view data.
As described above, many incomplete multi-view clustering methods have been proposed in recent years to address the challenging problem of clustering multi-view data with missing views. However, most existing methods are unsuitable for large-scale incomplete multi-view clustering tasks, and their clustering performance is unstable.
Disclosure of Invention
To address these problems, the invention provides an incomplete multi-view clustering method and system based on local structure and balance perception. For the problem of efficiently learning from incomplete multi-view data, it designs an incomplete multi-view consistent clustering representation learning model that obtains a unique clustering result by learning, across views, a consistent representation with probabilistic properties.
In a first aspect of the present invention, an incomplete multi-view clustering method based on local structure and balance perception comprises the following steps:
Model establishment: for the clustering task on incomplete multi-view data, an incomplete multi-view consistent clustering representation learning model with probabilistic properties, based on local structure and balance perception, is designed as follows:
[formula images BDA0003800049100000021 and BDA0003800049100000022: model objective and constraints]
where $U^{(v)} \in \mathbb{R}^{m_v \times d}$ denotes the basis matrix of the $v$-th view, $m_v$ denotes the feature dimension of the $v$-th view, $d$ denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, $n$ denotes the total number of samples, $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the $d$-dimensional column vector whose elements are all 1, $\alpha_v$ denotes the $v$-th element of $\alpha$, $r$ is a positive integer no less than 2, $\alpha_v^r$ denotes $\alpha_v$ raised to the power $r$, $\lambda$ is a penalty parameter, $l$ denotes the number of views, $n_v$ denotes the number of non-missing samples in the $v$-th view, $I$ is the identity matrix and $I_{i,j}$ its element at row $i$, column $j$, $W^{(v)}_{i,j}$ denotes the similarity between the $i$-th and $j$-th samples in the $v$-th view, $x_i^{(v)}$ denotes the $i$-th column of $X^{(v)}$, $X^{(v)} \in \mathbb{R}^{m_v \times n_v}$ denotes the matrix formed by the non-missing samples of the $v$-th view, $g_j^{(v)}$ denotes the $j$-th column of $G^{(v)}$, and $G^{(v)} \in \{0,1\}^{n \times n_v}$ is a binary matrix;
Data preprocessing: the incomplete multi-view data $\{Y^{(v)}\}_{v=1}^{l}$ are preprocessed, given the prior view-missing position index matrix $Z$;
Model optimization: based on the preprocessed data and the incomplete multi-view consistent clustering representation learning model, an alternating iterative optimization method is designed to solve for the variables contained in the model, namely $\{U^{(v)}\}_{v=1}^{l}$, $P$ and $\alpha$, together with an introduced auxiliary variable $Q$, a Lagrange multiplier $C$ and a positive penalty parameter $\mu$, thereby optimizing the model, where:
Solving the optimization problem for $U^{(v)}$:
[formula image BDA0003800049100000032]
The optimal solution is $U^{(v)} = M^{(v)} N^{(v)T}$, where $M^{(v)} \Sigma^{(v)} N^{(v)T}$ is the singular value decomposition of $X^{(v)} S^{(v)T} G^{(v)T} P^{T}$, $S^{(v)} = W^{(v)} + I$, and $W^{(v)} \in \mathbb{R}^{n_v \times n_v}$ is a pre-constructed similarity graph matrix;
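The $U^{(v)}$ update is an orthogonal Procrustes-type step. A minimal NumPy sketch, assuming dense arrays with the shapes defined above and $m_v \ge d$ (so that $U^{(v)T}U^{(v)} = I$ can hold); the function name `update_U` is hypothetical:

```python
import numpy as np


def update_U(X, S, G, P):
    """Sketch of the U^(v) update: U = M N^T from the SVD of X S^T G^T P^T.

    X: (m_v, n_v) non-missing data of view v
    S: (n_v, n_v) S^(v) = W^(v) + I
    G: (n, n_v) binary selection matrix G^(v)
    P: (d, n) shared consistent representation
    Assumes m_v >= d so that U can have orthonormal columns.
    """
    # Evaluate right to left so every intermediate has only d columns.
    A = X @ (S.T @ (G.T @ P.T))  # shape (m_v, d)
    # Thin SVD: A = M diag(sigma) N^T with M (m_v, d) and N^T (d, d).
    M, _, Nt = np.linalg.svd(A, full_matrices=False)
    # M N^T maximizes trace(U^T A) subject to U^T U = I.
    return M @ Nt
```

Evaluating the product right to left keeps every intermediate at $n_v \times d$ or smaller instead of materializing an $m_v \times n$ matrix.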
Solving the optimization problem for $P$:
[formula image BDA0003800049100000034]
The optimal solution for $P$ is:
[formula image BDA0003800049100000035]
where
[formula image BDA0003800049100000036]
[formula image BDA0003800049100000037]
$\mu > 0$ is a positive penalty parameter, $C$ is the Lagrange multiplier, $Q$ is an auxiliary variable subject to $P = Q$, and $c$ denotes the number of rows of $P$;
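Because the exact expressions survive only as formula images, the following is an assumption-labeled sketch rather than the patent's formula: under the constraints $0 \le P \le 1$ and $P^{T}\mathbf{1} = \mathbf{1}$, a $P$-subproblem of this kind typically reduces to projecting each column of some intermediate matrix $B$ onto the probability simplex, and the appearance of the row count $c$ is consistent with that closed form. The sketch uses the standard sort-based Euclidean simplex projection; `B` and `project_columns_to_simplex` are hypothetical names.

```python
import numpy as np


def project_columns_to_simplex(B):
    """Euclidean projection of each column of B onto {p : p >= 0, sum(p) = 1}.

    Standard sort-based algorithm; B has shape (c, n).
    """
    c, n = B.shape
    Bs = -np.sort(-B, axis=0)                # each column sorted in descending order
    css = np.cumsum(Bs, axis=0) - 1.0        # running sums minus the target sum 1
    j = np.arange(1, c + 1).reshape(-1, 1)   # 1-based positions
    # Support size per column: the condition holds for a prefix of positions.
    rho = np.count_nonzero(Bs - css / j > 0, axis=0)
    theta = css[rho - 1, np.arange(n)] / rho  # per-column shift
    return np.maximum(B - theta, 0.0)
```

Each output column is nonnegative and sums to 1, so the upper bound $P \le 1$ holds automatically.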
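Solving the optimization problem for $Q$:
[formula image BDA0003800049100000038]
The optimal solution for $Q$ is $Q = (\mu P + C)(\mathbf{1}\mathbf{1}^{T} + \mu I)^{-1}$; since $Q$ has the same $d \times n$ shape as $P$, $\mathbf{1}$ here is the $n$-dimensional all-ones vector.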
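The $n \times n$ inverse in the $Q$ update need not be formed explicitly. By the Sherman-Morrison formula, $(\mathbf{1}\mathbf{1}^{T} + \mu I)^{-1} = \frac{1}{\mu}\left(I - \frac{\mathbf{1}\mathbf{1}^{T}}{\mu + n}\right)$, so $Q$ can be computed in $O(dn)$ time. A minimal sketch, taking the formula exactly as written above (the function name `update_Q` is hypothetical):

```python
import numpy as np


def update_Q(P, C, mu):
    """Q = (mu P + C)(1 1^T + mu I)^{-1}, computed without an n x n inverse.

    Uses Sherman-Morrison: (mu I + 1 1^T)^{-1} = (I - 1 1^T / (mu + n)) / mu.
    P, C: (d, n); mu > 0.
    """
    n = P.shape[1]
    R = mu * P + C
    row_sums = R.sum(axis=1, keepdims=True)  # R @ 1, shape (d, 1)
    # R (I - 1 1^T / (mu + n)) / mu subtracts the scaled row sum from every column.
    return (R - row_sums / (mu + n)) / mu
```

This matters for the large-scale setting the method targets: memory stays at $O(dn)$ instead of $O(n^2)$.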
Solving the optimization problem for $\alpha$:
[formula image BDA0003800049100000039]
The optimal solution for $\alpha$ is:
[formula image BDA00038000491000000310]
where
[formula image BDA00038000491000000311]
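The closed form for $\alpha$ is only available as a formula image. As an illustration, assume the $\alpha$-subproblem has the common form $\min_{\alpha} \sum_{v} \alpha_v^{r} h_v$ subject to $\sum_v \alpha_v = 1$, $\alpha_v \ge 0$, where $h_v > 0$ is the loss of the $v$-th view (a hypothetical name; the patent's own definition is in the formula image above). Setting the derivative of the Lagrangian to zero gives $\alpha_v = h_v^{1/(1-r)} / \sum_{u} h_u^{1/(1-r)}$:

```python
import numpy as np


def update_alpha(h, r=2, eps=1e-12):
    """Closed form of min sum_v alpha_v^r h_v s.t. sum(alpha) = 1, alpha >= 0.

    h: per-view losses (hypothetical; assumed positive), r >= 2.
    """
    h = np.maximum(np.asarray(h, dtype=float), eps)  # guard against zero loss
    w = h ** (1.0 / (1.0 - r))                         # smaller loss -> larger weight
    return w / w.sum()
```

With $r \ge 2$, views with smaller loss receive larger weights, and larger $r$ pushes the weights toward uniform.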
The update equations for $C$ and $\mu$ are:
[formula image BDA00038000491000000312]
where $\rho$ and $\mu_0$ are constants;
Clustering: the clustering result of the data is obtained from the optimal shared consistent representation matrix $P$ produced by the optimization, specifically according to
[formula image BDA00038000491000000313]
If the $j$-th element of the $i$-th column $P_{:,i}$ is the largest, the $i$-th sample is assigned to the $j$-th class; the clustering results of all samples are thus obtained by locating the maximum element in each column of $P$.
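The assignment rule is a column-wise argmax over $P$. A one-line NumPy sketch (the helper name is hypothetical; NumPy labels are 0-based, whereas the text counts classes from 1):

```python
import numpy as np


def assign_clusters(P):
    """Column-wise argmax: sample i goes to the class with the largest P[:, i]."""
    return np.argmax(P, axis=0)


P = np.array([[0.7, 0.1, 0.3],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.4]])
labels = assign_clusters(P)  # array([0, 1, 2])
```

Ties are broken in favor of the lowest class index, which is NumPy's `argmax` behavior, so the output is deterministic.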
Further, $G^{(v)} \in \{0,1\}^{n \times n_v}$ is a binary matrix used to retain, in $P$, the sample representations corresponding to $X^{(v)}$. $G^{(v)}$ is constructed from the prior view-missing position index matrix $Z$ as follows:
[formula image BDA0003800049100000042]
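Since the construction of $G^{(v)}$ is given only as a formula image, the sketch below assumes the usual selection-matrix convention: $G^{(v)}_{i,j} = 1$ exactly when the $j$-th column of $X^{(v)}$ is the $i$-th sample, so that $P G^{(v)}$ keeps the columns of $P$ for the samples observed in view $v$. `build_G` is a hypothetical name and `z_v` is the $v$-th row of $Z$:

```python
import numpy as np


def build_G(z_v):
    """Selection matrix G^(v) (n x n_v) from the v-th row of Z (1 = observed).

    Assumed convention: G[i, j] = 1 iff the j-th column of X^(v) is sample i.
    """
    z_v = np.asarray(z_v)
    observed = np.flatnonzero(z_v == 1)          # indices of samples present in view v
    G = np.zeros((z_v.size, observed.size))
    G[observed, np.arange(observed.size)] = 1.0  # exactly one 1 per column
    return G
```

In practice the product $P G^{(v)}$ can be replaced by column indexing, `P[:, np.flatnonzero(z_v == 1)]`, which avoids storing an $n \times n_v$ matrix.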
Further, preprocessing the incomplete multi-view data $\{Y^{(v)}\}_{v=1}^{l}$, given the prior view-missing position index matrix $Z$, specifically comprises:
Deleting missing views: the missing samples in each view are deleted according to $Z$, yielding the non-missing data set $\{X^{(v)}\}_{v=1}^{l}$;
Data normalization: $\{X^{(v)}\}_{v=1}^{l}$ is normalized as
[formula image BDA0003800049100000046]
where $x_i^{(v)}$ denotes the $i$-th column vector of $X^{(v)}$;
Local neighbor graph $\{W^{(v)}\}_{v=1}^{l}$ construction: for the non-missing data $X^{(v)}$ of each view, the Gaussian-kernel similarity between each sample and each of its $k$ nearest neighbors is computed as
[formula image BDA0003800049100000049]
where $x_j^{(v)}$ is one of the $k$ nearest neighbors of sample $x_i^{(v)}$; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
Transformation matrix construction: the matrices $\{G^{(v)}\}_{v=1}^{l}$ are constructed from $Z$.
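The first two preprocessing steps can be sketched per view as follows. The normalization formula is only available as an image, so scaling each column to unit $\ell_2$ norm is an assumption here; `preprocess_view` and `eps` are hypothetical names:

```python
import numpy as np


def preprocess_view(Y_v, z_v, eps=1e-12):
    """Delete missing samples of view v, then scale each column to unit l2 norm.

    Y_v: (m_v, n) raw data of view v; z_v: v-th row of Z (1 = observed).
    The unit-norm scaling is an assumed form of the normalization step.
    """
    X_v = Y_v[:, np.asarray(z_v) == 1]                  # keep observed columns only
    norms = np.linalg.norm(X_v, axis=0, keepdims=True)  # column norms, shape (1, n_v)
    return X_v / np.maximum(norms, eps)                 # avoid division by zero
```

Deleting before normalizing means any placeholder values stored in missing columns never reach the norm computation.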
In a second aspect of the present invention, an incomplete multi-view clustering system based on local structure and balance perception is provided, the system comprising:
a model establishment unit, configured to design an incomplete multi-view consistent clustering representation learning model with probabilistic properties based on local structure and balance perception, specifically:
[formula images BDA00038000491000000413 and BDA00038000491000000414: model objective and constraints]
where $U^{(v)} \in \mathbb{R}^{m_v \times d}$ denotes the basis matrix of the $v$-th view, $m_v$ denotes the feature dimension of the $v$-th view, $d$ denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, $n$ denotes the total number of samples, $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the $d$-dimensional column vector whose elements are all 1, $\lambda$ is a penalty parameter, $l$ denotes the number of views, $n_v$ denotes the number of non-missing samples in the $v$-th view, $I$ is the identity matrix and $I_{i,j}$ its element at row $i$, column $j$, $W^{(v)}_{i,j}$ denotes the similarity between the $i$-th and $j$-th samples in the $v$-th view, $x_i^{(v)}$ denotes the $i$-th column of $X^{(v)}$, $X^{(v)} \in \mathbb{R}^{m_v \times n_v}$ denotes the matrix formed by the non-missing samples of the $v$-th view, $g_j^{(v)}$ denotes the $j$-th column of $G^{(v)}$, and $G^{(v)} \in \{0,1\}^{n \times n_v}$ is a binary matrix;
a data preprocessing unit, configured to preprocess the incomplete multi-view data $\{Y^{(v)}\}_{v=1}^{l}$, given the prior view-missing position index matrix $Z$;
a model optimization unit, configured to solve, by an alternating iterative optimization method and based on the preprocessed data and the incomplete multi-view consistent clustering representation learning model, for the variables contained in the model, namely $\{U^{(v)}\}_{v=1}^{l}$, $P$ and $\alpha$, together with an introduced auxiliary variable $Q$, a Lagrange multiplier $C$ and a positive penalty parameter $\mu$, thereby optimizing the model, where:
Solving the optimization problem for $U^{(v)}$:
[formula image BDA0003800049100000057]
the optimal solution is $U^{(v)} = M^{(v)} N^{(v)T}$, where $M^{(v)} \Sigma^{(v)} N^{(v)T}$ is the singular value decomposition of $X^{(v)} S^{(v)T} G^{(v)T} P^{T}$, $S^{(v)} = W^{(v)} + I$, and $W^{(v)} \in \mathbb{R}^{n_v \times n_v}$ is a pre-constructed similarity graph matrix;
Solving the optimization problem for $P$:
[formula image BDA0003800049100000059]
the optimal solution for $P$ is:
[formula image BDA00038000491000000510]
where
[formula image BDA00038000491000000511]
[formula image BDA00038000491000000512]
$\mu > 0$ is a positive penalty parameter, $C$ is the Lagrange multiplier, $Q$ is an auxiliary variable subject to $P = Q$, and $c$ denotes the number of rows of $P$;
Solving the optimization problem for $Q$:
[formula image BDA00038000491000000513]
the optimal solution for $Q$ is $Q = (\mu P + C)(\mathbf{1}\mathbf{1}^{T} + \mu I)^{-1}$;
Solving the optimization problem for $\alpha$:
[formula image BDA00038000491000000514]
the optimal solution for $\alpha$ is:
[formula image BDA00038000491000000515]
where
[formula image BDA00038000491000000516]
the update equations for $C$ and $\mu$ are:
[formula image BDA00038000491000000517]
where $\rho$ and $\mu_0$ are constants;
a clustering unit, configured to obtain the clustering result of the data from the optimal shared consistent representation matrix $P$ produced by the optimization, specifically according to
[formula image BDA0003800049100000061]
if the $j$-th element of the $i$-th column $P_{:,i}$ is the largest, the $i$-th sample is assigned to the $j$-th class. The clustering results of all samples are obtained by locating the maximum element in each column of $P$.
Further, $G^{(v)} \in \{0,1\}^{n \times n_v}$ is a binary matrix used to retain, in $P$, the sample representations corresponding to $X^{(v)}$. $G^{(v)}$ is constructed from the prior view-missing position index matrix $Z$ as follows:
[formula image BDA0003800049100000063]
Further, the data preprocessing unit specifically performs the following steps:
Deleting missing views: the missing samples in each view are deleted according to $Z$, yielding the non-missing data set $\{X^{(v)}\}_{v=1}^{l}$;
Data normalization: $\{X^{(v)}\}_{v=1}^{l}$ is normalized as
[formula image BDA0003800049100000066]
where $x_i^{(v)}$ denotes the $i$-th column vector of $X^{(v)}$;
Local neighbor graph $\{W^{(v)}\}_{v=1}^{l}$ construction: for the non-missing data $X^{(v)}$ of each view, the Gaussian-kernel similarity between each sample and each of its $k$ nearest neighbors is computed as
[formula image BDA0003800049100000069]
where $x_j^{(v)}$ is one of the $k$ nearest neighbors of sample $x_i^{(v)}$; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
Transformation matrix construction: the matrices $\{G^{(v)}\}_{v=1}^{l}$ are constructed from $Z$.
In a third aspect of the present invention, an incomplete multi-view clustering system based on local structure and balance perception is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the above incomplete multi-view clustering method based on local structure and balance perception.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, on which instructions are stored, and when executed by a processor, the instructions cause the processor to perform the above incomplete multi-view clustering method based on local structure and balance perception.
The invention provides an incomplete multi-view clustering method and system based on local structure and balance perception. For the problem of efficient learning from incomplete multi-view data, it designs an incomplete multi-view consistent clustering representation learning model with probabilistic properties. The model obtains a unique clustering result by learning a consistent representation with probabilistic properties across views, where each element of a consistent probabilistic representation vector directly reflects the probability that the corresponding sample belongs to a particular class. In addition, the model integrates geometric structure preservation and consistent representation learning into a very compact model: introducing the structure-preserving property requires no additional constraint terms or penalty parameters, which keeps the model compact and reduces the burden of parameter tuning. Furthermore, a balance-aware learning technique is introduced to prevent samples from being over-assigned to a few classes. The method not only achieves the best and most stable clustering performance but also has higher computational efficiency than current state-of-the-art incomplete multi-view clustering methods. Specifically, the beneficial effects of the invention include:
LUBA_EIMVC designs a novel balance-aware, graph-regularized incomplete multi-view orthogonal matrix factorization model, which not only mines and exploits the local structure information of each view to guide model optimization, but also makes full use of the non-missing view information to learn a consistent clustering representation with probabilistic properties;
unlike existing methods that obtain clustering results with k-means, LUBA_EIMVC directly obtains a unique nonnegative probability matrix shared by all views, in which each element can be regarded as the probability that a sample belongs to a certain class, thereby avoiding the inaccurate clustering caused by k-means;
to prevent samples from being excessively concentrated in a few classes, or even a single class, while the model optimizes the clustering result, a balance-aware constraint on the probability matrix is introduced and a cluster-friendly consistent representation matrix with probabilistic properties is learned jointly, from which the clustering result of the incomplete multi-view data can be obtained directly;
owing to the learned consistent representation with probabilistic properties, the LUBA_EIMVC model is interpretable and efficient, and it yields stable clustering results.
Drawings
FIG. 1 is a schematic diagram of the incomplete multi-view clustering method based on local structure and balance perception in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the optimization learning process of the incomplete multi-view consistent clustering representation learning model in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the incomplete multi-view clustering system based on local structure and balance perception in an embodiment of the present invention;
FIG. 4 is an architecture diagram of a computer device in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures associated with the present invention are shown in the drawings, not all of them.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The following embodiments of the incomplete multi-view clustering method and system based on local structure and balance perception are provided:
Embodiment 1
FIG. 1 is a schematic diagram of the incomplete multi-view clustering method based on local structure and balance perception in Embodiment 1 of the present invention, namely the Local strUcture-and BAlance-aware Efficient Incomplete Multi-View Clustering (LUBA_EIMVC) method. The method aims to learn, from incomplete multi-view data, a nonnegative probabilistic consistent representation matrix with balanced clustering results, and thereby to output interpretable and unique clustering results. The specific steps are as follows:
Model establishment: for the clustering task on incomplete multi-view data, an incomplete multi-view consistent clustering representation learning model with probabilistic properties, based on local structure and balance perception, is designed as follows:
[formula images BDA0003800049100000081 and BDA0003800049100000082: model (1), objective and constraints]
where $U^{(v)} \in \mathbb{R}^{m_v \times d}$ denotes the basis matrix of the $v$-th view, $m_v$ denotes the feature dimension of the $v$-th view, $d$ denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, and $n$ denotes the total number of samples; $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the $d$-dimensional column vector whose elements are all 1, $\alpha_v$ denotes the $v$-th element of $\alpha$, $r$ is a positive integer no less than 2, and $\alpha_v^r$ denotes $\alpha_v$ raised to the power $r$; $\lambda$ is a penalty parameter, $l$ denotes the number of views, $n_v$ denotes the number of non-missing samples in the $v$-th view, $I$ is the identity matrix and $I_{i,j}$ its element at row $i$, column $j$, $W^{(v)}_{i,j}$ denotes the similarity between the $i$-th and $j$-th samples in the $v$-th view, $x_i^{(v)}$ denotes the $i$-th column of $X^{(v)}$, $X^{(v)} \in \mathbb{R}^{m_v \times n_v}$ denotes the matrix formed by the non-missing samples of the $v$-th view, $g_j^{(v)}$ denotes the $j$-th column of $G^{(v)}$, and $G^{(v)} \in \{0,1\}^{n \times n_v}$ is a binary matrix;
In implementation, consider incomplete multi-view data $\{Y^{(v)}\}_{v=1}^{l}$ and a view-missing position information matrix $Z \in \mathbb{R}^{l \times n}$, where $Y^{(v)}$ denotes the matrix of the $n$ samples in the $v$-th view and its column vector $y_j^{(v)}$ denotes the $v$-th view feature of the $j$-th sample. If the $v$-th view of the $j$-th sample is missing, the corresponding element of $Z$ is $Z_{v,j} = 0$; otherwise $Z_{v,j} = 1$, indicating that this view of the sample is present. In the original data $\{Y^{(v)}\}_{v=1}^{l}$, all feature elements of the feature vector corresponding to a missing view can be marked as "NaN". For the clustering task on incomplete multi-view data, this embodiment designs the local-structure- and balance-aware incomplete multi-view consistent clustering representation learning model with probabilistic properties given in formula (1). In formula (1), the matrix $X^{(v)}$ can be obtained directly by deleting from the original data $Y^{(v)}$ the columns marked "NaN".
Also in formula (1), $U^{(v)} \in \mathbb{R}^{m_v \times d}$ denotes the basis matrix of the $v$-th view, whose value is obtained by optimizing model (1), where $d$ denotes the dimension of the consistent representation space and is usually set to the number of classes into which the data are to be partitioned. The constraint $0 \le P \le 1$ indicates that every element of $P$ lies in $[0, 1]$.
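Under the NaN convention just described, $Z$ can be read directly off the raw data: a column of $Y^{(v)}$ counts as missing when all of its entries are NaN. A minimal sketch, assuming each $Y^{(v)}$ is an $m_v \times n$ NumPy array (both function names are hypothetical):

```python
import numpy as np


def missing_index_matrix(Ys):
    """Build Z (l x n): Z[v, j] = 0 if column j of Y^(v) is all NaN, else 1."""
    return np.stack([(~np.isnan(Y).all(axis=0)).astype(int) for Y in Ys])


def observed_columns(Y):
    """X^(v): drop the columns of Y^(v) that are marked entirely as NaN."""
    return Y[:, ~np.isnan(Y).all(axis=0)]


nan = np.nan
Y1 = np.array([[1.0, nan, 5.0], [2.0, nan, 6.0]])
Y2 = np.array([[nan, 3.0, 4.0]])
Z = missing_index_matrix([Y1, Y2])  # [[1, 0, 1], [0, 1, 1]]
```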
$W^{(v)} \in \mathbb{R}^{n_v \times n_v}$ is a pre-constructed similarity graph matrix whose element $W^{(v)}_{i,j}$ represents the similarity between the $i$-th and $j$-th samples in the $v$-th view. It is constructed as follows: 1) Compute the Gaussian-kernel similarity between the non-missing samples of the $v$-th view:
[formula image BDA0003800049100000094]
2) For the $i$-th non-missing sample, sort the distances between it and the other $n_v - 1$ samples, keep in $W^{(v)}$ only the Gaussian-kernel values between the sample and its $k$ nearest samples, and set all other elements to 0.
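The two construction steps can be sketched as below. The exact kernel (including its bandwidth) is given only as a formula image, so $\exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$ with a user-chosen `sigma` is an assumption, as are the function name and the default `k`:

```python
import numpy as np


def knn_gaussian_graph(X, k=5, sigma=1.0):
    """Sketch of W^(v): Gaussian-kernel weights to each sample's k nearest neighbors.

    X: (m_v, n_v) with samples as columns. Kernel exp(-d^2 / (2 sigma^2)) is assumed.
    """
    n_v = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    np.fill_diagonal(D2, np.inf)           # a sample is not its own neighbor
    k = min(k, n_v - 1)
    nbrs = np.argsort(D2, axis=1)[:, :k]   # step 2: k smallest distances per row
    rows = np.repeat(np.arange(n_v), k)
    cols = nbrs.ravel()
    W = np.zeros((n_v, n_v))
    W[rows, cols] = np.exp(-D2[rows, cols] / (2.0 * sigma ** 2))  # step 1 values
    return W
```

As described, each row keeps only its own $k$ neighbors, so $W^{(v)}$ need not be symmetric.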
In a preferred embodiment, $G^{(v)} \in \{0,1\}^{n \times n_v}$ in formula (1) is a binary matrix used to retain, in $P$, the sample representations corresponding to $X^{(v)}$. $G^{(v)}$ is constructed from the prior view-missing position index matrix $Z$ as follows:
[formula image BDA0003800049100000096] (2)
In model (1), the orthogonality constraint $U^{(v)T} U^{(v)} = I$ on the basis matrix $U^{(v)}$ avoids degenerate cluster centers. The constraint term
[formula image BDA0003800049100000097]
is a balance-aware constraint that prevents over-clustering into a small number of classes, that is, grouping data that have $c$ classes into only
[formula image BDA0003800049100000098]
classes. Compared with the binary cluster labels obtained by k-means, the constraint $P^{T}\mathbf{1} = \mathbf{1}$ introduced in model (1) yields a consistent probability matrix, which increases the degrees of freedom for basis matrix learning and consistent representation learning. In model (1),
[formula image BDA0003800049100000099]
is the novel graph-embedded multi-view consistent representation learning term designed for this method. Its novelty, and its distinction from other methods, lie mainly in the following: the invention combines graph-embedding structure preservation with shared consistent representation learning for incomplete multi-view data in a highly compact model that has no hyperparameters and yields structured, discriminative consistent representations.
Data preprocessing: the incomplete multi-view data
Figure BDA00038000491000000910
is preprocessed according to the given view-missing prior position index matrix Z;
In a preferred embodiment, preprocessing the incomplete multi-view data
Figure BDA00038000491000000911
according to the given view-missing prior position index matrix Z specifically comprises the following steps:
Deletion of missing views: according to the view-missing prior position index matrix Z, delete the missing samples from each view to obtain the non-missing data set
Figure BDA00038000491000000912
Data normalization:
Figure BDA0003800049100000101
is normalized as
Figure BDA0003800049100000102
where
Figure BDA0003800049100000103
denotes the i-th column vector of $X^{(v)}$;
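A minimal sketch of the deletion and normalization steps for a single view follows. It rests on three assumptions. Z[i, v] = 1 means sample i is observed in view v. The full-view matrix stores one sample per column. Normalization rescales each column $x_i^{(v)}$ to unit L2 norm; the patent's normalization formula is in an equation image, so this is the least certain of the three. The helper name is hypothetical.

```python
import numpy as np

def preprocess_view(X_full, z_col, eps=1e-12):
    """Hypothetical helper: drop the missing samples of one view, then normalize.

    X_full: (m_v, n) features of view v for all n samples (missing columns ignored).
    z_col:  length-n 0/1 sequence, 1 = sample observed in view v (assumed convention).
    """
    X = X_full[:, np.asarray(z_col, dtype=bool)]      # deletion of missing samples
    norms = np.linalg.norm(X, axis=0, keepdims=True)  # L2 norm of every column x_i
    return X / np.maximum(norms, eps)                 # assumed unit-norm scaling

X_full = np.array([[3.0, 9.9, 0.0],
                   [4.0, 9.9, 2.0]])
X_v = preprocess_view(X_full, [1, 0, 1])   # the second sample is missing in this view
```

The placeholder values 9.9 in the missing column never reach the result, because deletion happens before normalization.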
Local neighbor graph
Figure BDA0003800049100000104
Construction: non-missing data X for each view (v) The distance between each sample and k nearest neighbor samples is calculated by Gaussian kernel in the way of
Figure BDA0003800049100000105
where
Figure BDA0003800049100000106
is, for the sample
Figure BDA0003800049100000107
one of its k nearest neighbors; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
Construction of the conversion matrix from the view-missing prior position index matrix Z using formula (2):
Figure BDA0003800049100000108
Model optimization: based on the preprocessed data and the designed incomplete multi-view consistent clustering representation learning model (1), the model variables
Figure BDA0003800049100000109
P, α, the introduced auxiliary variable Q, the Lagrange multiplier C and the positive penalty parameter μ are solved by an alternating iterative optimization method, thereby optimizing the model.
In a concrete implementation, model (1) contains the variables
Figure BDA00038000491000001010
P, α and so on, which are solved by an alternating iterative optimization method. First, let $S^{(v)} = W^{(v)} + I$, and introduce an auxiliary variable Q with the constraint P = Q, which gives:
Figure BDA00038000491000001011
The augmented Lagrangian function of problem (3) can be expressed as:
Figure BDA00038000491000001012
In this formula,
Figure BDA00038000491000001013
$\mu > 0$ is a positive penalty parameter and C is the Lagrange multiplier.
$\|A\|_F$ denotes the Frobenius norm of a matrix $A \in \mathbb{R}^{m \times n}$, computed as
$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j}^2},$$
where $A_{i,j}$ is the (i, j)-th element of A.
The optimal values of the variables are then obtained by iteratively solving the following five subproblems in turn:
Step 1: solve for $U^{(v)}$. With all other variables treated as known, the subproblem for $U^{(v)}$ is:
Figure BDA00038000491000001016
Using the constraint $U^{(v)T}U^{(v)} = I$, problem (5) simplifies to:
Figure BDA0003800049100000111
where $D^{(v)}$ is a diagonal matrix. Because $S^{(v)}$ is symmetric, $D^{(v)}$ is computed as
Figure BDA0003800049100000112
From equation (6), the following optimization problem equivalent to problem (5) can be obtained:
Figure BDA0003800049100000113
Let the singular value decomposition (SVD) of $X^{(v)}S^{(v)T}G^{(v)T}P^T$ be $M^{(v)}\Sigma^{(v)}N^{(v)T}$. The optimal solution of problem (7) is then $U^{(v)} = M^{(v)}N^{(v)T}$. The SVD can be computed directly with the svd function in MATLAB.
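This update has the form of the classical orthogonal Procrustes solution. For $B = X^{(v)}S^{(v)T}G^{(v)T}P^T$, the matrix $U = MN^T$ built from the thin SVD of B has orthonormal columns and maximizes tr(U^T B) among all such matrices. The NumPy sketch below mirrors the step; the function name is hypothetical. It assumes $m_v \ge d$, and in MATLAB `svd(B, 'econ')` would play the role of the thin SVD.

```python
import numpy as np

def update_U(X, S, G, P):
    """Sketch of step 1 for one view: U = M N^T from the thin SVD of B.

    X: (m_v, n_v) data, S: (n_v, n_v) graph S = W + I, G: (n, n_v) selection
    matrix, P: (d, n) shared representation. Assumes m_v >= d.
    """
    B = X @ S.T @ G.T @ P.T                           # (m_v, d)
    M, _, Nt = np.linalg.svd(B, full_matrices=False)  # B = M diag(s) Nt
    return M @ Nt                                     # orthonormal columns, (m_v, d)
```

At the optimum, tr(U^T B) equals the sum of the singular values of B. The test checks exactly this property.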
Step 2: solve for P. With all variables other than P treated as known, the subproblem for P is:
Figure BDA0003800049100000114
Problem (8) can be simplified as:
Figure BDA0003800049100000115
where
Figure BDA0003800049100000116
It can be seen that A is a diagonal matrix whose diagonal elements are all positive. Using
Figure BDA0003800049100000117
problem (9) can be transformed into the equivalent form:
Figure BDA0003800049100000118
Problem (10) decomposes into n independent subproblems, so P can be optimized column by column. The optimal solution for each column $P_{:,i}$ is:
Figure BDA0003800049100000119
where the function max(a, 0) sets any element a smaller than 0 to 0, and c denotes the number of rows of P.
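The exact column-wise closed form, which involves max(·, 0) and the row count c, is given in the equation image above and is not reproduced in the text. For intuition only, the sketch below shows a standard technique of the same shape: a common shift followed by max(·, 0). It is the sort-based Euclidean projection of a vector onto the probability simplex {p : p ≥ 0, 1^T p = 1}, the constraint set every column of P must satisfy. Treating each column subproblem as a plain projection is an assumption that ignores the diagonal weighting by A, so this is not necessarily the patent's formula. The helper names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1} (sort-based method)."""
    c = v.size
    u = np.sort(v)[::-1]                                  # sort in descending order
    css = np.cumsum(u)
    ks = np.arange(1, c + 1)
    rho = np.nonzero(u + (1.0 - css) / ks > 0)[0][-1]     # last index kept positive
    theta = (1.0 - css[rho]) / (rho + 1)                  # common shift
    return np.maximum(v + theta, 0.0)                     # shift, then clip to 0

def project_columns(V):
    """Project every column separately, mirroring the column-by-column update."""
    return np.column_stack([project_to_simplex(V[:, i]) for i in range(V.shape[1])])
```

Each projected column is non-negative and sums to 1, so it also satisfies 0 ≤ P ≤ 1.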
Step 3: solve for Q. With all variables other than Q fixed, the optimization problem reduces to:
Figure BDA0003800049100000121
Setting the partial derivative of problem (12) with respect to Q to zero gives:
$$Q = (\mu P + C)(\mathbf{1}\mathbf{1}^T + \mu I)^{-1} \qquad (13)$$
Step 4: solve for α. Let
Figure BDA0003800049100000122
and fix the variables unrelated to α. The optimization problem for α then becomes:
Figure BDA0003800049100000123
Solving problem (14) gives the optimal α:
Figure BDA0003800049100000124
wherein r is a positive integer not less than 2.
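The optimal α is given in an equation image, so what follows is a hedged illustration. Suppose problem (14) has the common auto-weighting form: minimize $\sum_v \alpha_v^r h_v$ over α, subject to $\sum_v \alpha_v = 1$ and $\alpha_v \ge 0$, where $h_v > 0$ is the loss of view v with the other variables fixed. Both $h_v$ and the sum-to-one constraint are assumptions here. Setting the derivative of the Lagrangian, $r\alpha_v^{r-1}h_v - \eta$, to zero gives $\alpha_v \propto h_v^{1/(1-r)}$. After normalization, this is:

```python
import numpy as np

def update_alpha(h, r=2):
    """alpha_v proportional to h_v^(1/(1-r)), normalized to sum to 1 (assumed problem form)."""
    h = np.asarray(h, dtype=float)
    w = h ** (1.0 / (1.0 - r))    # smaller view loss -> larger weight
    return w / w.sum()

weights = update_alpha([1.0, 2.0], r=2)   # approximately [0.667, 0.333]
```

Under this form, views with smaller loss receive larger weights, and r ≥ 2 keeps the weights from collapsing onto a single view.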
And 5: c and μ are updated. The updated equations for C and μ are as follows:
Figure BDA0003800049100000125
where ρ and $\mu_0$ are constants. $\mu_0$ is usually set to a large value such as $10^8$, and ρ to a value greater than 1.
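The update equations themselves are in an equation image. Formula (13) corresponds to a penalty of the form $\frac{\mu}{2}\|P - Q + C/\mu\|_F^2$. A common ADMM-style pair consistent with that penalty is C ← C + μ(P - Q) and μ ← min(ρμ, μ_0), which matches the description of μ_0 as a large cap and ρ > 1 as a growth factor. The sketch below assumes that form. Its defaults take ρ = 1.08 and μ_0 = 10^8 from the initialization values given below, and the function name is hypothetical.

```python
import numpy as np

def update_multiplier(C, P, Q, mu, rho=1.08, mu_max=1e8):
    """Assumed ADMM-style step: dual ascent on the constraint P = Q, then grow mu up to a cap."""
    C_new = C + mu * (P - Q)          # penalize the remaining gap between P and Q
    mu_new = min(rho * mu, mu_max)    # increase the penalty geometrically, capped at mu_0
    return C_new, mu_new
```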
Algorithm 1 below gives a specific example of the model optimization procedure. For LUBA_EIMVC, the label of the i-th sample is obtained directly as
Figure BDA0003800049100000126
that is, the row index of the largest element in the i-th column of P.
Figure BDA0003800049100000127
Figure BDA0003800049100000131
Fig. 2 shows the complete optimization flowchart for model (1). In Algorithm 1, initialization mainly consists of the following: initialize
Figure BDA0003800049100000132
as arbitrary random orthogonal matrices, and set α = 1, $\mu_0 = 10^8$, μ = 0.01 and ρ = 1.08. The convergence criterion in Fig. 2 is $|loss_t - loss_{t-1}| < 10^{-5}$, where $loss_t$ and $loss_{t-1}$ are the objective values at iterations t and t - 1, computed as:
Figure BDA0003800049100000133
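The initialization and stopping rule can be sketched as follows. Building a random orthogonal matrix with a reduced QR decomposition is one common choice and an assumption here, as are the helper names. The objective value is not computed because its formula is in the equation image above.

```python
import numpy as np

def random_orthogonal(m, d, rng):
    """Random m x d matrix with orthonormal columns (assumes m >= d), via reduced QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, d)))   # Q has shape (m, d)
    return Q

def has_converged(loss_t, loss_prev, tol=1e-5):
    """Stopping rule |loss_t - loss_{t-1}| < tol from the text."""
    return abs(loss_t - loss_prev) < tol

rng = np.random.default_rng(0)
U0 = random_orthogonal(6, 3, rng)   # e.g. m_v = 6 features, d = 3
```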
Clustering: using the optimized shared consistent representation matrix P, the clustering result is obtained according to
Figure BDA0003800049100000134
Figure BDA0003800049100000135
that is, if the j-th element of the i-th column $P_{:,i}$ is the largest, the i-th sample is assigned to the j-th class. The clustering result for all samples is thus obtained by locating the maximum element in each column of P. Here d denotes the number of rows of P and can generally be set to the number of clusters c.
Because each element of P represents the probability that a sample belongs to a given class, the clustering result of the data can be obtained directly from
Figure BDA0003800049100000136
that is, by taking the row index of the largest element in each column of P, without an additional clustering step such as k-means.
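The column-wise argmax rule is a one-liner in NumPy. The small matrix below is illustrative data only, not a result from the patent. Note that NumPy indexes classes from 0, whereas the text counts from 1.

```python
import numpy as np

P = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.5]])   # d = 3 classes, n = 3 samples; each column sums to 1
labels = P.argmax(axis=0)         # row index of the largest entry in each column
print(labels)                     # [0 1 2]
```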
Incomplete multi-view data with missing views arise in application scenarios across many industries. For clustering such data, the method provides a new fast, stable and interpretable model. Compared with previous models, it has the following distinctive characteristics. First, the model is simple and distinctive: it provides a concise "local-structure-embedded incomplete multi-view consistent representation learning term" that integrates local structure embedding and incomplete multi-view consistent representation learning into a single optimization term. Second, the model is interpretable: each constraint term has a clear meaning and role, and each element of the learned shared consistent representation directly expresses the clustering result, so both the model and its output are interpretable. Third, the model has a unique, stable solution: traditional methods need an additional k-means step that produces unstable clustering results, whereas this method obtains a unique clustering result directly from the model's single output, the consistent representation probability matrix P. In the experiments reported below, the method achieves higher clustering accuracy than the compared methods with the lowest running time.
Embodiment 2 of the invention
The incomplete multi-view clustering system 300 based on local structure and balance perception, provided in Embodiment 2 of the present invention, can execute the incomplete multi-view clustering method based on local structure and balance perception provided in any embodiment of the present invention. It has the functional modules and beneficial effects corresponding to that method.
Fig. 3 is a schematic structural diagram of the incomplete multi-view clustering system 300 based on local structure and balance perception in Embodiment 2 of the present invention. Referring to Fig. 3, the system 300 may specifically include:
A model establishing unit 310, configured to design, for the clustering task of incomplete multi-view data, an incomplete multi-view consistent clustering representation learning model with probabilistic properties based on local structure and balance perception. The model is:
Figure BDA0003800049100000141
Figure BDA0003800049100000142
where
Figure BDA0003800049100000143
denotes the basis matrix of the v-th view, $m_v$ denotes the feature dimension of the v-th view, d denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, n denotes the total number of samples of the incomplete multi-view data, $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the d-dimensional column vector whose elements are all 1, λ is a penalty parameter, l denotes the number of views, $n_v$ denotes the number of non-missing samples in the v-th view, I is the identity matrix, $I_{i,j}$ denotes the element in row i, column j of the identity matrix,
Figure BDA0003800049100000144
denotes the similarity between the i-th and j-th samples of the v-th view,
Figure BDA0003800049100000145
denotes the i-th column vector of $X^{(v)}$,
Figure BDA0003800049100000146
denotes the set of matrices formed by the non-missing samples of each view,
Figure BDA0003800049100000147
denotes the j-th column vector of $G^{(v)}$,
Figure BDA0003800049100000148
is a binary (0/1) matrix; and r is a positive integer not less than 2.
A data preprocessing unit 320, configured to preprocess the incomplete multi-view data
Figure BDA0003800049100000149
according to the given view-missing prior position index matrix Z;
An optimization model unit 330, configured to solve the model variables
Figure BDA0003800049100000151
P, α, the introduced auxiliary variable Q, the Lagrange multiplier C and the positive penalty parameter μ by an alternating iterative optimization method, based on the preprocessed data and the designed incomplete multi-view consistent clustering representation learning model, thereby optimizing the model, wherein:
solving the optimization problem for $U^{(v)}$:
Figure BDA0003800049100000152
gives the optimal solution $U^{(v)} = M^{(v)}N^{(v)T}$, where $M^{(v)}\Sigma^{(v)}N^{(v)T}$ is the singular value decomposition of $X^{(v)}S^{(v)T}G^{(v)T}P^T$, $S^{(v)} = W^{(v)} + I$, and
Figure BDA0003800049100000153
is the pre-constructed similarity graph matrix;
solving the optimization problem for P:
Figure BDA0003800049100000154
the optimal solution for the variable P is obtained as follows:
Figure BDA0003800049100000155
wherein
Figure BDA0003800049100000156
Figure BDA0003800049100000157
$\mu > 0$ is a positive penalty parameter, C is the Lagrange multiplier, and Q is an auxiliary variable with P = Q;
solving the optimization problem for Q:
Figure BDA0003800049100000158
gives the optimal solution $Q = (\mu P + C)(\mathbf{1}\mathbf{1}^T + \mu I)^{-1}$;
solving the optimization problem for α:
Figure BDA0003800049100000159
the optimal solution for the variable α is obtained as:
Figure BDA00038000491000001510
where
Figure BDA00038000491000001511
the update equations for C and μ are:
Figure BDA00038000491000001512
where ρ and $\mu_0$ are constants;
A clustering unit 340, configured to obtain the clustering result of the data from the optimized shared consistent representation matrix P according to
Figure BDA00038000491000001513
that is, if the j-th element of the i-th column $P_{:,i}$ is the largest, the i-th sample is assigned to the j-th class. The clustering results of all samples are obtained by locating the maximum element in each column of P.
Further,
Figure BDA00038000491000001514
is a binary (0/1) matrix used to retain the representations in P that correspond to the samples in $X^{(v)}$. The matrix $G^{(v)}$ is constructed from the view-missing prior position index matrix Z as follows:
Figure BDA0003800049100000161
Further, the data preprocessing unit 320 specifically performs:
Deletion of missing views: according to the view-missing prior position index matrix Z, delete the missing samples from each view to obtain the non-missing data set
Figure BDA0003800049100000162
Data normalization:
Figure BDA0003800049100000163
is normalized as
Figure BDA0003800049100000164
where
Figure BDA0003800049100000165
denotes the i-th column vector of $X^{(v)}$;
Local neighbor graph
Figure BDA0003800049100000166
Construction: non-missing data X for each view (v) Calculating the distance between each sample and k nearest neighbor samples by using Gaussian kernel in the following calculation mode
Figure BDA0003800049100000167
where
Figure BDA0003800049100000168
is, for the sample
Figure BDA0003800049100000169
one of its k nearest neighbors; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
Construction of the conversion matrix from the view-missing prior position index matrix Z:
Figure BDA00038000491000001610
In addition to the four units described above, the system 300 may include other components. These components are unrelated to the embodiments of the present disclosure, so they are not illustrated or described here.
For the specific working process of the incomplete multi-view clustering system 300 based on local structure and balance perception, refer to the description of the method in Embodiment 1 above; it is not repeated here.
Embodiment 3 of the invention
A system according to an embodiment of the present invention may also be implemented with the architecture of the computing device shown in Fig. 4. As shown in Fig. 4, the architecture includes a computer system 410, a system bus 430, one or more CPUs 440, input/output components 420, memory 450, and so on. The memory 450 may store various data or files used in computer processing and/or communication. It may also store the program instructions executed by the CPU, including instructions for the method of Embodiment 1. The architecture shown in Fig. 4 is merely exemplary; one or more of its components may be adjusted as needed to implement different devices.
Embodiment 4 of the invention
Embodiments of the present invention may also be implemented as a computer-readable storage medium. The computer-readable storage medium according to embodiment 4 has computer-readable instructions stored thereon. The computer readable instructions, when executed by a processor, may perform the incomplete multi-view clustering method based on local structure and balance perception according to embodiment 1 of the present invention described with reference to the above drawings.
The incomplete multi-view clustering method and system based on local structure and balance perception were trained and tested using Embodiments 1 to 4. Table 2 shows the average clustering accuracy obtained on the BBCSport, Caltech101 and 3Sources datasets with a view missing rate of 30%. MIC and DAIMC are existing incomplete multi-view clustering methods used for comparison. The results are shown in Table 2:
TABLE 2 Average clustering accuracy (%), view missing rate 30%
Data set      MIC            DAIMC          The invention
BBCSport      46.21 ± 4.71   63.45 ± 1.97   78.79 ± 3.02
Caltech101    20.12 ± 0.75   25.15 ± 0.31   27.63 ± 0.90
3Sources      47.69 ± 7.61   52.43 ± 6.63   71.83 ± 7.37
Table 3 gives the execution time (in seconds) of the same methods, MIC, DAIMC and the invention, on the BBCSport, Caltech101 and 3Sources datasets with a view missing rate of 30%:
TABLE 3 Execution time (s), view missing rate 30%
Data set      MIC           DAIMC         The invention
BBCSport      3.843         148.501       2.183
Caltech101    1.407×10^4    1.861×10^3    129.541
3Sources      5.912         563.780       4.967
Based on Embodiments 1 to 4 and the performance analysis above, the invention provides an Incomplete Multi-View Clustering method and system based on Local strUcture and Balance perception, addressing the Efficient learning problem for Incomplete Multi-View data. The invention designs an incomplete multi-view consistent clustering representation learning model with probabilistic properties. The model obtains a unique clustering result by learning a consistent representation with probabilistic properties across views, and each element of a consistent probability representation vector directly reflects the probability that the corresponding sample belongs to a given class. In addition, the model integrates geometric structure preservation and consistent representation learning into a very concise form. Because structure preservation is built into the representation learning term, no additional constraint term or penalty parameter is needed, which simplifies the model and reduces the burden of parameter tuning. Furthermore, a balance-perception learning technique is introduced to avoid over-partitioning samples into a few classes. In the experiments above (Tables 2 and 3), the method achieves the best clustering accuracy among the compared methods, with a unique and therefore stable clustering result. It is also more computationally efficient than current advanced incomplete multi-view clustering methods.
Specifically, the beneficial effects of the invention include the following. LUBA_EIMVC designs a novel balance-aware, graph-regularized incomplete multi-view orthogonal matrix factorization model. The model mines and exploits the local structure information of the views to guide its optimization. It also makes full use of the non-missing view information to learn a clustering-consistent representation with probabilistic properties. Existing methods obtain the clustering result with k-means. By contrast, LUBA_EIMVC directly obtains a unique non-negative probability matrix shared by all views, each element of which can be regarded as the probability that a sample belongs to a given class; this avoids the inaccuracy in clustering results introduced by k-means. During optimization, samples might become overly concentrated in a few classes or even a single class. To prevent this, a balance-perception constraint on the probability matrix is introduced, and a clustering-friendly consistent representation matrix with probabilistic properties is learned jointly. The clustering result of incomplete multi-view data is obtained directly from this matrix. Owing to the learned probabilistic consistent representation, LUBA_EIMVC is an interpretable and efficient incomplete multi-view clustering model with stable clustering results.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. An incomplete multi-view clustering method based on local structure and balance perception, characterized by comprising the following steps:
model establishment: for the clustering task of incomplete multi-view data, design an incomplete multi-view consistent clustering representation learning model with probabilistic properties based on local structure and balance perception, the model being:
Figure FDA0003800049090000011
s.t. $U^{(v)T}U^{(v)} = I$, $1 \ge \alpha_v \ge 0$,
Figure FDA0003800049090000012
$1 \ge P \ge 0$, $P^T\mathbf{1} = \mathbf{1}$
where
Figure FDA0003800049090000013
denotes the basis matrix of the v-th view, $m_v$ denotes the feature dimension of the v-th view, d denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, n denotes the total number of samples of the incomplete multi-view data, $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the d-dimensional column vector whose elements are all 1, $\alpha_v$ denotes the v-th element of α, r is a positive integer not less than 2,
Figure FDA0003800049090000014
denotes the penalty parameter associated with the element $\alpha_v$ of α, l denotes the number of views, $n_v$ denotes the number of non-missing samples in the v-th view, I is the identity matrix, $I_{i,j}$ denotes the element in row i, column j of the identity matrix,
Figure FDA0003800049090000015
denotes the similarity between the i-th and j-th samples of the v-th view,
Figure FDA0003800049090000016
denotes the i-th column vector of $X^{(v)}$,
Figure FDA0003800049090000017
denotes the set of matrices formed by the non-missing samples of each view,
Figure FDA0003800049090000018
denotes the j-th column vector of $G^{(v)}$,
Figure FDA0003800049090000019
is a binary (0/1) matrix;
data preprocessing: the incomplete multi-view data
Figure FDA00038000490900000110
is preprocessed according to the given view-missing prior position index matrix Z;
model optimization: based on the preprocessed data and the incomplete multi-view consistent clustering representation learning model, the model variables
Figure FDA00038000490900000111
P, α, the introduced auxiliary variable Q, the Lagrange multiplier C and the positive penalty parameter μ are solved by an alternating iterative optimization method, thereby optimizing the model, wherein:
solving the optimization problem for $U^{(v)}$:
Figure FDA00038000490900000112
gives the optimal solution $U^{(v)} = M^{(v)}N^{(v)T}$, where $M^{(v)}\Sigma^{(v)}N^{(v)T}$ is the singular value decomposition of $X^{(v)}S^{(v)T}G^{(v)T}P^T$, $S^{(v)} = W^{(v)} + I$, and
Figure FDA00038000490900000113
is the pre-constructed similarity graph matrix;
solving the optimization problem for P:
Figure FDA00038000490900000114
the optimal solution for the variable P is obtained as follows:
Figure FDA0003800049090000021
wherein
Figure FDA0003800049090000022
Figure FDA0003800049090000023
$\mu > 0$ is a positive penalty parameter, C is the Lagrange multiplier, Q is an auxiliary variable with P = Q, and c denotes the number of rows of P;
solving the optimization problem for Q:
Figure FDA0003800049090000024
gives the optimal solution $Q = (\mu P + C)(\mathbf{1}\mathbf{1}^T + \mu I)^{-1}$;
solving the optimization problem for α:
Figure FDA0003800049090000025
the optimal solution for the variable α is obtained as:
Figure FDA0003800049090000026
where
Figure FDA0003800049090000027
the update equations for C and μ are:
Figure FDA0003800049090000028
where ρ and $\mu_0$ are constants;
clustering: obtain the clustering result of the data from the optimized shared consistent representation matrix P, specifically according to
Figure FDA0003800049090000029
that is, if the j-th element of the i-th column $P_{:,i}$ is the largest, the i-th sample is assigned to the j-th class. The clustering results of all samples are obtained by locating the maximum element in each column of P.
2. The incomplete multi-view clustering method based on local structure and balance perception according to claim 1, characterized in that
Figure FDA00038000490900000210
is a binary (0/1) matrix used to retain the representations in P that correspond to the samples in $X^{(v)}$, and the matrix $G^{(v)}$ is constructed from the view-missing prior position index matrix Z as follows:
Figure FDA00038000490900000211
3. The incomplete multi-view clustering method based on local structure and balance perception according to claim 2, characterized in that preprocessing the incomplete multi-view data
Figure FDA00038000490900000212
according to the given view-missing prior position index matrix Z specifically comprises:
deletion of missing views: according to the view-missing prior position index matrix Z, delete the missing samples from each view to obtain the non-missing data set
Figure FDA0003800049090000031
data normalization:
Figure FDA0003800049090000032
is normalized as
Figure FDA0003800049090000033
where
Figure FDA0003800049090000034
denotes the i-th column vector of $X^{(v)}$;
local neighbor graph
Figure FDA0003800049090000035
Construction: non-missing data X for each view (v) Calculating the distance between each sample and k nearest neighbor samples by using Gaussian kernel in the following calculation mode
Figure FDA0003800049090000036
where
Figure FDA0003800049090000037
is, for the sample
Figure FDA0003800049090000038
one of its k nearest neighbors; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
construction of the conversion matrix from the view-missing prior position index matrix Z:
Figure FDA0003800049090000039
4. An incomplete multi-view clustering system based on local structure and balance perception, the system comprising:
a model establishing unit, configured to design, for the clustering task of incomplete multi-view data, an incomplete multi-view consistent clustering representation learning model with probabilistic properties based on local structure and balance perception, the model being:
Figure FDA00038000490900000310
s.t. $U^{(v)T}U^{(v)} = I$, $1 \ge \alpha_v \ge 0$,
Figure FDA00038000490900000311
$1 \ge P \ge 0$, $P^T\mathbf{1} = \mathbf{1}$
where
Figure FDA00038000490900000312
denotes the basis matrix of the v-th view, $m_v$ denotes the feature dimension of the v-th view, d denotes the dimension of the consistent representation space, $P \in \mathbb{R}^{d \times n}$ denotes the shared consistent representation matrix of the incomplete multi-view data, n denotes the total number of samples of the incomplete multi-view data, $\alpha = [\alpha_1, \ldots, \alpha_l]$ is a learnable weight vector, $\mathbf{1} \in \mathbb{R}^d$ denotes the d-dimensional column vector whose elements are all 1, $\alpha_v$ denotes the v-th element of α, r is a positive integer not less than 2,
Figure FDA00038000490900000313
denotes the penalty parameter associated with the element $\alpha_v$ of α, l denotes the number of views, $n_v$ denotes the number of non-missing samples in the v-th view, I is the identity matrix, $I_{i,j}$ denotes the element in row i, column j of the identity matrix,
Figure FDA00038000490900000314
denotes the similarity between the i-th and j-th samples of the v-th view,
Figure FDA00038000490900000315
denotes the i-th column vector of $X^{(v)}$,
Figure FDA00038000490900000316
denotes the set of matrices formed by the non-missing samples of each view,
Figure FDA00038000490900000317
denotes the j-th column vector of $G^{(v)}$,
Figure FDA00038000490900000318
is a binary (0/1) matrix;
a data preprocessing unit, configured to preprocess the incomplete multi-view data
Figure FDA00038000490900000319
according to the given view-missing prior position index matrix Z;
an optimization model unit, configured to solve the model variables
Figure FDA0003800049090000041
P, α, the introduced auxiliary variable Q, the Lagrange multiplier C and the positive penalty parameter μ by an alternating iterative optimization method, based on the preprocessed data and the incomplete multi-view consistent clustering representation learning model, thereby optimizing the model, wherein:
solving the optimization problem for $U^{(v)}$:
Figure FDA0003800049090000042
gives the optimal solution $U^{(v)} = M^{(v)}N^{(v)T}$, where $M^{(v)}\Sigma^{(v)}N^{(v)T}$ is the singular value decomposition of $X^{(v)}S^{(v)T}G^{(v)T}P^T$, $S^{(v)} = W^{(v)} + I$, and
Figure FDA0003800049090000043
is the pre-constructed similarity graph matrix;
solving the optimization problem for P:
Figure FDA0003800049090000044
the optimal solution for the variable P is obtained as follows:
Figure FDA0003800049090000045
wherein
Figure FDA0003800049090000046
Figure FDA0003800049090000047
$\mu > 0$ is a positive penalty parameter, C is the Lagrange multiplier, Q is an auxiliary variable with P = Q, and c denotes the number of rows of P;
solving the optimization problem for Q:
Figure FDA0003800049090000048
gives the optimal solution $Q = (\mu P + C)(\mathbf{1}\mathbf{1}^T + \mu I)^{-1}$;
solving the optimization problem for α:
Figure FDA0003800049090000049
the optimal solution for the variable α is obtained as:
Figure FDA00038000490900000410
where
Figure FDA00038000490900000411
the updated equations for C and μ are:
Figure FDA00038000490900000412
where ρ and $\mu_0$ are constants;
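The exact update formulas above are reproduced only as an image. A common ADMM convention uses two steps:

- Multiplier update: $C \leftarrow C + \mu(P - Q)$. Its sign is consistent with the Q update in this claim, where C enters as $\mu P + C$.
- Penalty update: $\mu \leftarrow \min(\rho\mu, \mu_{\max})$.

The sketch below assumes that form. Treating the constant $\mu_0$ as the upper bound `mu_max` is a guess. The default values of `rho` and `mu_max`, and the name `update_multiplier`, are hypothetical.

```python
import numpy as np

def update_multiplier(C, mu, P, Q, rho=1.1, mu_max=1e6):
    """Assumed ADMM step: dual ascent on the constraint P = Q, then grow the penalty."""
    C_new = C + mu * (P - Q)          # Lagrange multiplier update
    mu_new = min(rho * mu, mu_max)    # increase mu geometrically, capped at mu_max
    return C_new, mu_new
```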
the clustering unit is configured to obtain the clustering result of the data from the optimized optimal shared consistent representation matrix P, specifically: according to
Figure FDA00038000490900000413
if the j-th element of the i-th column $P_{:,i}$ is the largest, the i-th sample is assigned to the j-th category; locating the maximum element of each column of P thus yields the clustering result of all samples.
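The assignment rule is a column-wise argmax over P. The minimal sketch below assumes P is stored as a $c \times n$ NumPy array with one column per sample. The function name `assign_clusters` is hypothetical. Labels are 0-based, whereas the claim counts categories from 1.

```python
import numpy as np

def assign_clusters(P):
    """Sample i is assigned to the row index of the largest entry in column P[:, i]."""
    return np.argmax(P, axis=0)
```

For example, `assign_clusters(np.array([[0.9, 0.1, 0.3], [0.1, 0.8, 0.7]]))` assigns the first sample to cluster 0 and the other two to cluster 1.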
5. The incomplete multi-view clustering system based on local structure and balance perception according to claim 4, wherein
Figure FDA00038000490900000414
is a binary (0/1) matrix used to retain, from P, the representations of the samples contained in $X^{(v)}$; the matrix $G^{(v)}$ is constructed from the prior view-missing position index matrix Z as follows:
Figure FDA0003800049090000051
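The construction formula for $G^{(v)}$ appears only as an image. The sketch below assumes a common convention:

- Z is an $n \times V$ 0/1 matrix, with $Z_{iv} = 1$ when sample i is present in view v.
- $G^{(v)} \in \{0,1\}^{n \times n_v}$ has a single 1 in column j, at the row of the j-th observed sample.

With this convention, $P G^{(v)}$ keeps exactly the columns of P that correspond to the samples in $X^{(v)}$, which matches the stated purpose of $G^{(v)}$. The layout of Z and the name `build_G` are assumptions.

```python
import numpy as np

def build_G(Z, v):
    """Hypothetical G^(v) builder: column j is one-hot at the index of the j-th observed sample."""
    observed = np.flatnonzero(Z[:, v])        # indices of samples present in view v
    n, n_v = Z.shape[0], observed.size
    G = np.zeros((n, n_v))
    G[observed, np.arange(n_v)] = 1.0         # one 1 per column
    return G
```

Multiplying `P @ build_G(Z, v)` then selects the columns of P for the samples observed in view v, in their original order.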
6. The incomplete multi-view clustering system based on local structure and balance perception according to claim 5, wherein the data preprocessing unit performs:
Deletion of missing samples: according to the prior view-missing position index matrix Z, delete the missing samples from each view to obtain the non-missing data set
Figure FDA0003800049090000052
Data normalization: normalize
Figure FDA0003800049090000053
as
Figure FDA0003800049090000054
where
Figure FDA0003800049090000055
denotes the i-th column vector of matrix $X^{(v)}$;
Local neighbor graph
Figure FDA0003800049090000056
construction: for the non-missing data $X^{(v)}$ of each view, compute the Gaussian-kernel weight between each sample and each of its k nearest neighbor samples as
Figure FDA0003800049090000057
where
Figure FDA0003800049090000058
is one of the k nearest neighbors of sample
Figure FDA0003800049090000059
; all other (non-neighbor) elements of $W^{(v)}$ are set to 0;
Transformation matrix construction: construct the transformation matrix
Figure FDA00038000490900000510
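The preprocessing steps above can be sketched for a single view. The normalization and kernel formulas are images in the original, so this sketch assumes:

- unit $\ell_2$ normalization of each sample (column);
- the kernel $\exp(-\lVert x_i - x_j\rVert_2^2 / (2\sigma^2))$, applied only to the k nearest neighbors.

The bandwidth `sigma`, the 0/1 mask layout `z_v` and the function name `preprocess_view` are hypothetical. W is left unsymmetrized because the claim does not say otherwise, and `k` must be smaller than the number of observed samples.

```python
import numpy as np

def preprocess_view(X_full, z_v, k=3, sigma=1.0):
    """X_full: (d_v, n) view data; z_v: (n,) 0/1 mask, 1 = sample observed (assumed)."""
    X = X_full[:, z_v.astype(bool)]                       # drop missing samples
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    X = X / np.maximum(norms, 1e-12)                      # unit-norm columns
    n_v = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D = sq[:, None] + sq[None, :] - 2.0 * X.T @ X         # pairwise squared distances
    D = np.maximum(D, 0.0)                                # clip round-off negatives
    np.fill_diagonal(D, np.inf)                           # a sample is not its own neighbor
    W = np.zeros((n_v, n_v))
    for i in range(n_v):
        nbrs = np.argsort(D[i])[:k]                       # k nearest neighbors of sample i
        W[i, nbrs] = np.exp(-D[i, nbrs] / (2.0 * sigma ** 2))  # assumed Gaussian kernel
    S = W + np.eye(n_v)                                   # S^(v) = W^(v) + I, as in claim 4
    return X, W, S
```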
7. An incomplete multi-view clustering system based on local structure and balance perception, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the incomplete multi-view clustering method based on local structure and balance perception according to any one of claims 1 to 3.
8. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the incomplete multi-view clustering method based on local structure and balance perception according to any one of claims 1 to 3.
CN202210979979.6A 2022-08-16 2022-08-16 Incomplete multi-view clustering method and system based on local structure and balance perception Pending CN115311483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210979979.6A CN115311483A (en) 2022-08-16 2022-08-16 Incomplete multi-view clustering method and system based on local structure and balance perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210979979.6A CN115311483A (en) 2022-08-16 2022-08-16 Incomplete multi-view clustering method and system based on local structure and balance perception

Publications (1)

Publication Number Publication Date
CN115311483A 2022-11-08

Family

ID=83861978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210979979.6A Pending CN115311483A (en) 2022-08-16 2022-08-16 Incomplete multi-view clustering method and system based on local structure and balance perception

Country Status (1)

Country Link
CN (1) CN115311483A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253065A (en) * 2023-09-29 2023-12-19 哈尔滨理工大学 (Harbin University of Science and Technology) Incomplete multi-view scene image data clustering method based on local and global anchor graph integration
CN117523244A (en) * 2023-10-31 2024-02-06 哈尔滨工业大学(威海) (Harbin Institute of Technology, Weihai) Multi-view clustering method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20230023101A1 (en) Data processing method and device
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN115311483A (en) Incomplete multi-view clustering method and system based on local structure and balance perception
GarcíA-Pedrajas et al. A scalable approach to simultaneous evolutionary instance and feature selection
Yu et al. Fine-grained similarity fusion for multi-view spectral clustering
CN109670418B (en) Unsupervised object identification method combining multi-source feature learning and group sparsity constraint
CN111914728A (en) Hyperspectral remote sensing image semi-supervised classification method and device and storage medium
Chen et al. LABIN: Balanced min cut for large-scale data
Jin et al. A high performance implementation of spectral clustering on cpu-gpu platforms
Buvana et al. Content-based image retrieval based on hybrid feature extraction and feature selection technique pigeon inspired based optimization
CN111027636B (en) Unsupervised feature selection method and system based on multi-label learning
Xia et al. Incomplete multi-view clustering via kernelized graph learning
Liu et al. Decentralized robust subspace clustering
CN113592030B (en) Image retrieval method and system based on complex value singular spectrum analysis
Han et al. Incomplete multi-view subspace clustering based on missing-sample recovering and structural information learning
CN113516019B (en) Hyperspectral image unmixing method and device and electronic equipment
US11080551B2 (en) Proposal region filter for digital image processing
Ourabah Large scale data using K-means
Bi et al. Critical direction projection networks for few-shot learning
Salman et al. Gene expression analysis via spatial clustering and evaluation indexing
Hu et al. Multi-geometric sparse subspace clustering
EP3985529A1 (en) Labeling and data augmentation for graph data
CN115795333A (en) Incomplete multi-view clustering method based on low-rank constraint adaptive graph learning
Huang et al. Decorrelated spectral regression: An unsupervised dimension reduction method under data selection bias
CN113139556B (en) Manifold multi-view image clustering method and system based on self-adaptive composition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination