WO2022170840A1

WO2022170840A1 - Late fusion multi-view clustering machine learning method and system based on bipartite graph

Info

Publication number: WO2022170840A1
Application number: PCT/CN2021/136557
Authority: WO
Inventors: 朱信忠; 徐慧英; 梁伟轩; 赵建民
Original assignee: 浙江师范大学
Priority date: 2021-02-09
Filing date: 2021-12-08
Publication date: 2022-08-18
Also published as: ZA202207736B; CN112990265A; LU502853B1

Abstract

Disclosed is a late fusion multi-view clustering machine learning method based on a bipartite graph. The method comprises: S11, acquiring a clustering task and target data samples; S12, performing kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain a basic division, and calculating the diversification regularization term of each view; S13, selecting representative points of each view by random initialization, and establishing a bipartite graph-based late fusion multi-view clustering objective function; S14, iteratively solving the established objective function to obtain a bipartite graph after view fusion; and S15, performing spectral clustering on the obtained bipartite graph to obtain a clustering result. In the present application, the optimized representative points both represent the information of a single view and better serve view fusion, so that the learned bipartite graph better fuses the information of all views, thereby improving the clustering effect.

Description

Machine learning method and system for late fusion multi-view clustering based on bipartite graph

Technical field

The present application relates to the technical fields of computer vision and pattern recognition, and in particular, to a method and system for late fusion multi-view clustering machine learning based on bipartite graphs.

Background

With the development of information collection technology, for the same data sample, we can easily obtain information from different views of it. We call data with multiple views of information multi-view data. In order to cluster multi-view data, academia has derived multi-view clustering algorithms.

According to when view fusion takes place, existing multi-view clustering algorithms can be roughly divided into two categories. (1) Multi-view clustering algorithms based on early fusion. Early fusion fuses the representations of multiple views into a unified representation before clustering, and then runs a clustering algorithm on that representation to obtain the final clustering result. Classic examples are multiple kernel clustering, multi-view spectral clustering and multi-view subspace clustering algorithms. (2) Multi-view clustering algorithms based on late fusion. Unlike early fusion, late fusion multi-view clustering first obtains a basic division from each single view, and then uses these basic divisions to obtain an optimal clustering result. All ensemble clustering algorithms can be regarded as late fusion methods. For example, some methods use the basic divisions to construct a correlation matrix for each view, that is, an n×n 0-1 matrix indicating whether two samples are assigned to the same class, and learn a unified matrix through low-rank and sparse matrix decomposition; some, after constructing the correlation matrix of each view and given a measure of how difficult each sample is to learn, use self-paced learning to cluster the samples in order from easy to difficult; some maximize the inner product between the consistent partition and a linear combination of the basic partitions; and others use late fusion to handle multi-view clustering with missing views.

Although the above algorithms have achieved good results, they have the following defects: (1) most early fusion multi-view clustering algorithms consume a great deal of space and time, so they cannot be applied to large-scale datasets; (2) existing late fusion multi-view clustering obtains the optimal cluster indicator matrix by maximizing its inner product with a linear combination of the basic cluster indicator matrices, an assumption that oversimplifies the search space of the optimal cluster indicator matrix.

SUMMARY OF THE INVENTION

The purpose of this application is to provide a bipartite graph-based late fusion multi-view clustering machine learning method and system for the defects of the prior art.

In order to achieve the above purpose, the application adopts the following technical solutions:

A late-fusion multi-view clustering machine learning method based on bipartite graphs, including:

S1. Obtain clustering tasks and target data samples;

S2. The basic division is obtained by running kernel k-means clustering on each view corresponding to the obtained clustering task and the target data sample, and the diversification regular term of each view is calculated;

S3. Use random initialization to select representative points of each view, and establish a bipartite graph-based late fusion multi-view clustering objective function;

S4. Solve the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, and obtain a bipartite graph after view fusion;

S5. Perform spectral clustering on the obtained bipartite graph to obtain a clustering result.

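Further, in the step S2, the kernel k-means clustering is performed, specifically:

The goal of kernel k-means clustering is to minimize the sum of squared errors over the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:

$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$

where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; $\mu_c = \frac{1}{n_c}\sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ is the centroid of the c-th cluster; $n_c = \sum_{i=1}^{n} B_{ic}$ represents the number of samples belonging to the c-th cluster, 1≤c≤k; i represents the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic}=1$, otherwise $B_{ic}=0$.

Formula (1) is transformed into:

$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{T} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$

where K represents the kernel matrix with elements $K_{ij} = \phi(x_i)^{T}\phi(x_j)$, $L = \mathrm{diag}(n_1^{-1}, \ldots, n_k^{-1})$, and $\mathbf{1}_\ell \in \mathbb{R}^{\ell}$ represents a vector with all elements equal to 1.

Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint into a real-valued orthogonality constraint, that is, $H^{T}H = I_k$; formula (2) is then converted into:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K\left(I_n - HH^{T}\right)\right) \quad \text{s.t.} \quad H^{T}H = I_k$$

where $I_k$ represents the k-dimensional identity matrix.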

Further, the bipartite graph-based late fusion multi-view clustering objective function in the step S3, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{H_p\}_{p=1}^{m}$ represents the basic division of each view obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ represents the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s represent the number of samples, clusters and representative points, respectively; λ represents the regularization parameter; γ represents the combination coefficients of the views; M represents the view diversification regularization term; and m represents the number of views.

Further, in the step S4, solving the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner is specifically:

Formula (3) is solved by a three-step alternating method:

A1. Fix γ and $\{A_p\}_{p=1}^{m}$, and optimize Z. Z is optimized row by row: for the i-th row $z_i$ of Z, the subproblem is an optimization problem over the simplex $z_i\mathbf{1}_s = 1$, $z_i \ge 0$.

A2. Fix γ and Z, and optimize $\{A_p\}_{p=1}^{m}$. The closed-form solution is obtained by setting the partial derivative of the objective function with respect to $A_p$ equal to 0.

A3. Fix $\{A_p\}_{p=1}^{m}$ and Z, and optimize γ. The objective function becomes a quadratic programming problem with linear constraints.

Further, in the step S4, the termination condition of the three-step alternating method is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$

where $obj^{(t-1)}$ and $obj^{(t)}$ represent the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε represents the set precision.
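As a hedged illustration of the constrained subproblems in the alternating steps, the sketch below shows two generic building blocks: the Euclidean projection of a vector onto the simplex (the feasible set of each row of Z and of γ), and a projected-gradient solver for a quadratic program over that simplex. The function names, the matrix `Q` and the choice of projected gradient are assumptions made for illustration; the text does not state which solver is used, and `Q` merely stands in for whatever quadratic form the γ-step produces.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a 1-D array v onto {x : x >= 0, sum(x) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                        # entries in descending order
    css = np.cumsum(u) - 1.0                    # cumulative sums minus the target total
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]   # last index whose entry stays positive
    theta = css[rho] / (rho + 1.0)              # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)

def solve_simplex_qp(Q, iters=500):
    """Minimize g^T Q g over the simplex by projected gradient.

    Q is a hypothetical symmetric positive semidefinite matrix (assumption).
    """
    m = Q.shape[0]
    g = np.full(m, 1.0 / m)                            # start from uniform weights
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1])     # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        g = project_simplex(g - step * 2.0 * (Q @ g))  # gradient step, then projection
    return g

row = project_simplex(np.array([2.0, 0.0]))      # a row pulled back onto the simplex
gamma = solve_simplex_qp(np.diag([1.0, 3.0]))    # toy two-view weighting problem
```

In the toy problem the view with the larger quadratic cost receives the smaller weight, which is the qualitative behaviour expected of a combination coefficient.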

Further, a bipartite graph-based late fusion multi-view clustering machine learning system is also provided, including:

The acquisition module is used to acquire clustering tasks and target data samples;

The operation module is used to obtain the basic division by running the kernel k-means clustering on each view corresponding to the obtained clustering task and the target data sample, and calculate the diversification regular term of each view;

The establishment module is used to select representative points of each view by random initialization, and to establish a bipartite graph-based late fusion multi-view clustering objective function;

The solving module is used to solve the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, and obtain the bipartite graph after view fusion;

The clustering module is used to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.

Further, the kernel k-means clustering run in the operation module is specifically that of formulas (1) and (2), where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; $n_c$ represents the number of samples belonging to the c-th cluster, 1≤c≤k; i represents the sample index; $B_{ic}=1$ when the i-th sample belongs to the c-th cluster and $B_{ic}=0$ otherwise; and $\mathbf{1}_\ell$ represents a vector with all elements equal to 1.

Letting $H = BL^{\frac{1}{2}}$ with $L = \mathrm{diag}(n_1^{-1}, \ldots, n_k^{-1})$, and relaxing the discrete constraint to $H^{T}H = I_k$, where $I_k$ represents the k-dimensional identity matrix, gives the relaxed form of formula (2).

Further, the bipartite graph-based late fusion multi-view clustering objective function in the establishment module, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{A_p\}_{p=1}^{m}$ represents the representative points of each view, and m represents the number of views.

Further, the solving module solves the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, using the three-step alternating method to solve formula (3), and includes:

A first fixing module, for fixing γ and $\{A_p\}_{p=1}^{m}$ and optimizing Z, where Z is optimized row by row over its i-th row $z_i$;

A second fixing module, for fixing γ and Z and optimizing $\{A_p\}_{p=1}^{m}$;

A third fixing module, for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing γ.

Further, the termination condition of the three-step alternating method used in the solving module is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$

Compared with the prior art, the present application proposes a novel bipartite graph-based late fusion multi-view clustering machine learning method. The method includes modules for acquiring the basic clustering divisions and computing the diversification regularization term, optimizing the objective function to obtain the bipartite graph, and clustering with the bipartite graph. By optimizing the representative points, the present application enables them not only to represent the information of a single view but also to better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering effect is improved. Experimental results on six public datasets show that the present application outperforms the existing methods it was compared with.

Description of drawings

FIG. 1 is a flowchart of the bipartite graph-based late fusion multi-view clustering machine learning method provided in Embodiment 1;

FIG. 2 is a schematic diagram of the sensitivity to the parameter λ provided in Embodiment 2;

FIG. 3 is a schematic diagram of the influence of the number of representative points s on the clustering effect provided in Embodiment 2;

FIG. 4 is a schematic diagram of the changes in clustering performance and objective function value as the number of iterations increases provided in Embodiment 2;

FIG. 5 is a structural diagram of the bipartite graph-based late fusion multi-view clustering machine learning system provided in Embodiment 3.

Detailed description

The embodiments of the present application are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present application from the contents disclosed in this specification. The application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed based on different viewpoints and applications without departing from the spirit of the application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.

Aiming at the existing defects, the present application provides a bipartite graph-based late fusion multi-view clustering machine learning method and system.

Embodiment 1

The bipartite graph-based late fusion multi-view clustering machine learning method provided in this embodiment, as shown in Figure 1, includes:

S11. Obtain clustering tasks and target data samples;

S12. By running kernel k-means clustering on each view corresponding to the obtained clustering task and the target data sample, the basic division is obtained, and the diversification regular term of each view is calculated;

S13. Use random initialization to select representative points of each view, and establish a bipartite graph-based late fusion multi-view clustering objective function;

S14. Solve the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner to obtain a bipartite graph after view fusion;

S15. Perform spectral clustering on the obtained bipartite graph to obtain a clustering result.

This embodiment proposes a new method that clusters by learning multi-view information through late fusion and represents each view by representative points. Compared with anchor points that are not updated during optimization, the representative points better serve multi-view clustering; and using a bipartite graph for graph learning in the late fusion algorithm reduces the computational and storage complexity.

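In step S12, the basic division is obtained by running kernel k-means clustering on each view corresponding to the acquired clustering task and the target data samples, and the diversification regularization term of each view is calculated. Specifically, kernel k-means clustering minimizes the sum of squared errors over the partition matrix $B \in \{0,1\}^{n \times k}$:

$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$

where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; $\mu_c = \frac{1}{n_c}\sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ is the centroid of the c-th cluster; and $n_c = \sum_{i=1}^{n} B_{ic}$ is the number of samples belonging to the c-th cluster.

Formula (1) can be transformed into:

$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{T} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$

where K is the kernel matrix with elements $K_{ij} = \phi(x_i)^{T}\phi(x_j)$; $L = \mathrm{diag}(n_1^{-1}, \ldots, n_k^{-1})$; $\mathbf{1}_\ell \in \mathbb{R}^{\ell}$ represents a vector whose elements are all 1; the superscript T denotes, as usual, the matrix transpose; and adjacent matrices such as K, B and L are combined by matrix multiplication.

Since the variable B in the above formula is discrete, optimization is difficult. Let $H = BL^{\frac{1}{2}}$ and relax the discrete constraint to the real-valued orthogonality constraint $H^{T}H = I_k$; formula (2) then becomes:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K\left(I_n - HH^{T}\right)\right) \quad \text{s.t.} \quad H^{T}H = I_k$$

where $I_k$ represents the k-dimensional identity matrix.

Its closed-form solution is given by the eigenvectors corresponding to the k largest eigenvalues of the kernel matrix K, which can be obtained by eigendecomposition of K.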
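The closed-form solution described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `relaxed_kernel_kmeans` and the toy linear kernel are hypothetical, and the kernel is assumed symmetric.

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Return H (n x k) with H^T H = I_k that maximizes Tr(H^T K H).

    Maximizing Tr(H^T K H) is equivalent to minimizing Tr(K (I_n - H H^T)).
    """
    K = (K + K.T) / 2.0                     # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)    # eigenvalues in ascending order
    return eigvecs[:, -k:][:, ::-1]         # eigenvectors of the k largest eigenvalues

# Toy data: six 2-D points and a linear kernel (illustrative only)
X = np.array([[0.0, 1.0], [0.1, 0.9], [0.2, 1.1],
              [1.0, 0.0], [0.9, 0.1], [1.1, 0.2]])
K = X @ X.T
H = relaxed_kernel_kmeans(K, 2)
```

Running this once per view yields one relaxed partition matrix per kernel, which is the form of basic division that the later steps fuse.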

In step S13, the representative points of each view are selected by random initialization, and a bipartite graph-based late fusion multi-view clustering objective function is established.

The bipartite graph-based late fusion multi-view clustering objective function, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{A_p\}_{p=1}^{m}$ represents the representative points of each view, and m represents the number of views.

In step S14, the established bipartite graph-based late fusion multi-view clustering objective function is solved in a cyclic manner, and a bipartite graph after view fusion is obtained, specifically:

Formula (3) is solved by a three-step alternating method:

A1. Fix γ and $\{A_p\}_{p=1}^{m}$, and optimize Z. Z can be optimized row by row: for the i-th row $z_i$ of Z, the subproblem is an optimization problem on the simplex $z_i\mathbf{1}_s = 1$, $z_i \ge 0$.

A2. Fix γ and Z, and optimize $\{A_p\}_{p=1}^{m}$. The closed-form solution can be obtained by setting the partial derivative of the objective function with respect to $A_p$ equal to 0.

A3. Fix $\{A_p\}_{p=1}^{m}$ and Z, and optimize γ.

The termination condition of the above three-step alternating method is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
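The control flow of the three-step alternation and its stopping rule can be sketched as follows. This is only a skeleton under stated assumptions: `alternate`, `converged`, the `state` tuple and the toy objective are hypothetical names chosen for illustration, the real Z-, A_p- and γ-updates are replaced by stand-in coordinate updates, and the objective is assumed to stay positive so that the relative decrease is well defined.

```python
def converged(obj_prev, obj_curr, eps=1e-4):
    """Relative-decrease test (obj^(t-1) - obj^(t)) / obj^(t) <= eps."""
    return (obj_prev - obj_curr) / obj_curr <= eps   # assumes obj_curr > 0

def alternate(update_steps, objective, state, eps=1e-4, max_iter=100):
    """Apply each update step in turn until the relative decrease is at most eps."""
    history = [objective(state)]
    for _ in range(max_iter):
        for step in update_steps:        # e.g. the Z-step, the A_p-step, the gamma-step
            state = step(state)
        history.append(objective(state))
        if converged(history[-2], history[-1], eps):
            break
    return state, history

# Toy stand-in: exact coordinate-wise minimization of (x - 1)^2 + (y + 2)^2 + 1
objective = lambda s: (s[0] - 1.0) ** 2 + (s[1] + 2.0) ** 2 + 1.0
fix_y_update_x = lambda s: (1.0, s[1])
fix_x_update_y = lambda s: (s[0], -2.0)

state, history = alternate([fix_y_update_x, fix_x_update_y], objective, (0.0, 0.0))
```

Because each step minimizes the objective exactly over its own block, the recorded objective values never increase, which is the property the stopping rule relies on.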

In step S15, spectral clustering is performed on the obtained bipartite graph to obtain a clustering result.

The process of spectral clustering on the bipartite graph Z is as follows. A matrix is first constructed from Z and $\Lambda = \mathrm{diag}(Z^{T}\mathbf{1}_n)$. Eigenvalue decomposition is then performed, and the diagonal matrix composed of the k largest eigenvalues and the matrix of the corresponding eigenvectors are denoted $\Sigma_k$ and $V_k$, respectively. A matrix F is formed from these quantities.

The final clustering result can be obtained by performing standard k-means clustering on the rows of F.
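One common way to carry out spectral clustering on a bipartite graph is sketched below. Three choices here are assumptions rather than statements of the application's formulas: the intermediate matrix is taken as $\hat{Z} = Z\Lambda^{-1/2}$, the eigendecomposition is applied to the s×s matrix $\hat{Z}^{T}\hat{Z}$, and the embedding is $F = \hat{Z}V_k\Sigma_k^{-1/2}$. The helper names and the farthest-point initialization of k-means are also illustrative.

```python
import numpy as np

def bipartite_embedding(Z, k):
    """Spectral embedding F (n x k) of the samples from a bipartite graph Z (n x s)."""
    lam = Z.sum(axis=0)                              # diagonal of Lambda = diag(Z^T 1_n)
    Z_hat = Z / np.sqrt(lam)                         # Z Lambda^{-1/2}; assumes column sums > 0
    evals, evecs = np.linalg.eigh(Z_hat.T @ Z_hat)   # s x s problem, ascending eigenvalues
    sigma_k = evals[-k:][::-1]                       # k largest eigenvalues
    V_k = evecs[:, -k:][:, ::-1]                     # corresponding eigenvectors
    return Z_hat @ V_k / np.sqrt(sigma_k)            # Z_hat V_k Sigma_k^{-1/2}

def kmeans_rows(F, k, iters=50):
    """Plain Lloyd k-means on the rows of F with farthest-point initialization."""
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[d.argmax()])                # row farthest from the chosen centers
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    return labels

# Toy bipartite graph: 6 samples, 4 representative points, rows sum to 1
Z = np.array([[0.5, 0.5, 0.0, 0.0]] * 3 + [[0.0, 0.0, 0.5, 0.5]] * 3)
labels = kmeans_rows(bipartite_embedding(Z, 2), 2)
```

The eigendecomposition involves only an s×s matrix, which is where a bipartite graph saves computation and storage compared with a full n×n similarity graph.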

Compared with the prior art, this embodiment proposes a novel bipartite graph-based late fusion multi-view clustering machine learning method. The method includes modules for acquiring the basic clustering divisions and computing the diversification regularization term, optimizing the objective function to obtain the bipartite graph, and clustering with the bipartite graph. By optimizing the representative points, this embodiment enables them not only to represent the information of a single view but also to better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering effect is improved.

Embodiment 2

The bipartite graph-based late fusion multi-view clustering machine learning method provided in this embodiment differs from Embodiment 1 as follows:

In this example, the clustering performance of the proposed method was tested on 6 MKL standard datasets, including Oxford Flower17, Oxford Flower102, Protein fold prediction, UCI-Digital, Columbia Consumer Video (CCV) and Caltech102. See Table 1 for information about the dataset.

Table 1

Dataset	Samples	Kernels	Clusters
Flower17	1360	7	17
Flower102	8189	4	102
ProteinFold	694	12	27
Digit	2000	3	10
CCV	6773	3	20
Caltech102	1530	25	102

For ProteinFold, this example generates 12 benchmark kernel matrices, of which the first 10 feature sets use second-order polynomial kernels, and the last two use cosine inner product kernels. For CCV, three base kernels are generated by applying a Gaussian kernel on the SIFT, STIP and MFCC features, and the width of the three Gaussian kernels is set as the mean of the distances of each pair of samples. Kernel matrices for other datasets can be downloaded from the Internet.
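A minimal sketch of a Gaussian base kernel whose width is the mean pairwise distance is given below. The exact kernel form is an assumption: the sketch uses $K_{ij} = \exp(-\lVert x_i - x_j\rVert^2 / (2\sigma^2))$ with σ equal to the mean Euclidean distance over distinct sample pairs, and the function name and toy features are hypothetical.

```python
import numpy as np

def gaussian_kernel_mean_width(X):
    """Gaussian kernel whose width sigma is the mean pairwise Euclidean distance."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)  # squared distances
    upper = np.triu_indices(len(X), k=1)          # each distinct pair counted once
    sigma = np.sqrt(d2[upper]).mean()             # mean distance over sample pairs
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy feature matrix standing in for one feature type (illustrative only)
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
K = gaussian_kernel_mean_width(X)
```

Applying the same function to each feature type separately would give one base kernel per view.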

In this experiment, the following comparison algorithms are used: the average multi-kernel k-means clustering algorithm (A-MKKM), the optimal single-view kernel k-means clustering algorithm (SB-MKKM), multi-kernel k-means clustering (MKKM), robust multi-kernel k-means clustering (RMKKM), multi-kernel k-means clustering with a matrix-induced regularization term (MKKM-MR), optimal neighbor multi-kernel clustering (ONKC), and late fusion-based maximally aligned multi-view clustering (MVC-LFA). In all experiments, all base kernels are first centered and normalized. For all datasets, the number of clusters is assumed to be known and is set to the true number of classes. The parameters of RMKKM, MKKM-MR, ONKC and MVC-LFA are determined by grid search. The regularization parameter of the method in this embodiment is also determined by grid search over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$; the number of representative points is s=8k, where k is the number of clusters.
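The preprocessing and the parameter grid can be sketched as follows, assuming the preprocessing is the usual feature-space centering followed by scaling each kernel to a unit diagonal. The function names are hypothetical, and the scaling step requires a strictly positive diagonal after centering.

```python
import numpy as np

def center_kernel(K):
    """Center K in feature space: K_c = C K C with C = I - (1/n) 1 1^T."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

def normalize_kernel(K):
    """Scale K so that every diagonal entry equals 1 (needs a positive diagonal)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Candidate values for the regularization parameter: 2^-15, 2^-12, ..., 2^15
lambda_grid = 2.0 ** np.arange(-15, 16, 3)

# Toy linear kernel (illustrative only)
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
K_centered = center_kernel(X @ X.T)
K_ready = normalize_kernel(K_centered)
```

The grid contains eleven values, so a full search runs the method eleven times per dataset.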

This experiment uses the common measures clustering accuracy (ACC), normalized mutual information (NMI) and purity to show the clustering performance of each method. All methods are randomly initialized and repeated 50 times, and the best result is reported to reduce the randomness caused by k-means.
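Two of the three measures can be sketched with the standard library alone. The definitions below are the usual ones and are assumed to match the experiment: purity counts the majority class of each cluster, and ACC takes the best one-to-one mapping from clusters to classes. The brute-force search over mappings is an illustrative simplification that is practical only for a small number of clusters and assumes there are no more clusters than classes; NMI is omitted here and is available in common libraries, for example `normalized_mutual_info_score` in scikit-learn.

```python
from collections import Counter
from itertools import permutations

def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    total = 0
    for c in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force)."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))          # candidate relabeling
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]     # cluster ids are arbitrary, one sample is misplaced
acc = clustering_accuracy(y_true, y_pred)
pur = purity(y_true, y_pred)
```

Both measures ignore the arbitrary numbering of clusters, which is why the swapped labels in the example still score five correct samples out of six.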

Table 2

Table 2 shows the clustering results of the proposed method and the comparison algorithms on all datasets. The following can be observed from the table: 1. The proposed algorithm outperforms all comparison algorithms under the three evaluation criteria. 2. ONKC is an important benchmark among multi-kernel algorithms, and the ACC of the proposed algorithm on the six datasets is better than that of ONKC by 7.14%, 10.22%, 3.17%, 3.45%, 6.07% and 10.2%, respectively. 3. MVC-LFA is a late fusion algorithm that usually performs better than most other multi-view algorithms, and the proposed algorithm exceeds it on average by 7.58%, 7.07% and 7.34% under the three clustering measures, respectively.

In addition, we also compared the performance of anchor points that were not updated during the optimization process, that is, using k-means clustering and random sampling to select anchor points, substitute them into the target formula, and not update them during the running of the algorithm. To avoid the influence of randomness of the algorithm, we repeated the experiment 50 times and averaged all the results. The results are shown in Table 3.

Table 3

It can be seen from Table 3 that the effect of representative points selected by k-means or randomly selected is much worse than that of the representative point method proposed by us. Therefore, the update of our representative points during the algorithm optimization process is effective.

This embodiment introduces a regularization parameter λ to balance bipartite graph learning against the diversification regularization term. Fig. 2 plots the variation of NMI as λ varies over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, taking the best comparison algorithm on the same dataset as the reference. From this figure, it can be seen that: 1) the best NMI is always obtained when the two terms are properly balanced; 2) the proposed algorithm outperforms the best comparison algorithm on most datasets regardless of the variation of λ.

There is another important parameter in this embodiment, that is, the number s of representative points. We select the number of representative points in the range of [2k,4k,...,14k], where k is the number of clusters, and conduct experiments. The results are shown in Figure 3. It can be seen that with the increase of s, the clustering effect shows an overall upward trend. However, a larger s will inevitably bring higher computational overhead. In order to take into account the clustering effect and complexity, the number of representative points s=8k can be selected empirically.

This embodiment also gives the objective function value and changes in clustering performance at each iteration, as shown in FIG. 4 . It can be seen that the objective function value decreases monotonically and usually converges within 25 iterations. It can be seen that with the decrease of the objective function, the clustering effect will fluctuate, but the overall trend is upward. This example shows that the algorithm can continuously improve the clustering performance during the training process.

Embodiment 3

This embodiment provides a bipartite graph-based late fusion multi-view clustering machine learning system, as shown in Figure 5, including:

an acquisition module 11, for acquiring clustering tasks and target data samples;

an operation module 12, for running kernel k-means clustering on each view corresponding to the obtained clustering task and the target data samples to obtain the basic division, and for calculating the diversification regularization term of each view;

an establishment module 13, for selecting the representative points of each view by random initialization, and establishing the bipartite graph-based late fusion multi-view clustering objective function;

a solving module 14, for solving the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, and obtaining a bipartite graph after view fusion;

a clustering module 15, for performing spectral clustering on the obtained bipartite graph to obtain a clustering result.

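In the operation module 12, kernel k-means clustering follows formulas (1) and (2), where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; and $\mathbf{1}_\ell$ represents a vector with all elements equal to 1. Letting $H = BL^{\frac{1}{2}}$ and relaxing the discrete constraint to $H^{T}H = I_k$, where $I_k$ represents the k-dimensional identity matrix, gives the relaxed form of formula (2).

In the establishment module 13, the objective function, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{A_p\}_{p=1}^{m}$ represents the representative points of each view, and m represents the number of views.

The solving module 14 solves formula (3) using the three-step alternating method, and includes:

a first fixing module, for fixing γ and $\{A_p\}_{p=1}^{m}$ and optimizing Z, where Z is optimized row by row over its i-th row $z_i$;

a second fixing module, for fixing γ and Z and optimizing $\{A_p\}_{p=1}^{m}$;

a third fixing module, for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing γ.

The termination condition of the three-step alternating method is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$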

It should be noted that the bipartite graph-based late fusion multi-view clustering machine learning system provided in this embodiment is similar to that of the first embodiment, and details are not repeated here.

Compared with the prior art, this embodiment includes modules for acquiring the basic clustering divisions and computing the diversification regularization term, optimizing the objective function to obtain the bipartite graph, and clustering with the bipartite graph. By optimizing the representative points, this embodiment enables them not only to represent the information of a single view but also to better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering effect is improved.

Note that the above are only preferred embodiments of the present application and applied technical principles. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in detail through the above embodiments, the present application is not limited to the above embodiments, and can also include more other equivalent embodiments without departing from the concept of the present application. The scope is determined by the scope of the appended claims.

Claims

1. A bipartite graph-based late fusion multi-view clustering machine learning method, characterized in that it includes:

S1. Obtain clustering tasks and target data samples;

S2. The basic division is obtained by running kernel k-means clustering on each view corresponding to the obtained clustering task and the target data sample, and the diversification regular term of each view is calculated;

S3. Use random initialization to select representative points of each view, and establish a bipartite graph-based late fusion multi-view clustering objective function;

S4. Solve the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, and obtain a bipartite graph after view fusion;

S5. Perform spectral clustering on the obtained bipartite graph to obtain a clustering result.
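2. The bipartite graph-based late fusion multi-view clustering machine learning method according to claim 1, wherein in the step S2, the kernel k-means clustering is performed, specifically:

The goal of kernel k-means clustering is to minimize the sum of squared errors over the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:

$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$

where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; $\mu_c = \frac{1}{n_c}\sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ is the centroid of the c-th cluster; $n_c = \sum_{i=1}^{n} B_{ic}$ indicates the number of samples belonging to the c-th cluster, 1≤c≤k; i indicates the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic}=1$, otherwise $B_{ic}=0$;

Formula (1) is transformed into:

$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{T} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$

where K represents the kernel matrix with elements $K_{ij} = \phi(x_i)^{T}\phi(x_j)$, $L = \mathrm{diag}(n_1^{-1}, \ldots, n_k^{-1})$, and $\mathbf{1}_\ell \in \mathbb{R}^{\ell}$ represents a vector whose elements are all 1;

Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint into a real-valued orthogonality constraint, that is, $H^{T}H = I_k$; formula (2) is then converted into:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K\left(I_n - HH^{T}\right)\right) \quad \text{s.t.} \quad H^{T}H = I_k$$

where $I_k$ represents the k-dimensional identity matrix.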
3. The bipartite graph-based late fusion multi-view clustering machine learning method according to claim 2, wherein in the step S3, the bipartite graph-based late fusion multi-view clustering objective function, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{H_p\}_{p=1}^{m}$ represents the basic division of each view obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ represents the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s represent the number of samples, clusters and representative points, respectively; λ represents the regularization parameter; γ represents the combination coefficients of the views; M represents the view diversification regularization term; and m represents the number of views.
4. The bipartite graph-based late fusion multi-view clustering machine learning method according to claim 3, wherein in the step S4, solving the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner is specifically:

Formula (3) is solved by a three-step alternating method:

A1. Fix γ and $\{A_p\}_{p=1}^{m}$, and optimize Z. Z is optimized row by row: for the i-th row $z_i$ of Z, the subproblem is an optimization problem over the simplex $z_i\mathbf{1}_s = 1$, $z_i \ge 0$;

A2. Fix γ and Z, and optimize $\{A_p\}_{p=1}^{m}$. The closed-form solution is obtained by setting the partial derivative of the objective function with respect to $A_p$ equal to 0;

A3. Fix $\{A_p\}_{p=1}^{m}$ and Z, and optimize γ. The objective function becomes a quadratic programming problem with linear constraints.
5. The bipartite graph-based late fusion multi-view clustering machine learning method according to claim 4, characterized in that in step S4, the three-step alternating method is used to solve formula (3), wherein the termination condition of the three-step alternating method is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$

where $obj^{(t-1)}$ and $obj^{(t)}$ represent the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε represents the set precision.
6. A bipartite graph-based late fusion multi-view clustering machine learning system, characterized in that it includes:

The acquisition module is used to acquire clustering tasks and target data samples;

The operation module is used to obtain the basic division by running the kernel k-means clustering on each view corresponding to the obtained clustering task and the target data sample, and calculate the diversification regular term of each view;

The establishment module is used to select representative points of each view by random initialization, and to establish a bipartite graph-based late fusion multi-view clustering objective function;

The solving module is used to solve the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, and obtain the bipartite graph after view fusion;

The clustering module is used to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
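7. The bipartite graph-based late fusion multi-view clustering machine learning system according to claim 6, characterized in that, in the operation module, the kernel k-means clustering is performed, specifically:

The goal of kernel k-means clustering is to minimize the sum of squared errors over the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:

$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$

where $\{x_i\}_{i=1}^{n}$ represents a dataset consisting of n samples; $\phi(\cdot)$ represents the feature map that projects a sample x into a reproducing kernel Hilbert space; $\mu_c = \frac{1}{n_c}\sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ is the centroid of the c-th cluster; $n_c = \sum_{i=1}^{n} B_{ic}$ indicates the number of samples belonging to the c-th cluster, 1≤c≤k; i indicates the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic}=1$, otherwise $B_{ic}=0$;

Formula (1) is transformed into:

$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{T} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$

where K represents the kernel matrix with elements $K_{ij} = \phi(x_i)^{T}\phi(x_j)$, $L = \mathrm{diag}(n_1^{-1}, \ldots, n_k^{-1})$, and $\mathbf{1}_\ell \in \mathbb{R}^{\ell}$ represents a vector whose elements are all 1;

Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint into a real-valued orthogonality constraint, that is, $H^{T}H = I_k$; formula (2) is then converted into:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K\left(I_n - HH^{T}\right)\right) \quad \text{s.t.} \quad H^{T}H = I_k$$

where $I_k$ represents the k-dimensional identity matrix.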
8. The bipartite graph-based late fusion multi-view clustering machine learning system according to claim 7, wherein the bipartite graph-based late fusion multi-view clustering objective function in the establishment module, formula (3), is minimized subject to:

$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{T}\mathbf{1}_m = 1, \; \gamma \ge 0$$

where $\{H_p\}_{p=1}^{m}$ represents the basic division of each view obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ represents the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s represent the number of samples, clusters and representative points, respectively; λ represents the regularization parameter; γ represents the combination coefficients of the views; M represents the view diversification regularization term; and m represents the number of views.
9. The bipartite graph-based late fusion multi-view clustering machine learning system according to claim 8, wherein the solving module solves the established bipartite graph-based late fusion multi-view clustering objective function in a cyclic manner, specifically:

The three-step alternating method is used to solve formula (3), and the solving module includes:

A first fixing module, for fixing γ and $\{A_p\}_{p=1}^{m}$ and optimizing Z, where Z is optimized row by row: for the i-th row $z_i$ of Z, the subproblem is an optimization problem over the simplex $z_i\mathbf{1}_s = 1$, $z_i \ge 0$;

A second fixing module, for fixing γ and Z and optimizing $\{A_p\}_{p=1}^{m}$, where the closed-form solution is obtained by setting the partial derivative of the objective function with respect to $A_p$ equal to 0;

A third fixing module, for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing γ, where the objective function becomes a quadratic programming problem with linear constraints.
10. The bipartite graph-based late fusion multi-view clustering machine learning system according to claim, characterized in that, in the solving module, the three-step alternating method is used to solve formula (3), wherein the termination condition of the three-step alternating method is expressed as:

$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$

where $obj^{(t-1)}$ and $obj^{(t)}$ represent the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε represents the set precision.