LU502853B1 - Bipartite graphs based post-fusion multi-view clustering machine learning methods and systems


Info

Publication number
LU502853B1
Authority
LU
Luxembourg
Prior art keywords
view
clustering
denotes
fusion
bipartite graph
Prior art date
Application number
LU502853A
Other languages
German (de)
Inventor
Xinzhong Zhu
Jianmin Zhao
Huiying Xu
Weixuan Liang
Original Assignee
Univ Zhejiang Normal
Application filed by Univ Zhejiang Normal

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2323Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Discrete Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Catalysts (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a post-fusion multi-view clustering machine learning method based on bipartite graphs, comprising: S11. acquiring a clustering task and target data samples; S12. running kernel k-means clustering on each view of the acquired clustering task and target data samples to obtain the base partitions, and calculating the diversity regularization term for each view; S13. establishing a post-fusion multi-view clustering objective function based on bipartite graphs, with the representative points of each view selected by random initialization; S14. solving the established objective function with a cyclic (alternating) approach to obtain the bipartite graph after view fusion; and S15. performing spectral clustering on the obtained bipartite graph to obtain the clustering results. The optimized representative points thus not only represent the information of the individual views but also better serve view fusion, which in turn enables the learned bipartite graph to better fuse the information of the individual views and improves the clustering effect.

Description

BL-5569
LU502853
BIPARTITE GRAPHS BASED POST-FUSION MULTI-VIEW CLUSTERING
MACHINE LEARNING METHODS AND SYSTEMS
Technical Field
The present application relates to the field of computer vision and pattern recognition technology, and in particular to a post-fusion multi-view clustering machine learning method and system based on a bipartite graph.
Background Art
With the development of information collection technology, information about the same data sample can easily be obtained from different views. Data carrying information from multiple views is called multi-view data. Multi-view clustering algorithms have been developed to cluster such data.
According to the timing of view fusion, existing multi-view clustering algorithms can be broadly classified into two categories. (1) Multi-view clustering algorithms based on pre-fusion. Pre-fusion refers to fusing the representations of multiple views into a unified representation before clustering is performed; a clustering algorithm is then run on the unified representation to obtain the final clustering results. Some of the more classical algorithms are multiple kernel clustering, multi-view spectral clustering and multi-view subspace clustering. (2) Multi-view clustering algorithms based on post-fusion. Unlike pre-fusion, post-fusion multi-view clustering first obtains the base partitions from each single view and then uses these base partitions to obtain an optimal clustering result. All ensemble clustering algorithms can be regarded as late fusion methods. For example: using the base partitions to first construct the association matrix of each view, i.e., an n × n 0-1 matrix that indicates whether each pair of samples is assigned to the same class, from which a unified representation is learned by means of low-rank and sparse matrix decomposition; or, after constructing the association matrix of each view and given a measure of the learning difficulty of the samples, using self-paced learning to cluster the samples in order from easy to difficult; or, maximizing the inner product between the consistent partition and a linear combination of the base partitions; or, using a post-fusion approach for the incomplete multi-view clustering problem.
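The association matrix mentioned above can be illustrated with a minimal sketch. The function name `co_association` and the toy labels are hypothetical and only serve to show the n × n 0-1 structure; they are not taken from the cited prior art.

```python
import numpy as np

def co_association(labels):
    """Return the n x n 0-1 matrix whose (i, j) entry is 1 when
    samples i and j carry the same cluster label in one base partition."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

# Hypothetical base partition of five samples obtained from one view.
base_partition = [0, 0, 1, 2, 1]
C = co_association(base_partition)
```

The matrix is symmetric with a unit diagonal, and its size grows quadratically with the number of samples, which is one reason such constructions become costly on large datasets.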
Although the above algorithms achieve good results: (1) the vast majority of pre-fusion multi-view clustering algorithms are very space- and time-consuming, so they cannot be applied to large-scale datasets; (2) existing post-fusion multi-view clustering assumes that the optimal clustering indicator matrix can be found by maximizing its inner product with a linear combination of the base clustering indicator matrices, which oversimplifies the search space of the optimal clustering indicator matrix.
Summary of the Invention
It is an object of the present application to provide, in response to the defects of the prior art, a post-fusion multi-view clustering machine learning method and system based on bipartite graphs.
To achieve the above, this application uses the following technical solution:
A post-fusion multi-view clustering machine learning method based on a bipartite graph, comprising:
S1. Acquiring a clustering task and target data samples;
S2. Running kernel k-means clustering on each view of the acquired clustering task and target data samples to obtain the base partitions, and calculating the diversity regularization term for each view;
S3. Establishing a post-fusion multi-view clustering objective function based on bipartite graphs, using random initialization to select the representative points of each view;
S4. Solving the established post-fusion multi-view clustering objective function based on the bipartite graph using a cyclic approach to obtain the bipartite graph after view fusion;
S5. Performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
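Further, said step S2 runs kernel k-means clustering, expressed as:
The objective of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
Where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes a feature mapping that projects sample x to a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \leq c \leq k$; i denotes the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic} = 1$, otherwise $B_{ic} = 0$.
Equation (1) is reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
Where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \mathrm{diag}([n_1^{-1}, \ldots, n_k^{-1}])$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector with all elements being 1.
Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top}H = I_k$; then equation (2) is converted to:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K(I_n - HH^{\top})\right) \quad \text{s.t.} \quad H^{\top}H = I_k$$
Where $I_k$ denotes the k-dimensional identity matrix.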
Further, the post-fusion multi-view clustering objective function based on the bipartite graph in step S3 is expressed as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \qquad (3)$$
$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n,\; Z \geq 0,\; \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the base partition of each view obtained from kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, the number of clusters and the number of representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \mathrm{tr}(H_p^{\top} H_q)$; and m denotes the number of views.
Further, the step S4 uses a cyclic approach to solve the established post-fusion multi-view clustering objective function based on the bipartite graph, expressed as:
Solving equation (3) using the three-step alternating method, expressed as:
A1. Fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$ and optimizing Z.
Let the i-th row of Z be $z_i$; the subproblem for each row is denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \geq 0,\; z_i \mathbf{1}_s = 1$$
Where $c_i = \sum_{p=1}^{m} \gamma_p^2 z_i^{(p)} \big/ \sum_{p=1}^{m} \gamma_p^2$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;
A2. Fixing $\gamma$ and Z and optimizing $\{A_p\}_{p=1}^{m}$: setting the partial derivative of the objective function with respect to $A_p$ equal to 0 gives the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$;
A3. Fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing $\gamma$: the objective function is transformed into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $T = \mathrm{diag}\left(\left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2\right)$.
Further, the step S4 utilizes the three-step alternating method to solve equation (3), where the termination condition of the three-step alternating method is expressed as:
$$\left(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}\right) / \mathrm{obj}^{(t)} \leq \varepsilon$$
Where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of equation (3) at the (t−1)-th and t-th iterations respectively, and $\varepsilon$ denotes the set accuracy.
Further, a bipartite graph-based post-fusion multi-view clustering machine learning system is provided, including:
An acquisition module for acquiring a clustering task and target data samples;
A running module for running kernel k-means clustering on each view of the acquired clustering task and target data samples to obtain the base partitions and calculate the diversity regularization term for each view;
A building module for building a post-fusion multi-view clustering objective function based on a bipartite graph, using random initialization to select the representative points of each view;
A solving module for solving the established post-fusion multi-view clustering objective function based on the bipartite graph using a cyclic approach to obtain the bipartite graph after view fusion;
A clustering module for performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
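Further, the running module runs kernel k-means clustering, specifically:
The objective of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
Where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes the data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes the feature mapping that projects the sample x to a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \leq c \leq k$; i denotes the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic} = 1$, otherwise $B_{ic} = 0$.
Equation (1) is reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
Where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \mathrm{diag}([n_1^{-1}, \ldots, n_k^{-1}])$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector with all elements being 1.
Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top}H = I_k$; then equation (2) is converted to:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K(I_n - HH^{\top})\right) \quad \text{s.t.} \quad H^{\top}H = I_k$$
Where $I_k$ denotes the k-dimensional identity matrix.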
Further, the post-fusion multi-view clustering objective function based on the bipartite graph in the building module is denoted as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \qquad (3)$$
$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n,\; Z \geq 0,\; \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the base partition of each view obtained from kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, the number of clusters and the number of representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \mathrm{tr}(H_p^{\top} H_q)$; and m denotes the number of views.
Further, the solving module uses a cyclic approach to solve the established post-fusion multi-view clustering objective function based on the bipartite graph, specifically:
Solving equation (3) using a three-step alternating method, including:
A first fixing module for fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$ and optimizing Z.
Let the i-th row of Z be $z_i$; the subproblem for each row is denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \geq 0,\; z_i \mathbf{1}_s = 1$$
Where $c_i = \sum_{p=1}^{m} \gamma_p^2 z_i^{(p)} \big/ \sum_{p=1}^{m} \gamma_p^2$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;
A second fixing module for fixing $\gamma$ and Z and optimizing $\{A_p\}_{p=1}^{m}$: setting the partial derivative of the objective function with respect to $A_p$ equal to 0 gives the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$;
A third fixing module for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing $\gamma$, which transforms the objective function into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $T = \mathrm{diag}\left(\left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2\right)$.
Further, the solving module utilizes a three-step alternating method to solve equation (3), wherein the termination condition of the three-step alternating method is expressed as:
$$\left(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}\right) / \mathrm{obj}^{(t)} \leq \varepsilon$$
Where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of Eq. (3) at the (t−1)-th and t-th iterations respectively, and $\varepsilon$ denotes the set accuracy.
Compared with the prior art, this application proposes a novel post-fusion multi-view clustering machine learning method based on bipartite graphs, which includes the modules of obtaining the base clustering partitions and computing the graph diversity regularization term, optimizing the objective function to obtain the bipartite graph, and using the bipartite graph for clustering. By optimizing the representative points, the application makes them not only represent the information of the individual views but also better serve view fusion, so that the learned bipartite graph can better fuse the information of each view and improve the clustering effect. Experimental results on six public datasets demonstrate that this application outperforms existing methods.
Description of the Drawings
Figure 1 is a flowchart of a post-fusion multi-view clustering machine learning method based on bipartite graphs provided in Embodiment I;
Figure 2 is a schematic diagram of the sensitivity to the parameter λ provided in Embodiment II;
Figure 3 is a schematic diagram of the effect of different numbers of representative points s on the clustering effect provided in Embodiment II;
Figure 4 is a schematic diagram of the variation of clustering performance and objective function values with increasing number of iterations provided in Embodiment II;
Figure 5 is a diagram of the structure of the bipartite graph-based late fusion multi-view clustering machine learning system provided in Embodiment III.
Detailed Description of Embodiments
The following illustrates the steps of this application by specific concrete examples, and other advantages and efficacy of this application can be readily understood by a person of ordinary skill in the art by what is disclosed in this specification. The present application may also be implemented or applied by additionally different specific embodiments, and the details in this specification may also be modified or changed in various ways without departing from the spirit of the present application based on different views and applications. It is to be noted that the following embodiments and the features in the embodiments can be combined with each other without conflict.
The present application provides a late fusion multi-view clustering machine learning method and system based on bipartite graphs for existing defects.
Embodiment I
The present embodiment provides a bipartite graph-based post-fusion multi-view clustering machine learning method, as shown in Figure 1, comprising:
S11. Acquiring a clustering task and target data samples;
S12. Running kernel k-means clustering on each view of the acquired clustering task and target data samples to obtain the base partitions, and calculating the diversity regularization term for each view;
S13. Using random initialization to select the representative points of each view to build a post-fusion multi-view clustering objective function based on the bipartite graph;
S14. Solving the established post-fusion multi-view clustering objective function based on the bipartite graph using a cyclic approach to obtain the bipartite graph after view fusion;
S15. Performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
The present embodiment proposes a new post-fusion method that learns representative points for each view, so that multi-view information is fused for clustering more effectively than with anchor points that are not updated during the optimization process; using bipartite graphs for graph learning in the post-fusion algorithm also reduces the computational and storage complexity.
In step S12, the base partitions are obtained by running kernel k-means clustering on each view of the acquired clustering task and target data samples, and the diversity regularization term is calculated for each view. Specifically:
The objective of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
Where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes a feature mapping that projects sample x to a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \leq c \leq k$; i denotes the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic} = 1$, otherwise $B_{ic} = 0$.
Equation (1) can be reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \mathrm{Tr}(K) - \mathrm{Tr}\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B\mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
Where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$; $L = \mathrm{diag}([n_1^{-1}, \ldots, n_k^{-1}])$; $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector with all elements being 1; the superscript $\top$ denotes the matrix transpose; and KBL denotes the matrix product of K, B and L.
Since the variable B in the above equation is discrete, the optimization is difficult. Let $H = BL^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top}H = I_k$; then equation (2) is converted to:
$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}\left(K(I_n - HH^{\top})\right) \quad \text{s.t.} \quad H^{\top}H = I_k$$
Where $I_k$ denotes the k-dimensional identity matrix.
Its closed-form solution consists of the eigenvectors corresponding to the k largest eigenvalues of the kernel matrix K, which can be obtained by performing an eigendecomposition of K.
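The closed-form solution of the relaxed problem can be sketched as follows. This is a minimal NumPy illustration, assuming a symmetric positive semi-definite kernel matrix; the function name `relaxed_kernel_kmeans` and the toy linear kernel are hypothetical.

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Return H (n x k) with orthonormal columns that minimizes
    Tr(K (I_n - H H^T)), i.e. the top-k eigenvectors of K."""
    K = (K + K.T) / 2.0                    # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    return eigvecs[:, -k:][:, ::-1]        # k largest, in descending order

# Toy view: 30 samples, 4 features, linear kernel (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
K = X @ X.T
H = relaxed_kernel_kmeans(K, 3)
```

Running this once per view would give the base partitions $H_1, \ldots, H_m$ used in the later steps.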
In step S13, the representative points of each view are selected using random initialization to establish a post-fusion multi-view clustering objective function based on the bipartite graph.
The post-fusion multi-view clustering objective function based on the bipartite graph is denoted as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \qquad (3)$$
$$\text{s.t.} \quad Z\mathbf{1}_s = \mathbf{1}_n,\; Z \geq 0,\; \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the base partition of each view obtained from kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1,2,\ldots,m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, clusters and representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \mathrm{tr}(H_p^{\top} H_q)$; m denotes the number of views.
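To make the terms of equation (3) concrete, the sketch below evaluates the objective for given Z, $A_p$ and γ. It assumes the reading $M_{pq} = \mathrm{tr}(H_p^{\top} H_q)$ and squared weights $\gamma_p^2$; the function names and the toy data are hypothetical.

```python
import numpy as np

def diversity_matrix(H_list):
    """M[p, q] = tr(H_p^T H_q), assumed form of the diversity term."""
    m = len(H_list)
    M = np.empty((m, m))
    for p in range(m):
        for q in range(m):
            M[p, q] = np.trace(H_list[p].T @ H_list[q])
    return M

def objective(Z, H_list, A_list, gamma, lam, M):
    """Value of sum_p gamma_p^2 ||Z - H_p A_p^T||_F^2 + lam * gamma^T M gamma."""
    fit = sum(g ** 2 * np.linalg.norm(Z - H @ A.T, "fro") ** 2
              for g, H, A in zip(gamma, H_list, A_list))
    return fit + lam * gamma @ M @ gamma

# Toy sizes: n samples, k clusters, s representative points, m views.
rng = np.random.default_rng(0)
n, k, s, m = 10, 2, 4, 3
H_list = [np.linalg.qr(rng.normal(size=(n, k)))[0] for _ in range(m)]
A_list = [rng.normal(size=(s, k)) for _ in range(m)]  # random initialization
Z = rng.random((n, s))
Z /= Z.sum(axis=1, keepdims=True)                     # rows sum to 1
gamma = np.full(m, 1.0 / m)
M = diversity_matrix(H_list)
obj = objective(Z, H_list, A_list, gamma, lam=1.0, M=M)
```

Because each $H_p$ has orthonormal columns, the diagonal of M equals k, and large off-diagonal entries indicate views whose base partitions are similar.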
In step S14, the established post-fusion multi-view clustering objective function based on the bipartite graph is solved in a cyclic manner to obtain the bipartite graph after view fusion, as follows:
Equation (3) is solved using the three-step alternating method, as follows:
A1. Fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$ and optimizing Z;
Let the i-th row of Z be $z_i$. Z can be optimized row by row, and each row subproblem is an optimization problem over a simplex, denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \geq 0,\; z_i \mathbf{1}_s = 1$$
Where $c_i = \sum_{p=1}^{m} \gamma_p^2 z_i^{(p)} \big/ \sum_{p=1}^{m} \gamma_p^2$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;
A2. Fixing $\gamma$ and Z and optimizing $\{A_p\}_{p=1}^{m}$: the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$ is obtained by setting the partial derivative of the objective function with respect to $A_p$ equal to 0;
A3. Fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing $\gamma$: the objective function is transformed into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top}\mathbf{1}_m = 1,\; \gamma \geq 0$$
Where $T = \mathrm{diag}\left(\left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2\right)$.
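The three updates A1 to A3 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: the row subproblem is solved with a standard sort-based Euclidean projection onto the simplex, and the quadratic program for γ is solved with projected gradient descent, which is a swapped-in solver since the source does not name one. All function names and the toy data are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def update_Z(H_list, A_list, gamma):
    """A1: each row of Z is the simplex projection of c_i."""
    w = gamma ** 2
    C = sum(wp * (H @ A.T) for wp, H, A in zip(w, H_list, A_list)) / w.sum()
    return np.apply_along_axis(project_simplex, 1, C)

def update_A(Z, H_list):
    """A2: A_p = Z^T H_p (H_p^T H_p)^{-1}."""
    return [np.linalg.solve(H.T @ H, H.T @ Z).T for H in H_list]

def update_gamma(Z, H_list, A_list, lam, M, gamma0, iters=200):
    """A3: minimize gamma^T (lam*M + T) gamma on the simplex by
    projected gradient descent (assumed solver), warm-started at gamma0."""
    T = np.diag([np.linalg.norm(Z - H @ A.T, "fro") ** 2
                 for H, A in zip(H_list, A_list)])
    Q = lam * M + T
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1] + 1e-12)  # 1 / Lipschitz constant
    gamma = gamma0.copy()
    for _ in range(iters):
        gamma = project_simplex(gamma - step * 2.0 * (Q @ gamma))
    return gamma

# One sweep on toy data (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n, k, s, m, lam = 12, 2, 4, 3, 1.0
H_list = [np.linalg.qr(rng.normal(size=(n, k)))[0] for _ in range(m)]
A_list = [rng.normal(size=(s, k)) for _ in range(m)]
gamma = np.full(m, 1.0 / m)
M = np.array([[np.trace(Hp.T @ Hq) for Hq in H_list] for Hp in H_list])

Z = update_Z(H_list, A_list, gamma)
A_list = update_A(Z, H_list)
gamma = update_gamma(Z, H_list, A_list, lam, M, gamma)
```

When each $H_p$ has orthonormal columns, $H_p^{\top} H_p = I_k$ and the A2 update reduces to $A_p = Z^{\top} H_p$; the linear solve is kept so the sketch also works if that property does not hold exactly.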
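The termination condition of the above three-step alternating method is expressed as:
$$\left(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}\right) / \mathrm{obj}^{(t)} \leq \varepsilon$$
Where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of Eq. (3) at the (t−1)-th and t-th iterations, respectively, and $\varepsilon$ denotes the set accuracy.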
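The stopping rule can be sketched as a generic loop. The helper `alternate`, the cap `max_iter`, and the one-dimensional toy problem are hypothetical; in the method above, one `sweep` would correspond to performing steps A1, A2 and A3 once, and the rule assumes the objective value stays positive.

```python
def alternate(state, sweep, objective, eps=1e-4, max_iter=100):
    """Repeat `sweep` until the relative decrease of the objective,
    (obj_prev - obj_curr) / obj_curr, drops to eps or below."""
    history = [objective(state)]
    for _ in range(max_iter):
        state = sweep(state)
        history.append(objective(state))
        if (history[-2] - history[-1]) / history[-1] <= eps:
            break
    return state, history

# Toy stand-in: gradient steps on f(x) = (x - 3)^2 + 1, which is always positive.
state, history = alternate(
    0.0,
    sweep=lambda x: x - 0.25 * 2.0 * (x - 3.0),
    objective=lambda x: (x - 3.0) ** 2 + 1.0,
)
```

Keeping the full `history` makes it easy to plot the objective value against the iteration count, as is done for the convergence study in Embodiment II.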
In step S15, spectral clustering is performed on the obtained bipartite graph to obtain the clustering results.
The process of spectral clustering of the bipartite graph Z is specified as:
Let $\bar{Z} = Z\Delta^{-\frac{1}{2}}$, where $\Delta = \mathrm{diag}(Z^{\top}\mathbf{1}_n)$. An eigenvalue decomposition of $\bar{Z}^{\top}\bar{Z}$ is performed, and the diagonal matrix composed of its k largest eigenvalues and the matrix of corresponding eigenvectors are denoted $\Sigma_k$ and $V_k$, respectively. Let $F = \bar{Z} V_k \Sigma_k^{-\frac{1}{2}} \in \mathbb{R}^{n \times k}$; the final clustering result is obtained by applying standard k-means clustering to the rows of F.
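The embedding step of the bipartite-graph spectral clustering can be sketched as follows. The sketch assumes every column of Z has a positive sum and that the k leading eigenvalues are positive; the function name and toy graph are hypothetical, and the final k-means step on the rows of F is left to any standard implementation.

```python
import numpy as np

def bipartite_spectral_embedding(Z, k):
    """Return F = Z_bar V_k Sigma_k^{-1/2}, with Z_bar = Z Delta^{-1/2}
    and Delta = diag(Z^T 1_n). Only an s x s eigenproblem is solved."""
    delta = Z.sum(axis=0)                        # column sums of Z
    Z_bar = Z / np.sqrt(delta)                   # scale each column
    evals, evecs = np.linalg.eigh(Z_bar.T @ Z_bar)
    top = np.argsort(evals)[::-1][:k]            # indices of k largest eigenvalues
    sigma, V = evals[top], evecs[:, top]
    return Z_bar @ V / np.sqrt(sigma)

# Toy bipartite graph: 20 samples, 6 representative points, rows on the simplex.
rng = np.random.default_rng(0)
Z = rng.random((20, 6))
Z /= Z.sum(axis=1, keepdims=True)
F = bipartite_spectral_embedding(Z, 3)
```

The eigendecomposition involves only an s × s matrix rather than an n × n one, which is where the bipartite graph saves computation when s is much smaller than n.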
Compared with the prior art, the present embodiment proposes a novel post-fusion multi-view clustering machine learning method based on bipartite graphs, which includes the modules of obtaining the base clustering partitions and computing the graph diversity regularization term, optimizing the objective function to obtain the bipartite graph, and using the bipartite graph for clustering. By optimizing the representative points, the present embodiment makes them not only represent the information of the individual views but also better serve view fusion, which in turn enables the learned bipartite graph to better fuse the information of the individual views and improves the clustering effect.
Embodiment II
The present embodiment provides a bipartite graph-based post-fusion multi-view clustering machine learning method that differs from Embodiment I in that:
The present embodiment tested the clustering performance of the method of this application on six MKL standard datasets, including Oxford Flower17, Oxford Flower102, Protein fold prediction, UCI-Digital, Columbia Consumer Video (CCV), and Caltech102. See Table 1 for information about the datasets.
Table 1
For ProteinFold, 12 base kernel matrices are generated, of which the first 10 feature sets use second-order polynomial kernels and the last two use cosine inner product kernels. For CCV, three base kernels are generated by applying a Gaussian kernel to the SIFT, STIP and MFCC features, with the width of each of the three Gaussian kernels set to the mean of the pairwise sample distances. Kernel matrices for the other datasets can be downloaded from the Internet.
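The Gaussian base kernel with a data-driven width can be sketched as below. The source only states that the width equals the mean pairwise distance; the specific form $\exp(-d^2 / (2\sigma^2))$ and the function name are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_mean_width(X):
    """Gaussian kernel whose width sigma is the mean pairwise Euclidean
    distance between samples (assumed form: exp(-d^2 / (2 sigma^2)))."""
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    D = np.sqrt(D2)
    n = X.shape[0]
    sigma = D[np.triu_indices(n, k=1)].mean()   # mean over distinct pairs
    return np.exp(-D2 / (2.0 * sigma ** 2))

# Hypothetical feature matrix standing in for one descriptor type.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 5))
K = gaussian_kernel_mean_width(X)
```

Applying the same function to each feature type separately would give one base kernel per view.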
This experiment compares against average multiple kernel k-means (A-MKKM), single best kernel k-means (SB-KKM), multiple kernel k-means (MKKM), robust multiple kernel k-means (RMKKM), multiple kernel k-means with matrix-induced regularization (MKKM-MR), optimal neighborhood kernel clustering (ONKC), and late-fusion multi-view clustering based on alignment maximization (MVC-LFA). In all experiments, all base kernels were first centered and normalized. For all datasets, the true number of classes is assumed to be known and is used as the number of clusters. In addition, the parameters of RMKKM, MKKM-MR, ONKC and MVC-LFA are determined by grid search. The regularization parameter of the present method is also determined by grid search over the range $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, with the number of representative points taken as s = 8k, where k is the number of clusters.
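The kernel preprocessing can be sketched as follows. The source does not define the two operations; this sketch assumes centering in feature space and scaling to a unit diagonal, which are common choices in multiple kernel clustering experiments. Function names and the toy kernel are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Center the kernel in feature space: K <- J K J, J = I - (1/n) 1 1^T."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def normalize_kernel(K):
    """Scale the kernel to unit diagonal: K_ij / sqrt(K_ii K_jj) (assumed meaning)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Toy linear kernel used only to exercise the two steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
K = normalize_kernel(center_kernel(X @ X.T))
```

Putting all base kernels on a comparable scale keeps a single view with large kernel values from dominating the comparison algorithms that combine kernels.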
Three common metrics, clustering accuracy (ACC), normalized mutual information (NMI) and purity, are used in this experiment to show the clustering performance of each method. All methods were randomly initialized and repeated 50 times, and the best results are reported to reduce the randomness caused by k-means.
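Two of the three metrics can be sketched directly from a contingency table. NMI has several normalization conventions; this sketch assumes normalization by the geometric mean of the two entropies, which may differ from the variant used in the experiments, and it requires at least two classes and two clusters. ACC additionally needs an optimal one-to-one matching between clusters and classes and is omitted here. All names and toy labels are hypothetical.

```python
import numpy as np

def contingency(y_true, y_pred):
    """Counts of samples for every (class, cluster) pair."""
    _, ti = np.unique(np.asarray(y_true).ravel(), return_inverse=True)
    _, pi = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)
    C = np.zeros((ti.max() + 1, pi.max() + 1))
    np.add.at(C, (ti, pi), 1)
    return C

def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    C = contingency(y_true, y_pred)
    return C.max(axis=0).sum() / C.sum()

def nmi(y_true, y_pred):
    """Mutual information divided by sqrt(H(true) * H(pred)) (assumed variant)."""
    C = contingency(y_true, y_pred)
    P = C / C.sum()
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz]))
    ht = -np.sum(pt * np.log(pt))
    hp = -np.sum(pp * np.log(pp))
    return mi / np.sqrt(ht * hp)

y_true = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]    # same grouping, different label names
partial = [0, 0, 1, 1, 1, 1]    # one sample placed in the wrong cluster
```

Both metrics are invariant to how the clusters are numbered, so `perfect` scores 1 on each even though its label names are swapped.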
Table 2
Dataset     | A-MKKM | SB-KKM | MKKM  | RMKKM | MKKM-MR | ONKC  | MVC-LFA | Proposed
ACC (%)
Flower17    | 51.03  | 42.06  | 45.37 | 49.93 | 36.47   | 59.85 | 58.82   | 66.99
Flower102   | 27.29  | 41.99  | 21.96 | 28.06 | 40.05   | 40.41 | 43.73   | 50.63
ProteinFold | 28.10  | 33.86  | 27.23 | 33.43 | 36.31   | 37.46 | 33.14   | 40.63
UCI-Digital | 88.75  | 75.65  | 47.00 | 42.40 | 87.45   | 91.15 | 88.90   | 94.60
CCV         | 19.98  | 33.86  | 18.29 | 16.27 | 22.74   | 23.08 | 26.13   | 29.15
Caltech102  | 27.91  | 35.29  | 16.31 | 21.05 | 23.40   | 30.20 | 32.16   | 46.40
NMI (%)
Flower17    | 50.19  | 45.14  | 45.35 | 49.69 | 38.64   | 57.45 | 58.28   | 65.43
Flower102   | 46.32  | 49.08  | 42.30 | 47.92 | 56.12   | 56.55 | 58.04   | 64.51
ProteinFold | 38.53  | 42.03  | 37.16 | 39.12 | 44.00   | 46.21 | 43.16   | 47.57
UCI-Digital | 80.59  | 68.44  | 48.16 | 47.55 | 79.51   | 84.09 | 80.97   | 88.62
CCV         | 17.06  | 42.03  | 15.04 | 11.55 | 18.60   | 18.99 | 19.81   | 23.55
Caltech102  | 49.31  | 54.15  | 39.92 | 42.96 | 46.13   | 51.02 | 53.22   | 66.26
Purity (%)
Flower17    | 51.99  | 44.63  | 46.84 | 51.03 | 39.34   | 61.03 | 60.51   | 67.65
Flower102   | 32.28  | 44.56  | 27.61 | 33.87 | 45.77   | 46.40 | 49.47   | 56.27
ProteinFold | 36.17  | 41.21  | 33.86 | 37.18 | 42.36   | 45.68 | 40.35   | 46.54
UCI-Digital | 88.75  | 76.30  | 49.70 | 45.55 | 87.45   | 91.15 | 88.90   | 94.60
CCV         | 23.56  | 41.21  | 22.41 | 20.15 | 24.76   | 26.27 | 28.24   | 32.41
Caltech102  | 29.71  | 37.35  | 18.04 | 22.97 | 25.46   | 31.60 | 34.18   | 48.24
Table 2 shows the clustering results of the proposed method and of the comparison algorithms on all data sets. From the table it can be observed that: 1. the proposed algorithm outperforms all the compared algorithms under all three evaluation criteria. 2. ONKC is an important benchmark algorithm in multi-view clustering, and the proposed algorithm outperforms ONKC in ACC by 7.14%, 10.22%, 3.17%, 3.45%, 6.07% and 16.20% on the six datasets, respectively. 3. MVC-LFA is a late fusion algorithm that usually performs better than most other multi-view algorithms, and the proposed algorithm outperforms it by 7.58%, 7.07% and 7.34% on average for the three clustering metrics, respectively.
In addition, we compare against anchor points that are not updated during the optimization process, i.e., anchors selected by k-means clustering and by random sampling, respectively, which are substituted into the objective function and kept fixed while the algorithm runs. To avoid the effect of algorithm randomness, we repeated this experiment 50 times and took the average of all results. The results are shown in Table 3.
Table 3
Method                                   | Flower17 | Flower102 | ProteinFold | UCI-Digital | CCV   | Caltech102
ACC (%)
Anchor points (k-means)                  | 42.65    | 24.86     | 26.65       | 80.35       | 20.60 | 30.07
Anchor points (random sampling)          | 58.61    | 37.53     | 36.71       | 90.00       | 22.49 | 44.48
Representative points (iterative update) | 66.99    | 50.63     | 40.63       | 94.60       | 29.15 | 46.40
NMI (%)
Anchor points (k-means)                  | 40.24    | 43.76     | 34.71       | 75.28       | 15.71 | 55.93
Anchor points (random sampling)          | 57.12    | 54.76     | 46.07       | 83.89       | 18.34 | 65.29
Representative points (iterative update) | 65.43    | 64.51     | 47.57       | 88.62       | 23.55 | 66.26
Purity (%)
Anchor points (k-means)                  | 43.60    | 30.64     | 33.29       | 80.35       | 23.95 | 31.37
Anchor points (random sampling)          | 60.28    | 45.24     | 43.29       | 90.41       | 26.15 | 46.44
Representative points (iterative update) | 67.65    | 56.27     | 46.54       | 94.60       | 32.41 | 48.24
As can be seen from Table 3, fixed anchor points, whether selected by k-means or by random sampling, perform much worse than the proposed iteratively updated representative points. Therefore, updating the representative points during the optimization of the algorithm is effective.
This embodiment introduces the regularization parameter λ to balance the weights of bipartite graph learning and the diversity regularization term. Fig. 2 plots the variation of NMI when λ is varied in the range $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, using the comparison algorithm that works best on each dataset as the reference. As shown in the figure: 1) the best NMI is obtained when the two terms are properly balanced; 2) the proposed algorithm outperforms the best comparison algorithm on most of the data sets regardless of the variation of λ.
Another important parameter of this embodiment is the number of representative points s. We select the number of representative points in the range [2k, 4k, ..., 14k], where k is the number of clusters, and conduct experiments; the results are shown in Figure 3. As s increases, the clustering effect shows an overall increasing trend. However, a larger s brings higher computational overhead, so in order to balance the clustering effect and complexity, the number of representative points can be selected empirically as s = 8k.
The present embodiment also gives the changes in the objective function values and clustering performance at each iteration, as shown in Figure 4. The objective function values decrease monotonically and usually converge within 25 iterations. The clustering effect fluctuates as the objective function decreases, but the overall trend is upward, which shows that the algorithm can continuously improve the clustering performance during the training process.
Embodiment III
The present embodiment provides a bipartite graph-based late fusion multi-view clustering machine learning system, as shown in Figure 5, comprising:
Acquisition module 11 for acquiring the clustering task and target data samples;
Running module 12 for running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partition of each view, and for calculating the diversity regularization term;
Build module 13 for building a bipartite graph-based post-fusion multi-view clustering objective function, using random initialization to select the representative points of each view;
Solver module 14 for solving the established bipartite graph-based post-fusion multi-view clustering objective function in a cyclic manner to obtain the bipartite graph after view fusion;
Clustering module 15 for performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
Further, said running module runs kernel k-means clustering, denoted as:
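The objective of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
Where, $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes the data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes the feature mapping that projects sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; when the i-th sample belongs to the c-th cluster, $B_{ic} = 1$, otherwise $B_{ic} = 0$. Equation (1) is reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left( L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}} \right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
Where, K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector whose elements are all 1.
Let $H = B L^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top} H = I_k$; then equation (2) is converted to:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left( K (I_n - H H^{\top}) \right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
Where, $I_k$ denotes the k-dimensional identity matrix.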
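By way of non-limiting illustration, the relaxed problem above can be sketched as follows. The sketch assumes NumPy and a precomputed symmetric positive semidefinite kernel matrix, and relies on the standard result that, under the orthogonality constraint, the trace objective is minimized by the k eigenvectors of K with the largest eigenvalues. The function name `relaxed_kernel_kmeans` and the toy linear kernel are hypothetical and do not appear in this application.

```python
import numpy as np


def relaxed_kernel_kmeans(K, k):
    """Return (H, objective) for min Tr(K (I_n - H H^T)) s.t. H^T H = I_k."""
    K = (K + K.T) / 2.0                    # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    H = eigvecs[:, -k:]                    # eigenvectors of the k largest eigenvalues
    objective = np.trace(K) - eigvals[-k:].sum()
    return H, objective


# Toy example (hypothetical data): linear kernel on 6 random samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T
H, obj = relaxed_kernel_kmeans(K, k=2)
```

In a multi-view setting, such a routine would be called once per view to obtain the base partition of that view.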
Further, said bipartite graph-based post-fusion multi-view clustering objective function in the build module is expressed as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \tag{3}$$
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the base partition of each view obtained from the kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, clusters and representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \operatorname{Tr}(H_p^{\top} H_q)$; m denotes the number of views.
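By way of non-limiting illustration, the following sketch evaluates the diversity matrix M and the value of equation (3) for given variables. It assumes NumPy arrays with the shapes stated above; the function names `diversity_matrix` and `objective_value` are hypothetical.

```python
import numpy as np


def diversity_matrix(H_list):
    """M[p, q] = Tr(H_p^T H_q) for base partitions H_p, each of shape (n, k)."""
    m = len(H_list)
    M = np.empty((m, m))
    for p in range(m):
        for q in range(m):
            # Elementwise product summed over all entries equals Tr(H_p^T H_q).
            M[p, q] = np.sum(H_list[p] * H_list[q])
    return M


def objective_value(Z, A_list, H_list, gamma, lam):
    """Value of Eq. (3): sum_p gamma_p^2 ||Z - H_p A_p^T||_F^2 + lam * gamma^T M gamma."""
    M = diversity_matrix(H_list)
    fit = sum(g ** 2 * np.linalg.norm(Z - H @ A.T, "fro") ** 2
              for g, H, A in zip(gamma, H_list, A_list))
    return fit + lam * (gamma @ M @ gamma)
```

Because each base partition has orthonormal columns, the diagonal entries of M all equal k, and larger off-diagonal entries indicate views whose base partitions are more alike.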
Further, said solver module solves the established bipartite graph-based post-fusion multi-view clustering objective function in a cyclic manner; specifically, equation (3) is solved using a three-step alternating method, for which the solver module includes:
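The first fixing module for fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$ and optimizing Z;
Let the i-th row of Z be $z_i$; the subproblem for each row is denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \ge 0, \; z_i \mathbf{1}_s = 1$$
Where, $c_i = \dfrac{\sum_{p=1}^{m} \gamma_p^2 z_i^{(p)}}{\sum_{p=1}^{m} \gamma_p^2}$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;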
The second fixing module for fixing $\gamma$ and Z and optimizing $\{A_p\}_{p=1}^{m}$: setting the partial derivative of the objective function with respect to $A_p$ equal to 0 yields the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$;
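The third fixing module for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing $\gamma$, whereby the objective function is transformed into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $T = \operatorname{diag}\left( \left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2 \right)$.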
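By way of non-limiting illustration, the three alternating updates described above can be sketched as follows, assuming NumPy. The row-wise subproblem for Z is a Euclidean projection onto the probability simplex, implemented here with a standard sort-based routine. The solver for the quadratic program is not specified in this application, so projected gradient descent is used purely as an assumption. All function names and the iteration count are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def update_Z(A_list, H_list, gamma):
    """Step 1: row i of Z is the simplex projection of c_i."""
    w = gamma ** 2
    C = sum(wp * (H @ A.T) for wp, H, A in zip(w, H_list, A_list)) / w.sum()
    return np.apply_along_axis(project_simplex, 1, C)


def update_A(Z, H):
    """Step 2: closed form A_p = Z^T H_p (H_p^T H_p)^{-1}, shape (s, k)."""
    return np.linalg.solve(H.T @ H, H.T @ Z).T


def update_gamma(Z, A_list, H_list, M, lam, n_iter=500):
    """Step 3: min gamma^T (lam * M + T) gamma over the simplex.

    Assumption: projected gradient descent; any QP solver could be used instead.
    """
    T = np.diag([np.linalg.norm(Z - H @ A.T, "fro") ** 2
                 for H, A in zip(H_list, A_list)])
    Q = lam * M + T
    Q = (Q + Q.T) / 2.0
    step = 1.0 / (2.0 * np.linalg.norm(Q, 2) + 1e-12)  # inverse Lipschitz constant
    gamma = np.full(len(H_list), 1.0 / len(H_list))
    for _ in range(n_iter):
        gamma = project_simplex(gamma - step * (2.0 * Q @ gamma))
    return gamma
```

One pass of the alternating method would call `update_Z`, then `update_A` for each view, then `update_gamma`, and repeat until the termination condition is met.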
Further, said solver module solves equation (3) using the three-step alternating method, where the termination condition of the three-step alternating method is denoted as:
$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
Where, $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of Eq. (3) at the (t-1)-th and t-th iterations, respectively, and $\varepsilon$ denotes the preset accuracy.
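By way of non-limiting illustration, the termination condition can be checked as in the following sketch. It assumes a strictly positive objective value; the default tolerance, the iteration cap and the names `has_converged` and `run_until_converged` are hypothetical choices, not values taken from this application.

```python
def has_converged(obj_prev, obj_curr, eps=1e-4):
    """Relative-decrease test: (obj^(t-1) - obj^(t)) / obj^(t) <= eps."""
    return (obj_prev - obj_curr) / obj_curr <= eps


def run_until_converged(step, obj0, eps=1e-4, max_iter=100):
    """Call step() repeatedly; step() performs one alternating pass and
    returns the new objective value. Returns the list of objective values."""
    history = [obj0]
    for _ in range(max_iter):
        history.append(step())
        if has_converged(history[-2], history[-1], eps):
            break
    return history
```

Here `step` stands for one full pass of the three-step alternating method followed by an evaluation of Eq. (3).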
It should be noted that the bipartite graph-based late fusion multi-view clustering machine learning system provided by this embodiment is similar to Embodiment I and is not described further herein.
Compared with the prior art, this embodiment includes modules for obtaining the base clustering division and calculating the graph diversification regular terms, optimizing the objective function to obtain the bipartite graph and using the bipartite graph for clustering. By optimizing the representative points, the present embodiment makes the optimized representative points not only represent the information of individual views, but also better serve the view fusion, which in turn enables the learned bipartite graph to better fuse the information of individual views and achieve the purpose of improved clustering effect.
Note that the above is only a preferred embodiment of the present application and the technical principles applied. It will be understood by those skilled in the art that this application is not limited to the particular embodiments described herein, and that various variations, readjustments, and substitutions are apparent to those skilled in the art without departing from the scope of protection of this application. Therefore, although the present application is described in some detail by the above embodiments, the present application is not limited to the above embodiments, but can include more other equivalent embodiments without departing from the conception of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A post-fusion multi-view clustering machine learning method based on a bipartite graph, characterized in that it comprises:
S1. Acquiring a clustering task and target data samples;
S2. Running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partition of each view, and calculating the diversity regularization term;
S3. Establishing a post-fusion multi-view clustering objective function based on the bipartite graph, using random initialization to select representative points for each view;
S4. Solving the established post-fusion multi-view clustering objective function based on the bipartite graph in a cyclic manner to obtain the bipartite graph after view fusion;
S5. Performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
2. The bipartite graph-based post-fusion multi-view clustering machine learning method according to claim 1, characterized in that said step S2 runs kernel k-means clustering, specifically:
The objective of the kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
Where, $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes a feature mapping that projects sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, otherwise $B_{ic} = 0$;
Equation (1) is reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left( L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}} \right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
Where, K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector with all elements being 1;
Let $H = B L^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top} H = I_k$; then equation (2) converts to:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left( K (I_n - H H^{\top}) \right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
Where, $I_k$ denotes a k-dimensional identity matrix.
3. The bipartite graph-based post-fusion multi-view clustering machine learning method according to claim 2, characterized in that the bipartite graph-based post-fusion multi-view clustering objective function in said step S3 is expressed as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \tag{3}$$
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the base partition of each view obtained from the kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, clusters and representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \operatorname{Tr}(H_p^{\top} H_q)$; m denotes the number of views.
4. The bipartite graph-based post-fusion multi-view clustering machine learning method according to claim 3, characterized in that said step S4 solves the established bipartite graph-based post-fusion multi-view clustering objective function in a cyclic manner, wherein equation (3) is solved using a three-step alternating method, comprising:
A1. Fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$, and optimizing Z; let the i-th row of Z be $z_i$, the subproblem for each row being denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \ge 0, \; z_i \mathbf{1}_s = 1$$
Where, $c_i = \dfrac{\sum_{p=1}^{m} \gamma_p^2 z_i^{(p)}}{\sum_{p=1}^{m} \gamma_p^2}$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;
A2. Fixing $\gamma$ and Z, and optimizing $\{A_p\}_{p=1}^{m}$; setting the partial derivative of the objective function with respect to $A_p$ equal to 0 yields the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$;
A3. Fixing $\{A_p\}_{p=1}^{m}$ and Z, and optimizing $\gamma$; the objective function is transformed into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $T = \operatorname{diag}\left( \left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2 \right)$.
5. The bipartite graph-based post-fusion multi-view clustering machine learning method according to claim 4, characterized in that said step S4 utilizes the three-step alternating method to solve equation (3), wherein the termination condition of the three-step alternating method is expressed as:
$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
Where, $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of Eq. (3) at the (t-1)-th and t-th iterations, respectively, and $\varepsilon$ denotes the preset accuracy.
6. A post-fusion multi-view clustering machine learning system based on a bipartite graph, characterized in that it comprises:
Acquisition module for acquiring a clustering task and target data samples;
Run module for running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partition of each view, and for calculating the diversity regularization term;
Build module for building a post-fusion multi-view clustering objective function based on the bipartite graph, using random initialization to select representative points for each view;
Solving module for solving the established post-fusion multi-view clustering objective function based on the bipartite graph in a cyclic manner to obtain the bipartite graph after view fusion;
Clustering module for performing spectral clustering on the obtained bipartite graph to obtain the clustering results.
7. The bipartite graph-based post-fusion multi-view clustering machine learning system according to claim 6, characterized in that said run module runs kernel k-means clustering, expressed as:
The objective of the kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, denoted as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
Where, $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a data set consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes a feature mapping that projects sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, otherwise $B_{ic} = 0$;
Equation (1) is reduced to:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left( L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}} \right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
Where, K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector with all elements being 1;
Let $H = B L^{\frac{1}{2}}$ and convert the discrete constraint to a real-valued orthogonal constraint, i.e., $H^{\top} H = I_k$; then equation (2) converts to:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left( K (I_n - H H^{\top}) \right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
Where, $I_k$ denotes a k-dimensional identity matrix.
8. The bipartite graph-based post-fusion multi-view clustering machine learning system according to claim 7, characterized in that said post-fusion multi-view clustering objective function based on the bipartite graph in the build module is expressed as:
$$\min_{Z, \{A_p\}_{p=1}^{m}, \gamma} \sum_{p=1}^{m} \gamma_p^2 \left\| Z - H_p A_p^{\top} \right\|_F^2 + \lambda \gamma^{\top} M \gamma \tag{3}$$
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n, \; Z \ge 0, \; \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $H_p \in \mathbb{R}^{n \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the base partition of each view obtained from kernel k-means clustering; $A_p \in \mathbb{R}^{s \times k}$ ($p \in \{1, 2, \ldots, m\}$) denotes the representative points of each view; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k, s denote the number of samples, the number of clusters and the number of representative points, respectively; $\lambda$ denotes the regularization parameter; $\gamma$ denotes the combination coefficients of the views; M denotes the view diversity regularization term with elements $M_{pq} = \operatorname{Tr}(H_p^{\top} H_q)$; and m denotes the number of views.
9. The bipartite graph-based post-fusion multi-view clustering machine learning system according to claim 8, characterized in that said solving module solves the established post-fusion multi-view clustering objective function based on the bipartite graph in a cyclic manner, wherein equation (3) is solved using a three-step alternating method, the solving module comprising:
The first fixing module for fixing $\gamma$ and $\{A_p\}_{p=1}^{m}$ and optimizing Z; let the i-th row of Z be $z_i$, the subproblem for each row being denoted as:
$$\min_{z_i} \left\| z_i - c_i \right\|_2^2 \quad \text{s.t.} \quad z_i \ge 0, \; z_i \mathbf{1}_s = 1$$
Where, $c_i = \dfrac{\sum_{p=1}^{m} \gamma_p^2 z_i^{(p)}}{\sum_{p=1}^{m} \gamma_p^2}$, and $z_i^{(p)}$ is the i-th row of the matrix $H_p A_p^{\top}$;
The second fixing module for fixing $\gamma$ and Z and optimizing $\{A_p\}_{p=1}^{m}$, wherein setting the partial derivative of the objective function with respect to $A_p$ equal to 0 yields the closed-form solution $A_p = Z^{\top} H_p (H_p^{\top} H_p)^{-1}$;
The third fixing module for fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing $\gamma$, whereby the objective function is transformed into a quadratic programming problem with linear constraints, denoted as:
$$\min_{\gamma} \gamma^{\top} (\lambda M + T) \gamma \quad \text{s.t.} \quad \gamma^{\top} \mathbf{1}_m = 1, \; \gamma \ge 0$$
Where, $T = \operatorname{diag}\left( \left\| Z - H_1 A_1^{\top} \right\|_F^2, \ldots, \left\| Z - H_m A_m^{\top} \right\|_F^2 \right)$.
10. The bipartite graph-based post-fusion multi-view clustering machine learning system according to claim 9, characterized in that said solving module utilizes the three-step alternating method to solve equation (3), wherein the termination condition of the three-step alternating method is expressed as:
$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
Where, $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of Eq. (3) at the (t-1)-th and t-th iterations, respectively, and $\varepsilon$ denotes the preset accuracy.
LU502853A 2021-02-09 2021-12-08 Bipartite graphs based post-fusion multi-view clustering machine learning methods and systems LU502853B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110173493.9A CN112990265A (en) 2021-02-09 2021-02-09 Post-fusion multi-view clustering machine learning method and system based on bipartite graph

Publications (1)

Publication Number Publication Date
LU502853B1 true LU502853B1 (en) 2023-01-30

Family

ID=76347689

Family Applications (1)

Application Number Title Priority Date Filing Date
LU502853A LU502853B1 (en) 2021-02-09 2021-12-08 Bipartite graphs based post-fusion multi-view clustering machine learning methods and systems

Country Status (4)

Country Link
CN (1) CN112990265A (en)
LU (1) LU502853B1 (en)
WO (1) WO2022170840A1 (en)
ZA (1) ZA202207736B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990265A (en) * 2021-02-09 2021-06-18 浙江师范大学 Post-fusion multi-view clustering machine learning method and system based on bipartite graph
CN113627237A (en) * 2021-06-24 2021-11-09 浙江师范大学 Late-stage fusion face image clustering method and system based on local maximum alignment
CN113610103A (en) * 2021-06-24 2021-11-05 浙江师范大学 Medical data clustering method and system based on unified anchor point and subspace learning
CN113627462A (en) * 2021-06-24 2021-11-09 浙江师范大学 Medical data clustering method and system based on matrix decomposition and multi-partition alignment
CN113837218A (en) * 2021-08-17 2021-12-24 浙江师范大学 Text clustering method and system based on one-step post-fusion multi-view
CN116152269A (en) * 2021-11-19 2023-05-23 华为技术有限公司 Bipartite graph construction method, bipartite graph display method and bipartite graph construction device
CN117009838B (en) * 2023-09-27 2024-01-26 江西师范大学 Multi-scale fusion contrast learning multi-view clustering method and system
CN117292162B (en) * 2023-11-27 2024-03-08 烟台大学 Target tracking method, system, equipment and medium for multi-view image clustering

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11709855B2 (en) * 2019-07-15 2023-07-25 Microsoft Technology Licensing, Llc Graph embedding already-collected but not yet connected data
CN112132224A (en) * 2020-09-28 2020-12-25 广东工业大学 Rapid spectrum embedding clustering method based on graph learning
CN112287974B (en) * 2020-09-28 2024-05-28 北京工业大学 Multi-view K multi-mean image clustering method based on self-adaptive weight
CN112990265A (en) * 2021-02-09 2021-06-18 浙江师范大学 Post-fusion multi-view clustering machine learning method and system based on bipartite graph

Also Published As

Publication number Publication date
CN112990265A (en) 2021-06-18
WO2022170840A1 (en) 2022-08-18
ZA202207736B (en) 2022-07-27

Similar Documents

Publication Publication Date Title
LU502853B1 (en) Bipartite graphs based post-fusion multi-view clustering machine learning methods and systems
Raghu et al. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability
D'Urso et al. GARCH-based robust clustering of time series
EP3188111A1 (en) A method for extracting latent context patterns from sensors
Cavoretto et al. Partition of unity methods for signal processing on graphs
CN105718999B (en) A kind of construction method and system of heuristic metabolism coexpression network
Joneidi et al. E-optimal sensor selection for compressive sensing-based purposes
WO2022253153A1 (en) Later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement
Xu et al. Predicting alzheimer’s disease cognitive assessment via robust low-rank structured sparse model
WO2022227956A1 (en) Optimal neighbor multi-kernel clustering method and system based on local kernel
Zhou et al. Clustering multivariate time series data via multi-nonnegative matrix factorization in multi-relational networks
Wang et al. Multi-manifold clustering
Fernandes et al. The initialization and parameter setting problem in tensor decomposition-based link prediction
Zhang et al. A spectral clustering based method for hyperspectral urban image
Ding et al. Higher‐order sliced inverse regressions
Thakre et al. Intrinsic dimensionality of microstructure data
Karas et al. Brain connectivity-informed regularization methods for regression
Aggarwal et al. Spatio-temporal frequent itemset mining on web data
Nguyen et al. Cadis: Handling cluster-skewed non-iid data in federated learning with clustered aggregation and knowledge distilled regularization
Stegeman Simultaneous component analysis by means of Tucker3
Teisseyre et al. Random Subspace Method for high-dimensional regression with the R package regRSM
US11386335B2 (en) Systems and methods providing evolutionary generation of embeddings for predicting links in knowledge graphs
Karmakar et al. Statistical validity and consistency of big data analytics: a general framework
Blanchet et al. A model-based approach to gene clustering with missing observation reconstruction in a Markov random field framework
Yang et al. Clustering Unsynchronized Time Series Subsequences with Phase Shift Weighted Spherical k-means Algorithm.

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20230130