WO2022253153A1 - Late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement - Google Patents

Late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement

Info

Publication number
WO2022253153A1
WO2022253153A1 PCT/CN2022/095836 CN2022095836W WO2022253153A1 WO 2022253153 A1 WO2022253153 A1 WO 2022253153A1 CN 2022095836 W CN2022095836 W CN 2022095836W WO 2022253153 A1 WO2022253153 A1 WO 2022253153A1
Authority
WO
WIPO (PCT)
Prior art keywords
clustering
matrix
graph
kernel
expressed
Prior art date
Application number
PCT/CN2022/095836
Other languages
English (en)
French (fr)
Inventor
朱信忠
徐慧英
李苗苗
梁伟轩
殷建平
赵建民
Original Assignee
浙江师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江师范大学
Publication of WO2022253153A1
Priority to ZA2023/11513A (ZA202311513B)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

Definitions

  • The present application relates to the technical field of machine learning, and in particular to a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
  • Clustering plays an important role in machine learning and data analysis; its goal is to divide unlabeled data into several unrelated classes. In the era of big data, data is collected from multiple sources, and such data is called multi-view data. Methods for clustering multi-view data are known as multi-view clustering algorithms. Multi-kernel clustering is an important branch of multi-view clustering that aims to make full use of a series of predefined base kernels to improve clustering performance.
  • Existing multi-kernel clustering algorithms can be roughly divided into two types, early fusion and late fusion, according to when the fusion takes place.
  • Early fusion fuses several kernel matrices before the kernel k-means algorithm is run.
  • Among early fusion methods, the matrix-induced regularization method (X. Liu, Y. Dou, J. Yin, et al., "Multiple kernel k-means clustering with matrix-induced regularization", in AAAI 2016, pp. 1888–1894) adaptively adjusts the kernel coefficients according to the similarity between kernel matrices, avoiding redundancy among similar information and thereby improving the quality of the optimal kernel matrix.
  • A method that preserves the local structure of the kernels (M. and A. A. Margolin, "Localized data fusion for kernel k-means clustering with application to cancer biology", in NeurIPS 2014, pp. 1305-1313) can also improve the performance of the algorithm.
  • Late fusion multi-kernel clustering first runs the kernel k-means algorithm on each base kernel matrix separately to obtain the base partitions, and then fuses these base partitions.
  • The late fusion algorithm based on alignment maximization (S. Wang, X. Liu, E. Zhu, et al., "Multi-view clustering via late fusion alignment maximization", in IJCAI 2019, pp. 3778–3784) aligns the base partitions through permutation matrices and then combines them.
  • The late fusion method proposed by Liu et al. (X. Liu, M. Li, C. Tang, et al., "Efficient and effective regularized incomplete multi-view clustering", in T-PAMI 2020) can handle data with incomplete views and achieves good clustering results.
  • Compared with early fusion, late fusion has very low computational and storage complexity and fairly good clustering performance. However, existing late fusion clustering algorithms still have the following shortcomings. First, the clustering of the base kernels and the late fusion of the base partitions are separate processes. In this case, the quality of the base partitions strongly affects the final clustering performance; if they contain outliers and noise, the clustering result will be unsatisfactory. Second, existing methods simply treat the consensus partition as a linear transformation of the base partitions, which makes them difficult to apply to real-world multi-kernel data.
  • The purpose of this application is to address the defects of the prior art by providing a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
  • A late fusion multi-kernel clustering machine learning method based on proxy graph improvement, including the steps:
  • S1. Obtaining a clustering task and target data samples;
  • S2. Initializing the proxy graph improvement matrix;
  • S3. Running k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement;
  • S4. Solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
  • S5. Performing spectral clustering on the obtained graph matrix to obtain the final clustering result.
  • n_c represents the number of samples belonging to the c-th cluster
  • x_i represents a data sample
  • i represents the sample index
  • n represents the number of sample points
  • k represents the total number of clusters.
  • K represents the kernel matrix
  • 1_k ∈ R^k represents a vector whose elements are all 1
  • B^T represents the transpose of B.
  • H^T represents the transpose of H
  • I_n represents the n-dimensional identity matrix
  • I_k represents the k-dimensional identity matrix.
  • H_i represents the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β represent the hyperparameters that balance the terms; H_i^T denotes the transpose of H_i; S represents the proxy graph matrix; I_n represents the n-dimensional identity matrix.
  • In step S4, the objective function constructed in step S3 is solved in a cyclic manner, specifically:
  • S_{j,:} represents the j-th column of matrix S; α_j represents an intermediate variable used in the solution; the two remaining definitions concern the j-th column and the transpose of matrices whose symbols are not reproduced in this text.
  • The objective function constructed in step S3 is solved in a cyclic manner, wherein the termination condition of the loop is:
  • obj^(t-1) and obj^(t) represent the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε represents the preset precision.
  • A late fusion multi-kernel clustering machine learning system based on proxy graph improvement, including:
  • an acquisition module, used to obtain a clustering task and target data samples;
  • an initialization module, used to initialize the proxy graph improvement matrix;
  • a construction module, used to run k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and to construct an objective function by combining kernel k-means clustering with graph improvement;
  • a solving module, used to solve the constructed objective function in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
  • a clustering module, used to perform spectral clustering on the obtained graph matrix to obtain the final clustering result.
  • n_c represents the number of samples belonging to the c-th cluster
  • x_i represents a data sample
  • i represents the sample index
  • n represents the number of sample points
  • k represents the total number of clusters.
  • K represents the kernel matrix
  • 1_k ∈ R^k represents a vector whose elements are all 1
  • B^T represents the transpose of B.
  • H^T represents the transpose of H
  • I_n represents the n-dimensional identity matrix
  • I_k represents the k-dimensional identity matrix.
  • H_i represents the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β represent the hyperparameters that balance the terms; H_i^T denotes the transpose of H_i; S represents the proxy graph matrix; I_n represents the n-dimensional identity matrix.
  • In the solving module, the constructed objective function is solved in a cyclic manner, specifically:
  • the first fixing module is used to fix S and optimize the base partition matrices H_i, expressed as:
  • the second fixing module is used to fix the base partition matrices H_i and optimize S, expressed as:
  • S_{j,:} represents the j-th column of matrix S; α_j represents an intermediate variable used in the solution; the two remaining definitions concern the j-th column and the transpose of matrices whose symbols are not reproduced in this text.
  • The constructed objective function is solved in a cyclic manner, wherein the termination condition of the loop is:
  • obj^(t-1) and obj^(t) represent the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε represents the preset precision.
  • Compared with the prior art, this application proposes a novel late fusion multi-kernel clustering machine learning method with proxy graph improvement, which includes modules for obtaining the base partitions, constructing the proxy graph, using the proxy graph to improve the base partitions, and performing spectral clustering on the proxy graph.
  • By optimizing the base partitions, this application ensures that the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph. This is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves.
  • Fig. 1 is a flow chart of the late fusion multi-kernel clustering machine learning method based on proxy graph improvement provided by Embodiment 1;
  • Fig. 2 is a schematic diagram of late fusion multi-kernel clustering based on proxy graph improvement provided by Embodiment 1;
  • Fig. 3 is a schematic diagram of how the objective function value changes as the number of iterations increases, provided by Embodiment 2;
  • Fig. 4 is a schematic diagram of parameter sensitivity provided by Embodiment 2.
  • The purpose of this application is to address the defects of the prior art by providing a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
  • This embodiment provides a late fusion multi-kernel clustering machine learning method based on proxy graph improvement, as shown in Figs. 1-2, including the steps:
  • S1. Obtaining a clustering task and target data samples;
  • S2. Initializing the proxy graph improvement matrix;
  • S3. Running k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement;
  • S4. Solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
  • S5. Performing spectral clustering on the obtained graph matrix to obtain the final clustering result.
  • In step S3, k-means clustering and graph improvement are run on each view corresponding to the clustering task and target data samples, and an objective function is constructed by combining kernel k-means clustering with graph improvement.
  • K represents the kernel matrix
  • 1_k ∈ R^k represents a vector whose elements are all 1
  • B^T represents the transpose of B.
  • H^T represents the transpose of H
  • I_n represents the n-dimensional identity matrix
  • I_k represents the k-dimensional identity matrix.
  • Eigendecomposition can be performed on the kernel matrix K; the optimal H consists of the eigenvectors corresponding to the k largest eigenvalues of K.
  • H_i represents the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β represent the hyperparameters that balance the terms; H_i^T denotes the transpose of H_i; S represents the proxy graph matrix; I_n represents the n-dimensional identity matrix.
  • Because formula (5) can use S to adjust H_i, the algorithm is named late fusion multi-kernel clustering with proxy graph improvement.
  • In step S4, the objective function constructed in step S3 is solved in a cyclic manner to obtain a graph matrix that fuses the base kernel information.
  • The objective function can be solved with the following two-step iterative method, specifically:
  • S_{j,:} represents the j-th column of matrix S; α_j represents an intermediate variable used in the solution; the two remaining definitions concern the j-th column and the transpose of matrices whose symbols are not reproduced in this text.
  • The termination condition of the two-step alternating method (steps S41 and S42) is:
  • obj^(t-1) and obj^(t) represent the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε represents the preset precision.
  • In step S5, spectral clustering is performed on the obtained graph matrix to obtain the final clustering result.
  • A standard spectral clustering algorithm is run on the output graph matrix S to obtain the final clustering result.
  • This embodiment proposes a novel late fusion multi-kernel clustering machine learning method with proxy graph improvement.
  • The method includes modules for obtaining the base partitions, constructing the proxy graph, using the proxy graph to improve the base partitions, and performing spectral clustering on the proxy graph.
  • By optimizing the base partitions, the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph. This is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves.
  • This embodiment tests the clustering performance of the method of the present application on six MKL benchmark datasets.
  • The six MKL benchmark datasets are AR10P, YALE, Protein fold prediction, Oxford Flower17, Nonplant, and Oxford Flower102.
  • See Table 1 for information about the datasets.
  • For ProteinFold, this embodiment generates 12 base kernel matrices: the first 10 feature sets use a second-order polynomial kernel, and the last two use a cosine inner-product kernel. Kernel matrices for the other datasets can be downloaded from the Internet.
  • The compared methods are best single-view kernel k-means clustering (BSKM), multi-kernel k-means clustering (MKKM), co-regularized spectral clustering (CRSC), robust multi-kernel clustering (RMKKM), robust multi-view spectral clustering (RMSC), multi-kernel k-means clustering with matrix-induced regularization (MKMR), multi-kernel clustering based on local kernel alignment maximization (MKAM), multi-view clustering via late fusion alignment maximization (MLFA), and subspace clustering based on flexible multi-view representation learning.
  • In all experiments, all base kernels are first centered and regularized.
  • For all datasets, the number of classes is assumed to be known and is used as the number of clusters.
  • The comparison algorithms used in this experiment all have their parameters set according to the corresponding literature.
  • The parameters λ and β of this method are likewise determined by grid search over [2^-2, 2^-1, ..., 2^2].
  • This experiment uses the common clustering accuracy (ACC), normalized mutual information (NMI), and purity to report the clustering performance of each method. All methods are randomly initialized and repeated 50 times, and the best result is reported to reduce the randomness caused by k-means.
  • Table 2 shows the clustering results of the proposed method and the comparison algorithms on the six datasets. From the table it can be observed that: 1. The proposed algorithm outperforms all compared algorithms under all three evaluation criteria. 2. In terms of ACC on the six datasets, the proposed algorithm exceeds the second-best comparison algorithm by 4.92%, 1.21%, 2.16%, 2.12%, 6.85%, and 4.05%, respectively.
  • This embodiment also gives the change of the objective function at each iteration, as shown in Fig. 3. The value of the objective function decreases monotonically and usually converges within 10 iterations, which greatly reduces the running time of the algorithm.
  • Fig. 4 shows the parameter sensitivity, using the two datasets AR10P and Flower17 as examples. The figure shows that the proposed algorithm is relatively stable with respect to both hyperparameters and achieves good performance over a wide range.
  • This embodiment provides a late fusion multi-kernel clustering machine learning system based on proxy graph improvement, including:
  • an acquisition module, used to obtain a clustering task and target data samples;
  • an initialization module, used to initialize the proxy graph improvement matrix;
  • a construction module, used to run k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and to construct an objective function by combining kernel k-means clustering with graph improvement;
  • a solving module, used to solve the constructed objective function in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
  • a clustering module, used to perform spectral clustering on the obtained graph matrix to obtain the final clustering result.
  • n_c represents the number of samples belonging to the c-th cluster
  • x_i represents a data sample
  • i represents the sample index
  • n represents the number of sample points
  • k represents the total number of clusters.
  • K represents the kernel matrix
  • 1_k ∈ R^k represents a vector whose elements are all 1
  • B^T represents the transpose of B.
  • H^T represents the transpose of H
  • I_n represents the n-dimensional identity matrix
  • I_k represents the k-dimensional identity matrix.
  • H_i represents the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β represent the hyperparameters that balance the terms; H_i^T denotes the transpose of H_i; S represents the proxy graph matrix; I_n represents the n-dimensional identity matrix.
  • In the solving module, the constructed objective function is solved in a cyclic manner, specifically:
  • the first fixing module is used to fix S and optimize the base partition matrices H_i, expressed as:
  • the second fixing module is used to fix the base partition matrices H_i and optimize S, expressed as:
  • S_{j,:} represents the j-th column of matrix S; α_j represents an intermediate variable used in the solution; the two remaining definitions concern the j-th column and the transpose of matrices whose symbols are not reproduced in this text.
  • The constructed objective function is solved in a cyclic manner, wherein the termination condition of the loop is:
  • obj^(t-1) and obj^(t) represent the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε represents the preset precision.
  • The system proposed in this embodiment includes modules for obtaining the base partitions, constructing the proxy graph, using the proxy graph to improve the base partitions, and performing spectral clustering on the proxy graph.
  • By optimizing the base partitions, the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph. This is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Discrete Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

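A late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement. The method includes the steps: S1. obtaining a clustering task and target data samples; S2. initializing the proxy graph improvement matrix; S3. running k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement; S4. solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information; S5. performing spectral clustering on the obtained graph matrix to obtain the final clustering result. With this method, the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph, which is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves.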

Description

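Late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement
Technical Field
This application relates to the technical field of machine learning, and in particular to a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
Background
Clustering plays an important role in machine learning and data analysis; its goal is to divide unlabeled data into several unrelated classes. In the era of big data, data is collected from multiple sources, and such data is called multi-view data. Methods for clustering multi-view data are known as multi-view clustering algorithms. Multi-kernel clustering is an important branch of multi-view clustering that aims to make full use of a series of predefined base kernels to improve clustering performance.
Existing multi-kernel clustering algorithms can be roughly divided into two types, early fusion and late fusion, according to when the fusion takes place. Early fusion fuses several kernel matrices before the kernel k-means algorithm is run. Among these methods, the matrix-induced regularization method (X. Liu, Y. Dou, J. Yin, et al., "Multiple kernel k-means clustering with matrix-induced regularization", in AAAI 2016, pp. 1888–1894) adaptively adjusts the kernel coefficients according to the similarity between kernel matrices, avoiding redundancy among similar information and thereby improving the quality of the optimal kernel matrix. A method that preserves the local structure of the kernels (M.
Figure PCTCN2022095836-appb-000001
and A. A. Margolin, "Localized data fusion for kernel k-means clustering with application to cancer biology", in NeurIPS 2014, pp. 1305-1313) can also improve the performance of the algorithm.
Late fusion multi-kernel clustering instead first runs the kernel k-means algorithm on each base kernel matrix separately to obtain the base partitions, and then fuses these base partitions. The late fusion algorithm based on alignment maximization (S. Wang, X. Liu, E. Zhu, et al., "Multi-view clustering via late fusion alignment maximization", in IJCAI 2019, pp. 3778–3784) aligns the base partitions through permutation matrices and then combines them. The late fusion method proposed by Liu et al. (X. Liu, M. Li, C. Tang, et al., "Efficient and effective regularized incomplete multi-view clustering", in T-PAMI 2020) can handle data with incomplete views and achieves good clustering results.
Compared with early fusion, late fusion has very low computational and storage complexity and fairly good clustering performance. However, existing late fusion clustering algorithms still have the following shortcomings. First, the clustering of the base kernels and the late fusion of the base partitions are separate processes. In this case, the quality of the base partitions strongly affects the final clustering performance; if they contain outliers and noise, the clustering result will be unsatisfactory. Second, existing methods simply treat the consensus partition as a linear transformation of the base partitions, which makes them difficult to apply to real-world multi-kernel data.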
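Summary of the Invention
The purpose of this application is to address the defects of the prior art by providing a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
To achieve the above purpose, this application adopts the following technical solution:
A late fusion multi-kernel clustering machine learning method based on proxy graph improvement, including the steps:
S1. Obtaining a clustering task and target data samples;
S2. Initializing the proxy graph improvement matrix;
S3. Running k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement;
S4. Solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
S5. Performing spectral clustering on the obtained graph matrix to obtain the final clustering result.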
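Further, the objective function of kernel k-means clustering in step S3 is expressed as:
Figure PCTCN2022095836-appb-000002
where
Figure PCTCN2022095836-appb-000003
is a dataset consisting of n samples; B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, with B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
Figure PCTCN2022095836-appb-000004
denotes the feature map that projects a sample x into a reproducing kernel Hilbert space, where
Figure PCTCN2022095836-appb-000005
is that reproducing kernel Hilbert space;
Figure PCTCN2022095836-appb-000006
n_c denotes the number of samples belonging to the c-th cluster; x_i denotes a data sample; i denotes the sample index; n denotes the number of sample points; k denotes the total number of clusters.
Let <φ(x_i), φ(x_j)> = K_ij, where K_ij denotes an element of the kernel matrix K; formula (1) is then expressed as:
Figure PCTCN2022095836-appb-000007
where K denotes the kernel matrix;
Figure PCTCN2022095836-appb-000008
Figure PCTCN2022095836-appb-000009
denotes the reciprocal of the total number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes a vector whose elements are all 1; B^T denotes the transpose of B.
Let
Figure PCTCN2022095836-appb-000010
and H^T H = I_k; formula (2) is then expressed as:
Figure PCTCN2022095836-appb-000011
where H^T denotes the transpose of H; I_n denotes the n-dimensional identity matrix; I_k denotes the k-dimensional identity matrix.
Further, the objective function constructed in step S3 is expressed as:
Figure PCTCN2022095836-appb-000012
Figure PCTCN2022095836-appb-000013
where H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β denote the hyperparameters that balance the terms;
Figure PCTCN2022095836-appb-000014
denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n-dimensional identity matrix.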
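Further, solving the objective function constructed in step S3 in a cyclic manner in step S4 is specifically:
S41. Fix S and optimize
Figure PCTCN2022095836-appb-000015
which is expressed as:
Figure PCTCN2022095836-appb-000016
Let G = K_i - λ(I_n - 2S + SS^T); formula (7) is then expressed as:
Figure PCTCN2022095836-appb-000017
Perform eigendecomposition on G and take H_i as the eigenvectors corresponding to its k largest eigenvalues to obtain the optimal solution;
S42. Fix
Figure PCTCN2022095836-appb-000018
and optimize S, which is expressed as:
Figure PCTCN2022095836-appb-000019
Formula (9) is solved through steps S421 and S422:
S421. Find the unconstrained solution of formula (9), expressed as:
Figure PCTCN2022095836-appb-000020
Setting the derivative to 0 gives the closed-form solution
Figure PCTCN2022095836-appb-000021
where
Figure PCTCN2022095836-appb-000022
S422. Starting from
Figure PCTCN2022095836-appb-000023
use formula (11) to find the closest solution that satisfies the constraints:
Figure PCTCN2022095836-appb-000024
where
Figure PCTCN2022095836-appb-000025
denotes the solution of the proxy graph matrix without constraints.
The closed-form solution is:
Figure PCTCN2022095836-appb-000026
where S_{j,:} denotes the j-th column of matrix S; α_j denotes an intermediate variable used in the solution;
Figure PCTCN2022095836-appb-000027
denotes the j-th column of
Figure PCTCN2022095836-appb-000028
and
Figure PCTCN2022095836-appb-000029
denotes the transpose of
Figure PCTCN2022095836-appb-000030
Further, the objective function constructed in step S3 is solved in a cyclic manner, wherein the loop termination condition is:
Figure PCTCN2022095836-appb-000031
where obj^(t-1) and obj^(t) denote the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε denotes the preset precision.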
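Correspondingly, a late fusion multi-kernel clustering machine learning system based on proxy graph improvement is also provided, including:
an acquisition module, used to obtain a clustering task and target data samples;
an initialization module, used to initialize the proxy graph improvement matrix;
a construction module, used to run k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and to construct an objective function by combining kernel k-means clustering with graph improvement;
a solving module, used to solve the constructed objective function in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
a clustering module, used to perform spectral clustering on the obtained graph matrix to obtain the final clustering result.
Further, the objective function of kernel k-means clustering in the construction module is expressed as:
Figure PCTCN2022095836-appb-000032
where
Figure PCTCN2022095836-appb-000033
is a dataset consisting of n samples; B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, with B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
Figure PCTCN2022095836-appb-000034
denotes the feature map that projects a sample x into a reproducing kernel Hilbert space, where
Figure PCTCN2022095836-appb-000035
is that reproducing kernel Hilbert space;
Figure PCTCN2022095836-appb-000036
n_c denotes the number of samples belonging to the c-th cluster; x_i denotes a data sample; i denotes the sample index; n denotes the number of sample points; k denotes the total number of clusters.
Let <φ(x_i), φ(x_j)> = K_ij, where K_ij denotes an element of the kernel matrix K; formula (1) is then expressed as:
Figure PCTCN2022095836-appb-000037
where K denotes the kernel matrix;
Figure PCTCN2022095836-appb-000038
Figure PCTCN2022095836-appb-000039
denotes the reciprocal of the total number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes a vector whose elements are all 1; B^T denotes the transpose of B.
Let
Figure PCTCN2022095836-appb-000040
and H^T H = I_k; formula (2) is then expressed as:
Figure PCTCN2022095836-appb-000041
where H^T denotes the transpose of H; I_n denotes the n-dimensional identity matrix; I_k denotes the k-dimensional identity matrix.
Further, the objective function constructed in the construction module is expressed as:
Figure PCTCN2022095836-appb-000042
Figure PCTCN2022095836-appb-000043
where H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β denote the hyperparameters that balance the terms;
Figure PCTCN2022095836-appb-000044
denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n-dimensional identity matrix.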
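Further, solving the constructed objective function in a cyclic manner in the solving module is specifically:
a first fixing module, used to fix S and optimize
Figure PCTCN2022095836-appb-000045
which is expressed as:
Figure PCTCN2022095836-appb-000046
Let G = K_i - λ(I_n - 2S + SS^T); formula (7) is then expressed as:
Figure PCTCN2022095836-appb-000047
Perform eigendecomposition on G and take H_i as the eigenvectors corresponding to its k largest eigenvalues to obtain the optimal solution;
a second fixing module, used to fix
Figure PCTCN2022095836-appb-000048
and optimize S, which is expressed as:
Figure PCTCN2022095836-appb-000049
Formula (9) is solved as follows:
Find the unconstrained solution of formula (9), expressed as:
Figure PCTCN2022095836-appb-000050
Setting the derivative to 0 gives the closed-form solution
Figure PCTCN2022095836-appb-000051
where
Figure PCTCN2022095836-appb-000052
Starting from
Figure PCTCN2022095836-appb-000053
find the closest solution that satisfies the constraints:
Figure PCTCN2022095836-appb-000054
where
Figure PCTCN2022095836-appb-000055
denotes the solution of the proxy graph matrix without constraints.
The closed-form solution is:
Figure PCTCN2022095836-appb-000056
where S_{j,:} denotes the j-th column of matrix S; α_j denotes an intermediate variable used in the solution;
Figure PCTCN2022095836-appb-000057
denotes the j-th column of
Figure PCTCN2022095836-appb-000058
and
Figure PCTCN2022095836-appb-000059
denotes the transpose of
Figure PCTCN2022095836-appb-000060
Further, the constructed objective function is solved in a cyclic manner, wherein the loop termination condition is:
Figure PCTCN2022095836-appb-000061
where obj^(t-1) and obj^(t) denote the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε denotes the preset precision.
Compared with the prior art, this application proposes a novel late fusion multi-kernel clustering machine learning method with proxy graph improvement, which includes modules for obtaining the base partitions, constructing the proxy graph, using the proxy graph to improve the base partitions, and performing spectral clustering on the proxy graph. By optimizing the base partitions, this application ensures that the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph, which is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves. Experimental results on six multi-kernel datasets show that this application outperforms existing methods.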
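Brief Description of the Drawings
Fig. 1 is a flow chart of the late fusion multi-kernel clustering machine learning method based on proxy graph improvement provided by Embodiment 1;
Fig. 2 is a schematic diagram of late fusion multi-kernel clustering based on proxy graph improvement provided by Embodiment 1;
Fig. 3 is a schematic diagram of how the objective function value changes as the number of iterations increases, provided by Embodiment 2;
Fig. 4 is a schematic diagram of parameter sensitivity provided by Embodiment 2.
Detailed Description
The embodiments of this application are described below through specific examples; those skilled in the art can readily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, where no conflict arises, the following embodiments and the features in the embodiments may be combined with one another.
The purpose of this application is to address the defects of the prior art by providing a late fusion multi-kernel clustering machine learning method and system based on proxy graph improvement.
Embodiment 1
This embodiment provides a late fusion multi-kernel clustering machine learning method based on proxy graph improvement, as shown in Figs. 1-2, including the steps:
S1. Obtaining a clustering task and target data samples;
S2. Initializing the proxy graph improvement matrix;
S3. Running k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement;
S4. Solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
S5. Performing spectral clustering on the obtained graph matrix to obtain the final clustering result.
In step S3, k-means clustering and graph improvement are run on each view corresponding to the clustering task and target data samples, and an objective function is constructed by combining kernel k-means clustering with graph improvement.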
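The kernel k-means clustering objective is as follows. Let
Figure PCTCN2022095836-appb-000062
be a dataset consisting of n samples, and let the kernel function be κ(·,·). By the property of reproducing kernels, κ(x, x′) = <φ(x), φ(x′)>, where
Figure PCTCN2022095836-appb-000063
is the feature map that projects a sample x into a reproducing kernel Hilbert space, denoted
Figure PCTCN2022095836-appb-000064
Substituting φ(x) into the k-means clustering objective gives the objective function of kernel k-means clustering, expressed as:
Figure PCTCN2022095836-appb-000065
where B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, with B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
Figure PCTCN2022095836-appb-000066
n_c denotes the number of samples belonging to the c-th cluster; x_i denotes a data sample; i denotes the sample index; n denotes the number of sample points; k denotes the total number of clusters.
Using the kernel trick, let <φ(x_i), φ(x_j)> = K_ij, where K_ij denotes an element of the kernel matrix K; formula (1) is then expressed as:
Figure PCTCN2022095836-appb-000067
where K denotes the kernel matrix;
Figure PCTCN2022095836-appb-000068
Figure PCTCN2022095836-appb-000069
denotes the reciprocal of the total number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes a vector whose elements are all 1; B^T denotes the transpose of B.
Optimizing formula (2) with respect to B has been proven to be NP-hard, so the discrete constraint on B is relaxed to a real-valued orthogonality constraint. Let
Figure PCTCN2022095836-appb-000070
and H^T H = I_k; formula (2) is then expressed as:
Figure PCTCN2022095836-appb-000071
where H^T denotes the transpose of H; I_n denotes the n-dimensional identity matrix; I_k denotes the k-dimensional identity matrix.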
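Formulas (1) to (3) appear only as images in the source. For reference, the standard kernel k-means formulation that matches the symbol definitions above, as used in the multiple kernel k-means literature cited in the background, can be written as below. This is a reconstruction under that assumption rather than the application's exact typesetting; the names \mu_c (cluster centroid in feature space) and L (diagonal matrix of reciprocal cluster sizes) are chosen here for illustration.

```latex
% (1) kernel k-means in feature space
\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic}\, \lVert \phi(x_i) - \mu_c \rVert_2^2
\quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1,
\qquad \mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic}\, \phi(x_i)

% (2) the same problem written with the kernel matrix
\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\!\left(L^{1/2} B^{\top} K B L^{1/2}\right)
\quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n,
\qquad L = \operatorname{diag}\!\left(n_1^{-1}, \dots, n_k^{-1}\right)

% (3) relaxation with H = B L^{1/2}
\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\!\left(K\,(I_n - H H^{\top})\right)
\quad \text{s.t.} \quad H^{\top} H = I_k
```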
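In this embodiment, eigendecomposition can be performed on the kernel matrix K; the optimal H consists of the eigenvectors corresponding to the k largest eigenvalues of K.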
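The eigendecomposition step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the relaxed objective Tr(K(I_n - HH^T)) with H^T H = I_k; the function name `relaxed_kernel_kmeans` and the toy linear kernel are hypothetical and not taken from the application.

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Return H (n x k) whose columns are eigenvectors of K for its k largest eigenvalues."""
    K = (K + K.T) / 2.0                    # guard against tiny numerical asymmetries
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues come back in ascending order
    H = eigvecs[:, -k:][:, ::-1]           # top-k eigenvectors, largest eigenvalue first
    objective = np.trace(K) - np.sum(eigvals[-k:])  # value of Tr(K (I_n - H H^T))
    return H, objective

# Toy usage: a linear kernel on random data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
K = X @ X.T
H, obj = relaxed_kernel_kmeans(K, k=3)
```

The returned H is a continuous relaxation, not a cluster assignment; discrete labels are usually obtained afterwards by running k-means on its rows.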
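The graph improvement part works as follows. Suppose the base partition obtained by running kernel k-means clustering on the i-th base kernel is H_i. To give the base partitions access to global information, they can be adjusted by minimizing
Figure PCTCN2022095836-appb-000072
where S is the graph matrix shared by all base kernels and satisfies S ≥ 0 and S1 = 1, with zeros on the diagonal.
The objective function constructed by combining kernel k-means clustering with graph improvement is expressed as:
Figure PCTCN2022095836-appb-000073
Figure PCTCN2022095836-appb-000074
where H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β denote the hyperparameters that balance the terms;
Figure PCTCN2022095836-appb-000075
denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n-dimensional identity matrix.
Because formula (5) can use S to adjust H_i, the algorithm is named late fusion multi-kernel clustering with proxy graph improvement.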
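The graph term and the joint objective are images in the source. One form that is consistent with the constraints on S stated above and with the matrix G = K_i - λ(I_n - 2S + SS^T) used later in step S41 is sketched below. The number of base kernels m, the Frobenius-norm penalty on S, and the exact placement of λ and β are assumptions made for illustration.

```latex
\min_{\{H_i\}_{i=1}^{m},\, S}\;
\sum_{i=1}^{m} \Big[ \operatorname{Tr}\!\big(K_i (I_n - H_i H_i^{\top})\big)
+ \lambda\, \lVert H_i^{\top} - H_i^{\top} S \rVert_F^2 \Big]
+ \beta\, \lVert S \rVert_F^2
\quad \text{s.t.} \quad H_i^{\top} H_i = I_k,\; S \ge 0,\; S\mathbf{1} = \mathbf{1},\; S_{jj} = 0

% Expanding the graph term for a fixed S shows where G comes from
\lVert H_i^{\top} - H_i^{\top} S \rVert_F^2
= \operatorname{Tr}\!\big(H_i^{\top} (I_n - S)(I_n - S)^{\top} H_i\big)
= \operatorname{Tr}\!\big(H_i^{\top} (I_n - 2S + S S^{\top}) H_i\big)
```

The last equality uses Tr(H_i^T S^T H_i) = Tr(H_i^T S H_i), so minimizing over H_i with S fixed is the same as maximizing Tr(H_i^T (K_i - λ(I_n - 2S + SS^T)) H_i).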
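In step S4, the objective function constructed in step S3 is solved in a cyclic manner to obtain a graph matrix that fuses the base kernel information.
The objective function can be solved with the following two-step iterative method, specifically:
S41. Fix S and optimize
Figure PCTCN2022095836-appb-000076
Each H_i can be optimized separately, which is expressed as:
Figure PCTCN2022095836-appb-000077
Let G = K_i - λ(I_n - 2S + SS^T); formula (7) is then expressed as:
Figure PCTCN2022095836-appb-000078
Perform eigendecomposition on G and take H_i as the eigenvectors corresponding to its k largest eigenvalues to obtain the optimal solution;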
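Step S41 can be sketched as follows, using the definition of G given in the text. Because S need not be symmetric, the sketch symmetrizes G before the eigendecomposition; that choice is an assumption of this example (it leaves Tr(H^T G H) unchanged), and the function name `update_partition` is hypothetical.

```python
import numpy as np

def update_partition(K_i, S, lam, k):
    """One S41-style update: top-k eigenvectors of G = K_i - lam * (I - 2S + S S^T)."""
    n = K_i.shape[0]
    G = K_i - lam * (np.eye(n) - 2.0 * S + S @ S.T)
    G_sym = (G + G.T) / 2.0             # Tr(H^T G H) depends only on the symmetric part of G
    _, eigvecs = np.linalg.eigh(G_sym)  # eigenvalues in ascending order
    return eigvecs[:, -k:]              # eigenvectors of the k largest eigenvalues

# Toy usage with a random kernel and a random row-stochastic graph (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))
K_i = X @ X.T
S = rng.random((12, 12))
np.fill_diagonal(S, 0.0)
S /= S.sum(axis=1, keepdims=True)       # rows sum to 1, matching the constraint S1 = 1
H_i = update_partition(K_i, S, lam=0.5, k=3)
```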
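S42. Fix
Figure PCTCN2022095836-appb-000079
and optimize S. The optimization problem can then be converted into the following form:
Figure PCTCN2022095836-appb-000080
Formula (9) is solved through steps S421 and S422:
S421. Find the unconstrained solution of formula (9), expressed as:
Figure PCTCN2022095836-appb-000081
Setting the derivative to 0 gives the closed-form solution
Figure PCTCN2022095836-appb-000082
where
Figure PCTCN2022095836-appb-000083
S422. Starting from
Figure PCTCN2022095836-appb-000084
use formula (11) to find the closest solution that satisfies the constraints:
Figure PCTCN2022095836-appb-000085
where
Figure PCTCN2022095836-appb-000086
denotes the solution of the proxy graph matrix without constraints.
The closed-form solution is:
Figure PCTCN2022095836-appb-000087
where S_{j,:} denotes the j-th column of matrix S; α_j denotes an intermediate variable used in the solution;
Figure PCTCN2022095836-appb-000088
denotes the j-th column of
Figure PCTCN2022095836-appb-000089
and
Figure PCTCN2022095836-appb-000090
denotes the transpose of
Figure PCTCN2022095836-appb-000091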
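The two sub-steps of S42 can be illustrated as below. The closed forms in the source are images, so this sketch makes two labeled assumptions: the S-subproblem is taken to be min_S λ Σ_i ||H_i^T - H_i^T S||_F^2 + β ||S||_F^2, and the projection onto the constraints S ≥ 0, S1 = 1, zero diagonal is done row by row with the well-known sort-based Euclidean projection onto the probability simplex. All function names are hypothetical.

```python
import numpy as np

def unconstrained_graph(H_list, lam, beta):
    """Closed form of the assumed problem: solves (lam*A + beta*I) S = lam*A with A = sum_i H_i H_i^T."""
    n = H_list[0].shape[0]
    A = sum(H @ H.T for H in H_list)
    return np.linalg.solve(lam * A + beta * np.eye(n), lam * A)

def project_to_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    alpha = -css[rho] / (rho + 1.0)     # shift that plays the role of an intermediate variable
    return np.maximum(v + alpha, 0.0)

def project_graph(S_hat):
    """Project each row of S_hat onto the simplex while keeping the diagonal at 0."""
    n = S_hat.shape[0]
    S = np.zeros_like(S_hat)
    for j in range(n):
        mask = np.arange(n) != j        # exclude the diagonal entry of row j
        S[j, mask] = project_to_simplex(S_hat[j, mask])
    return S

# Toy usage with three random orthonormal base partitions (illustrative only).
rng = np.random.default_rng(2)
H_list = [np.linalg.qr(rng.normal(size=(8, 2)))[0] for _ in range(3)]
S_hat = unconstrained_graph(H_list, lam=1.0, beta=0.5)
S = project_graph(S_hat)
```

Rows are projected here because S1 = 1 is a row-sum constraint; if the application's closed form operates on columns instead, the same projection would be applied to the transpose.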
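The termination condition of the above two-step alternating method (steps S41 and S42) is:
Figure PCTCN2022095836-appb-000092
where obj^(t-1) and obj^(t) denote the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε denotes the preset precision.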
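A stopping rule of this kind can be sketched as follows. The exact formula is an image in the source, so the relative-change form |obj^(t-1) - obj^(t)| / |obj^(t)| ≤ ε used here is an assumption, as are the function names and the `max_iter` safeguard.

```python
def has_converged(obj_prev, obj_curr, eps=1e-4):
    """Relative-change stopping rule (assumed form; the source shows it only as an image)."""
    denom = max(abs(obj_curr), 1e-12)   # avoid division by zero
    return abs(obj_prev - obj_curr) / denom <= eps

def run_until_converged(step, obj0, eps=1e-4, max_iter=100):
    """Call step() (one S41 + S42 pass returning the new objective) until the rule fires."""
    history = [obj0]
    for _ in range(max_iter):
        history.append(step())
        if has_converged(history[-2], history[-1], eps):
            break
    return history

# Toy usage: a stand-in for the alternating updates that yields decreasing objective values.
values = iter([10.0, 5.0, 4.0, 3.9999])
history = run_until_converged(lambda: next(values), obj0=20.0)
```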
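In step S5, spectral clustering is performed on the obtained graph matrix to obtain the final clustering result.
A standard spectral clustering algorithm is run on the output graph matrix S to obtain the final clustering result.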
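The text does not say which spectral clustering variant is used. The sketch below follows the common normalized-affinity recipe (symmetrize S, take the leading eigenvectors of D^{-1/2} W D^{-1/2}, normalize rows, run k-means) and should be read as one reasonable instance, not the application's exact procedure. The small k-means routine with farthest-first initialization is included only to keep the example self-contained.

```python
import numpy as np

def spectral_embedding(S, k):
    """Rows of the returned matrix are the spectral coordinates of the samples."""
    W = (S + S.T) / 2.0                          # make the affinity symmetric
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]                             # eigenvectors of the k largest eigenvalues
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with farthest-first initialization (deterministic)."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Toy usage: a graph with two disconnected groups of three samples each.
S = np.zeros((6, 6))
S[:3, :3] = 0.5
S[3:, 3:] = 0.5
np.fill_diagonal(S, 0.0)
labels = kmeans(spectral_embedding(S, k=2), k=2)
```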
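This embodiment proposes a novel late fusion multi-kernel clustering machine learning method with proxy graph improvement, which includes modules for obtaining the base partitions, constructing the proxy graph, using the proxy graph to improve the base partitions, and performing spectral clustering on the proxy graph. By optimizing the base partitions, the optimized base partitions carry not only the information of a single kernel but also global information obtained through the proxy graph, which is more conducive to fusing the views, so that the learned proxy graph better fuses the information of each kernel matrix and the clustering performance improves.
Embodiment 2
The late fusion multi-kernel clustering machine learning method based on proxy graph improvement provided by this embodiment differs from Embodiment 1 in the following:
This embodiment tests the clustering performance of the method of this application on six MKL benchmark datasets.
The six MKL benchmark datasets are AR10P, YALE, Protein fold prediction, Oxford Flower17, Nonplant, and Oxford Flower102. See Table 1 for information about the datasets.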
Dataset       Samples  Kernels  Clusters
AR10P             130        6        10
YALE              165        5        15
ProteinFold       694       12        27
Flower17         1360        7        17
Nonplant         2372       69         3
Flower102        8189        4       102
Table 1
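For ProteinFold, this embodiment generates 12 base kernel matrices: the first 10 feature sets use a second-order polynomial kernel, and the last two use a cosine inner-product kernel. Kernel matrices for the other datasets can be downloaded from the Internet.
The compared methods in this experiment are best single-view kernel k-means clustering (BSKM), multi-kernel k-means clustering (MKKM), co-regularized spectral clustering (CRSC), robust multi-kernel clustering (RMKKM), robust multi-view spectral clustering (RMSC), multi-kernel k-means clustering with matrix-induced regularization (MKMR), multi-kernel clustering based on local kernel alignment maximization (MKAM), multi-view clustering via late fusion alignment maximization (MLFA), and subspace clustering based on flexible multi-view representation learning. In all experiments, all base kernels are first centered and regularized. For all datasets, the number of classes is assumed to be known and is used as the number of clusters. The comparison algorithms used in this experiment all have their parameters set according to the corresponding literature. The parameters λ and β of this method are likewise determined by grid search over [2^-2, 2^-1, …, 2^2].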
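The kernel preprocessing and the hyperparameter grid can be sketched as below. Centering is the standard feature-space centering; scaling each kernel to a unit diagonal is an assumed reading of the second preprocessing step, which the text does not spell out. The function names are hypothetical.

```python
import itertools
import numpy as np

def center_kernel(K):
    """Center K in feature space: K_c = C K C with C = I - (1/n) 1 1^T."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

def normalize_kernel(K):
    """Scale K so that every diagonal entry equals 1 (assumed meaning of the second step)."""
    d = np.sqrt(np.maximum(np.diag(K), 1e-12))
    return K / np.outer(d, d)

# Candidate values 2^-2, 2^-1, ..., 2^2 for each of lambda and beta.
grid = [2.0 ** p for p in range(-2, 3)]
param_pairs = list(itertools.product(grid, grid))

# Toy usage on a random linear kernel (illustrative only).
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))
K = X @ X.T
K_centered = center_kernel(K)
K_ready = normalize_kernel(K_centered)
```

Each (λ, β) pair in `param_pairs` would then be passed to the clustering routine and scored with the evaluation metrics.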
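This experiment uses the common clustering accuracy (ACC), normalized mutual information (NMI), and purity to report the clustering performance of each method. All methods are randomly initialized and repeated 50 times, and the best result is reported to reduce the randomness caused by k-means.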
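Two of the three metrics can be computed with the standard library alone, as sketched below. The text does not state which NMI normalization is used, so the geometric-mean normalization here is an assumption. ACC additionally requires an optimal one-to-one matching between cluster labels and class labels (for example with the Hungarian algorithm) and is left out of this sketch.

```python
from collections import Counter
from math import log

def purity(y_true, y_pred):
    """Fraction of samples that fall in the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(y_true, y_pred):
        clusters.setdefault(p, []).append(t)
    hits = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return hits / len(y_true)

def nmi(y_true, y_pred):
    """Mutual information normalized by the geometric mean of the two entropies."""
    n = len(y_true)
    pt, pp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * log(c * n / (pt[t] * pp[p])) for (t, p), c in joint.items())
    ht = -sum(c / n * log(c / n) for c in pt.values())
    hp = -sum(c / n * log(c / n) for c in pp.values())
    return mi / (ht * hp) ** 0.5 if ht > 0 and hp > 0 else 0.0

# Toy usage: the prediction matches the truth up to a relabeling of clusters.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
p_score = purity(y_true, y_pred)
n_score = nmi(y_true, y_pred)
```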
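Figure PCTCN2022095836-appb-000093
Table 2
Table 2 shows the clustering results of the proposed method and the comparison algorithms on the six datasets. From the table it can be observed that: 1. The proposed algorithm outperforms all compared algorithms under all three evaluation criteria. 2. In terms of ACC on the six datasets, the proposed algorithm exceeds the second-best comparison algorithm by 4.92%, 1.21%, 2.16%, 2.12%, 6.85%, and 4.05%, respectively.
This embodiment also gives the change of the objective function at each iteration, as shown in Fig. 3. The value of the objective function decreases monotonically and usually converges within 10 iterations, which greatly reduces the running time of the algorithm.
Fig. 4 shows the parameter sensitivity, using the two datasets AR10P and Flower17 as examples. The figure shows that the proposed algorithm is relatively stable with respect to both hyperparameters and achieves good performance over a wide range.
The experimental results of this embodiment on six multi-kernel datasets show that this application outperforms existing methods.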
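Embodiment 3
This embodiment provides a late fusion multi-kernel clustering machine learning system based on proxy graph improvement, including:
an acquisition module, used to obtain a clustering task and target data samples;
an initialization module, used to initialize the proxy graph improvement matrix;
a construction module, used to run k-means clustering and graph improvement on each view corresponding to the clustering task and target data samples, and to construct an objective function by combining kernel k-means clustering with graph improvement;
a solving module, used to solve the constructed objective function in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
a clustering module, used to perform spectral clustering on the obtained graph matrix to obtain the final clustering result.
Further, the objective function of kernel k-means clustering in the construction module is expressed as:
Figure PCTCN2022095836-appb-000094
where
Figure PCTCN2022095836-appb-000095
is a dataset consisting of n samples; B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, with B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
Figure PCTCN2022095836-appb-000096
denotes the feature map that projects a sample x into a reproducing kernel Hilbert space, where
Figure PCTCN2022095836-appb-000097
is that reproducing kernel Hilbert space;
Figure PCTCN2022095836-appb-000098
n_c denotes the number of samples belonging to the c-th cluster; x_i denotes a data sample; i denotes the sample index; n denotes the number of sample points; k denotes the total number of clusters.
Let <φ(x_i), φ(x_j)> = K_ij, where K_ij denotes an element of the kernel matrix K; formula (1) is then expressed as:
Figure PCTCN2022095836-appb-000099
where K denotes the kernel matrix;
Figure PCTCN2022095836-appb-000100
Figure PCTCN2022095836-appb-000101
denotes the reciprocal of the total number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes a vector whose elements are all 1; B^T denotes the transpose of B.
Let
Figure PCTCN2022095836-appb-000102
and H^T H = I_k; formula (2) is then expressed as:
Figure PCTCN2022095836-appb-000103
where H^T denotes the transpose of H; I_n denotes the n-dimensional identity matrix; I_k denotes the k-dimensional identity matrix.
Further, the objective function constructed in the construction module is expressed as:
Figure PCTCN2022095836-appb-000104
Figure PCTCN2022095836-appb-000105
where H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th base kernel; λ and β denote the hyperparameters that balance the terms;
Figure PCTCN2022095836-appb-000106
denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n-dimensional identity matrix.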
进一步的,所述求解模块中采用循环方式求解构建的目标函数,具体为:
第一固定模块,用于固定S,优化
Figure PCTCN2022095836-appb-000107
表示为:
Figure PCTCN2022095836-appb-000108
令G=K i-λ(I n-2S+SS T),则公式(7)表示为:
Figure PCTCN2022095836-appb-000109
对G进行特征分解,令H i为其前k个最大特征值对应的特征向量,即可得最优解;
第二固定模块固定
Figure PCTCN2022095836-appb-000110
优化S,表示为:
Figure PCTCN2022095836-appb-000111
求解公式(9):
求解出公式(9)无约束的解,表示为:
Figure PCTCN2022095836-appb-000112
利用导数为0,求得闭式解
Figure PCTCN2022095836-appb-000113
其中
Figure PCTCN2022095836-appb-000114
求距离
Figure PCTCN2022095836-appb-000115
最近的符合约束的解:
Figure PCTCN2022095836-appb-000116
其中,
Figure PCTCN2022095836-appb-000117
表示无约束时代理图矩阵的解。
求得闭式解:
Figure PCTCN2022095836-appb-000118
其中,S j,:表示矩阵S的第j列;α j表示用于求解的中间变量;
Figure PCTCN2022095836-appb-000119
表示
Figure PCTCN2022095836-appb-000120
的第j列;
Figure PCTCN2022095836-appb-000121
表示
Figure PCTCN2022095836-appb-000122
的转置。
进一步的,所述采用循环方式求解构建的目标函数,其中循环终止条件为:
Figure PCTCN2022095836-appb-000123
其中,obj (t-1)、obj (t)分别表示第t和t-1次迭代时目标函数的值;ε表示设定精度。
需要说明的是,本实施例提供的基于代理图改善的后期融合多核聚类机器学习系统与实施例一类似,在此不多做赘述。
本实施例提出的系统包括获取基础划分、构建代理图、利用代理图改善基础划分和利用代理图进行谱聚类等模块。通过对基础划分进行优化,使得经过优化后的基础划分不但拥有单个核的信息,还能通过代理图获得全局信息,更有利于视图的融合,从而使得学习到的代理图能够更好地融合各个核矩阵的信息,达到聚类效果提升的目的。
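Note that the above are only the preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the scope of protection of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to those embodiments and may include further equivalent embodiments without departing from its concept; the scope of the present application is determined by the scope of the appended claims.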

Claims (10)

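  1. A late-fusion multiple kernel clustering machine learning method based on proxy graph improvement, characterized by comprising the steps of:
    S1. acquiring a clustering task and target data samples;
    S2. initializing the proxy graph improvement matrix;
    S3. running kernel k-means clustering and graph improvement on each view corresponding to the acquired clustering task and target data samples, and constructing an objective function by combining kernel k-means clustering with graph improvement;
    S4. solving the objective function constructed in step S3 in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
    S5. performing spectral clustering on the obtained graph matrix to obtain the final clustering result.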
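  2. The late-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 1, characterized in that the objective function of kernel k-means clustering in step S3 is expressed as:
    Figure PCTCN2022095836-appb-100001
    wherein
    Figure PCTCN2022095836-appb-100002
    is a data set consisting of n samples; B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, where B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
    Figure PCTCN2022095836-appb-100003
    denotes the feature map that projects a sample x into the reproducing kernel Hilbert space
    Figure PCTCN2022095836-appb-100004
    ;
    Figure PCTCN2022095836-appb-100005
    n_c is the number of samples belonging to the c-th cluster; x_i denotes a data sample; i is the sample index; n is the number of samples; k is the total number of clusters;
    let <φ(x_i), φ(x_j)> = K_ij, where K_ij is an element of the kernel matrix K; formula (1) is then expressed as:
    Figure PCTCN2022095836-appb-100006
    wherein K denotes the kernel matrix;
    Figure PCTCN2022095836-appb-100007
    denotes the reciprocal of the number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes the vector whose elements are all 1; B^T denotes the transpose of B;
    Figure PCTCN2022095836-appb-100008
    and H^T H = I_k; formula (2) is then expressed as:
    Figure PCTCN2022095836-appb-100009
    wherein H^T denotes the transpose of H; I_n denotes the n×n identity matrix; I_k denotes the k×k identity matrix.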
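  3. The late-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 2, characterized in that the objective function constructed in step S3 is expressed as:
    Figure PCTCN2022095836-appb-100010
    Figure PCTCN2022095836-appb-100011
    wherein H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th view (kernel matrix K_i); λ and β are hyperparameters that balance the contribution of each term;
    Figure PCTCN2022095836-appb-100012
    denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n×n identity matrix.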
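  4. The late-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 3, characterized in that solving the objective function constructed in step S3 in a cyclic manner in step S4 specifically comprises:
    S41. fixing S and optimizing
    Figure PCTCN2022095836-appb-100013
    expressed as:
    Figure PCTCN2022095836-appb-100014
    let G = K_i - λ(I_n - 2S + SS^T); formula (7) is then expressed as:
    Figure PCTCN2022095836-appb-100015
    performing an eigendecomposition of G and setting H_i to the eigenvectors corresponding to its k largest eigenvalues, which gives the optimal solution;
    S42. fixing
    Figure PCTCN2022095836-appb-100016
    and optimizing S, expressed as:
    Figure PCTCN2022095836-appb-100017
    solving formula (9) through steps S421 and S422:
    S421. obtaining the unconstrained solution of formula (9), expressed as:
    Figure PCTCN2022095836-appb-100018
    setting the derivative to zero to obtain the closed-form solution
    Figure PCTCN2022095836-appb-100019
    wherein
    Figure PCTCN2022095836-appb-100020
    S422. using formula (11) to find the solution that satisfies the constraints and is closest to
    Figure PCTCN2022095836-appb-100021
    as follows:
    Figure PCTCN2022095836-appb-100022
    wherein
    Figure PCTCN2022095836-appb-100023
    denotes the unconstrained solution for the proxy graph matrix;
    the closed-form solution is obtained as:
    Figure PCTCN2022095836-appb-100024
    wherein S_j,: denotes the j-th column of the matrix S; α_j denotes an intermediate variable used in the solution;
    Figure PCTCN2022095836-appb-100025
    denotes the j-th column of
    Figure PCTCN2022095836-appb-100026
    ;
    Figure PCTCN2022095836-appb-100027
    denotes the transpose of
    Figure PCTCN2022095836-appb-100028
    .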
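  5. The late-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 4, characterized in that, when the objective function constructed in step S3 is solved in a cyclic manner, the termination condition of the loop is:
    Figure PCTCN2022095836-appb-100029
    wherein obj^(t-1) and obj^(t) denote the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε denotes the preset precision.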
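  6. A late-fusion multiple kernel clustering machine learning system based on proxy graph improvement, characterized by comprising:
    an acquisition module, configured to acquire a clustering task and target data samples;
    an initialization module, configured to initialize the proxy graph improvement matrix;
    a construction module, configured to run kernel k-means clustering and graph improvement on each view corresponding to the acquired clustering task and target data samples, and to construct an objective function by combining kernel k-means clustering with graph improvement;
    a solving module, configured to solve the constructed objective function in a cyclic manner to obtain a graph matrix that fuses the base kernel information;
    a clustering module, configured to perform spectral clustering on the obtained graph matrix to obtain the final clustering result.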
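  7. The late-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 6, characterized in that the objective function of kernel k-means clustering in the construction module is expressed as:
    Figure PCTCN2022095836-appb-100030
    wherein
    Figure PCTCN2022095836-appb-100031
    is a data set consisting of n samples; B ∈ {0,1}^{n×k} denotes the cluster indicator matrix, where B_ic = 1 if the i-th sample belongs to the c-th cluster and B_ic = 0 otherwise;
    Figure PCTCN2022095836-appb-100032
    denotes the feature map that projects a sample x into the reproducing kernel Hilbert space
    Figure PCTCN2022095836-appb-100033
    ;
    Figure PCTCN2022095836-appb-100034
    n_c is the number of samples belonging to the c-th cluster; x_i denotes a data sample; i is the sample index; n is the number of samples; k is the total number of clusters;
    let <φ(x_i), φ(x_j)> = K_ij, where K_ij is an element of the kernel matrix K; formula (1) is then expressed as:
    Figure PCTCN2022095836-appb-100035
    wherein K denotes the kernel matrix;
    Figure PCTCN2022095836-appb-100036
    denotes the reciprocal of the number of samples belonging to the k-th cluster; 1_k ∈ R^k denotes the vector whose elements are all 1; B^T denotes the transpose of B;
    Figure PCTCN2022095836-appb-100037
    and H^T H = I_k; formula (2) is then expressed as:
    Figure PCTCN2022095836-appb-100038
    wherein H^T denotes the transpose of H; I_n denotes the n×n identity matrix; I_k denotes the k×k identity matrix.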
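  8. The late-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 7, characterized in that the objective function constructed in the construction module is expressed as:
    Figure PCTCN2022095836-appb-100039
    Figure PCTCN2022095836-appb-100040
    wherein H_i denotes the base partition matrix obtained by running kernel k-means clustering on the i-th view (kernel matrix K_i); λ and β are hyperparameters that balance the contribution of each term;
    Figure PCTCN2022095836-appb-100041
    denotes the transpose of H_i; S denotes the proxy graph matrix; I_n denotes the n×n identity matrix.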
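  9. The late-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 8, characterized in that the solving module solves the constructed objective function in a cyclic manner, and specifically comprises:
    a first fixing module, configured to fix S and optimize
    Figure PCTCN2022095836-appb-100042
    expressed as:
    Figure PCTCN2022095836-appb-100043
    let G = K_i - λ(I_n - 2S + SS^T); formula (7) is then expressed as:
    Figure PCTCN2022095836-appb-100044
    performing an eigendecomposition of G and setting H_i to the eigenvectors corresponding to its k largest eigenvalues, which gives the optimal solution;
    a second fixing module, configured to fix
    Figure PCTCN2022095836-appb-100045
    and optimize S, expressed as:
    Figure PCTCN2022095836-appb-100046
    solving formula (9):
    obtaining the unconstrained solution of formula (9), expressed as:
    Figure PCTCN2022095836-appb-100047
    setting the derivative to zero to obtain the closed-form solution
    Figure PCTCN2022095836-appb-100048
    wherein
    Figure PCTCN2022095836-appb-100049
    finding the solution that satisfies the constraints and is closest to
    Figure PCTCN2022095836-appb-100050
    as follows:
    Figure PCTCN2022095836-appb-100051
    wherein
    Figure PCTCN2022095836-appb-100052
    denotes the unconstrained solution for the proxy graph matrix;
    the closed-form solution is obtained as:
    Figure PCTCN2022095836-appb-100053
    wherein S_j,: denotes the j-th column of the matrix S; α_j denotes an intermediate variable used in the solution;
    Figure PCTCN2022095836-appb-100054
    denotes the j-th column of
    Figure PCTCN2022095836-appb-100055
    ;
    Figure PCTCN2022095836-appb-100056
    denotes the transpose of
    Figure PCTCN2022095836-appb-100057
    .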
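  10. The late-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 9, characterized in that, when the constructed objective function is solved in a cyclic manner, the termination condition of the loop is:
    Figure PCTCN2022095836-appb-100058
    wherein obj^(t-1) and obj^(t) denote the values of the objective function at the (t-1)-th and t-th iterations, respectively; ε denotes the preset precision.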
PCT/CN2022/095836 2021-06-01 2022-05-30 Late-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement WO2022253153A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
ZA2023/11513A ZA202311513B (en) 2021-06-01 2023-12-14 Later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110607669.7A CN113435603A (zh) 2021-06-01 2021-06-01 Late-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement
CN202110607669.7 2021-06-01

Publications (1)

Publication Number Publication Date
WO2022253153A1 (zh) 2022-12-08

Family

ID=77803408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095836 WO2022253153A1 (zh) 2021-06-01 2022-05-30 Late-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement

Country Status (3)

Country Link
CN (1) CN113435603A (zh)
WO (1) WO2022253153A1 (zh)
ZA (1) ZA202311513B (zh)

Also Published As

Publication number Publication date
CN113435603A (zh) 2021-09-24
ZA202311513B (en) 2024-04-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22815199

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18566089

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22815199

Country of ref document: EP

Kind code of ref document: A1