WO2022170840A1 - Late fusion multi-view clustering machine learning method and system based on bipartite graph - Google Patents

Info

Publication number
WO2022170840A1
WO2022170840A1 PCT/CN2021/136557 CN2021136557W WO2022170840A1 WO 2022170840 A1 WO2022170840 A1 WO 2022170840A1 CN 2021136557 W CN2021136557 W CN 2021136557W WO 2022170840 A1 WO2022170840 A1 WO 2022170840A1
Authority
WO
WIPO (PCT)
Prior art keywords
view
clustering
bipartite graph
kernel
fusion multi
Prior art date
Application number
PCT/CN2021/136557
Other languages
French (fr)
Chinese (zh)
Inventor
朱信忠
徐慧英
梁伟轩
赵建民
Original Assignee
浙江师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江师范大学
Publication of WO2022170840A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2323: Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Abstract

Disclosed is a late fusion multi-view clustering machine learning method based on a bipartite graph. The method comprises: S11, acquiring a clustering task and target data samples; S12, running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and computing the view diversity regularization term; S13, selecting representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function; S14, iteratively solving the established objective function to obtain a bipartite graph after view fusion; and S15, performing spectral clustering on the obtained bipartite graph to obtain a clustering result. In the present application, the optimized representative points not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views, thereby improving the clustering effect.

Description

Late fusion multi-view clustering machine learning method and system based on a bipartite graph
Technical field
The present application relates to the technical fields of computer vision and pattern recognition, and in particular to a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
Background
With the development of information acquisition technology, information about the same data sample can easily be obtained from different views. Data carrying information from multiple views is called multi-view data, and multi-view clustering algorithms have been developed to cluster it.
According to when the views are fused, existing multi-view clustering algorithms fall roughly into two categories. (1) Early fusion multi-view clustering. Early fusion means that the representations of the views are fused into a unified representation before clustering, and a clustering algorithm is then run on that representation to obtain the final result. Classic examples are multiple kernel clustering, multi-view spectral clustering and multi-view subspace clustering. (2) Late fusion multi-view clustering. In contrast to early fusion, late fusion first obtains a base partition from each single view and then derives an optimal clustering result from these base partitions. Every ensemble clustering algorithm can be regarded as a late fusion method. Examples include: using the base partitions to construct a co-association matrix for each view, that is, an n×n 0-1 matrix indicating whether each pair of samples is assigned to the same cluster, and learning a unified representation from these matrices by low-rank and sparse matrix decomposition; constructing the co-association matrix of each view and then, given a criterion that measures how hard each sample is to learn, using self-paced learning to cluster the samples from easy to hard; maximizing the inner product between the consensus partition and a linear combination of the base partitions; and using late fusion to handle incomplete multi-view clustering.
Although the above algorithms achieve good results, (1) the vast majority of early fusion multi-view clustering algorithms are very expensive in space and time, so they cannot be applied to large-scale data sets; and (2) existing late fusion multi-view clustering rests on the assumption that the optimal cluster indicator matrix is found by maximizing its inner product with a linear combination of the base cluster indicator matrices, which oversimplifies the search space of the optimal cluster indicator matrix.
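The n×n 0-1 matrix described in the background can be illustrated with a few lines of Python. This is only a sketch: the function name `co_association` and the toy labels are assumptions made for illustration and do not come from the application.

```python
def co_association(labels):
    """Return the n x n 0-1 matrix whose (i, j) entry is 1 when
    samples i and j are assigned to the same cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical base partition of five samples into three clusters.
labels = [0, 0, 1, 2, 1]
C = co_association(labels)
```

Such a matrix needs n² entries per view, which hints at why a graph between n samples and only s representative points is cheaper to store.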
Summary of the invention
The purpose of the present application is to address the defects of the prior art by providing a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
To achieve the above purpose, the present application adopts the following technical solutions:
A late fusion multi-view clustering machine learning method based on a bipartite graph, comprising:
S1. Acquiring a clustering task and target data samples;
S2. Running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and computing the view diversity regularization term;
S3. Selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function;
S4. Iteratively solving the established bipartite-graph-based late fusion multi-view clustering objective function to obtain the bipartite graph after view fusion;
S5. Performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
Further, running kernel k-means clustering in step S2 is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors with respect to the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
Figure PCTCN2021136557-appb-000001
where [Figure PCTCN2021136557-appb-000002] denotes a data set of n samples; [Figure PCTCN2021136557-appb-000003] denotes the feature map that projects a sample x into a reproducing kernel Hilbert space [Figure PCTCN2021136557-appb-000004]; [Figure PCTCN2021136557-appb-000005] denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; and $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, $B_{ic} = 0$ otherwise.
Formula (1) is rewritten as:
Figure PCTCN2021136557-appb-000006
where K denotes the kernel matrix with entries $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$, [Figure PCTCN2021136557-appb-000007], and [Figure PCTCN2021136557-appb-000008] denotes a vector whose elements are all 1.
Let
Figure PCTCN2021136557-appb-000009
and relax the discrete constraint into a real-valued orthogonality constraint, namely $H^{\top}H = I_k$; formula (2) then becomes:
Figure PCTCN2021136557-appb-000010
where $I_k$ denotes the k-dimensional identity matrix.
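The formula images referenced in this passage did not survive extraction. For orientation only, the standard kernel k-means formulation from the multiple kernel clustering literature is consistent with every symbol defined here. It is offered as an assumption about what the images contain, not as a transcription of them. The centroid $\mu_c$, the cluster sizes $n_c$ and the explicit forms of $L$ and $H$ are the usual conventions and are likewise assumed.

```latex
\begin{align}
\min_{B \in \{0,1\}^{n \times k}} \;
  & \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic}\, \lVert \phi(x_i) - \mu_c \rVert_2^2
  \quad \text{s.t.} \; \sum_{c=1}^{k} B_{ic} = 1, \tag{1} \\
\min_{B \in \{0,1\}^{n \times k}} \;
  & \operatorname{Tr}(K) - \operatorname{Tr}\!\left( L^{1/2} B^{\top} K B L^{1/2} \right)
  \quad \text{s.t.} \; B \mathbf{1}_k = \mathbf{1}_n, \tag{2} \\
\min_{H \in \mathbb{R}^{n \times k}} \;
  & \operatorname{Tr}\!\left( K \left( I_n - H H^{\top} \right) \right)
  \quad \text{s.t.} \; H^{\top} H = I_k,
\end{align}
% with the assumed definitions
%   n_c = \sum_{i=1}^{n} B_{ic},
%   \mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic}\, \phi(x_i),
%   L = \operatorname{diag}(n_1^{-1}, \ldots, n_k^{-1}),
%   H = B L^{1/2}.
```

Under these definitions $B^{\top}B = \operatorname{diag}(n_1, \ldots, n_k)$, so $H^{\top}H = L^{1/2} B^{\top} B L^{1/2} = I_k$ holds exactly for any valid partition, which is why the orthogonality constraint is a natural relaxation. The last line is left unnumbered because the text later uses "formula (3)" for the objective solved by the alternating method.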
Further, the bipartite-graph-based late fusion multi-view clustering objective function in step S3 is expressed as:
Figure PCTCN2021136557-appb-000011
s.t. $Z\mathbf{1}_s = \mathbf{1}_n$, $Z \ge 0$, $\gamma^{\top}\mathbf{1}_m = 1$, $\gamma \ge 0$
where [Figure PCTCN2021136557-appb-000012] denotes the base partitions of the views obtained by kernel k-means clustering; [Figure PCTCN2021136557-appb-000013] denotes the representative points of the views; [Figure PCTCN2021136557-appb-000014] is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with entries [Figure PCTCN2021136557-appb-000015]; and m denotes the number of views.
Further, iteratively solving the established bipartite-graph-based late fusion multi-view clustering objective function in step S4 is specifically:
Formula (3) is solved by a three-step alternating method, specifically:
A1. Fix γ and [Figure PCTCN2021136557-appb-000016], and optimize Z.
Let the i-th row of Z be $z_i$; the subproblem is expressed as:
Figure PCTCN2021136557-appb-000017
where [Figure PCTCN2021136557-appb-000018] is the i-th row of the matrix [Figure PCTCN2021136557-appb-000019].
A2. Fix γ and Z, and optimize [Figure PCTCN2021136557-appb-000020]. Setting the partial derivative of the objective function with respect to $A_p$ to zero gives the closed-form solution [Figure PCTCN2021136557-appb-000021].
A3. Fix [Figure PCTCN2021136557-appb-000022] and Z, and optimize γ. The objective function is converted into a quadratic programming problem with linear constraints, expressed as:
Figure PCTCN2021136557-appb-000023
where [Figure PCTCN2021136557-appb-000024].
Further, in step S4 formula (3) is solved by the three-step alternating method, whose termination condition is expressed as:
$(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
Further, a late fusion multi-view clustering machine learning system based on a bipartite graph is also provided, comprising:
an acquisition module, configured to acquire a clustering task and target data samples;
a running module, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and to compute the view diversity regularization term;
an establishing module, configured to select the representative points of each view by random initialization and to establish a bipartite-graph-based late fusion multi-view clustering objective function;
a solving module, configured to iteratively solve the established bipartite-graph-based late fusion multi-view clustering objective function to obtain the bipartite graph after view fusion;
a clustering module, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
Further, running kernel k-means clustering in the running module is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors with respect to the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
Figure PCTCN2021136557-appb-000025
where [Figure PCTCN2021136557-appb-000026] denotes a data set of n samples; [Figure PCTCN2021136557-appb-000027] denotes the feature map that projects a sample x into a reproducing kernel Hilbert space [Figure PCTCN2021136557-appb-000028]; [Figure PCTCN2021136557-appb-000029] denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; and $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, $B_{ic} = 0$ otherwise.
Formula (1) is rewritten as:
Figure PCTCN2021136557-appb-000030
where K denotes the kernel matrix with entries $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$, [Figure PCTCN2021136557-appb-000031], and [Figure PCTCN2021136557-appb-000032] denotes a vector whose elements are all 1.
Let
Figure PCTCN2021136557-appb-000033
and relax the discrete constraint into a real-valued orthogonality constraint, namely $H^{\top}H = I_k$; formula (2) then becomes:
Figure PCTCN2021136557-appb-000034
where $I_k$ denotes the k-dimensional identity matrix.
Further, the bipartite-graph-based late fusion multi-view clustering objective function in the establishing module is expressed as:
Figure PCTCN2021136557-appb-000035
s.t. $Z\mathbf{1}_s = \mathbf{1}_n$, $Z \ge 0$, $\gamma^{\top}\mathbf{1}_m = 1$, $\gamma \ge 0$
where [Figure PCTCN2021136557-appb-000036] denotes the base partitions of the views obtained by kernel k-means clustering; [Figure PCTCN2021136557-appb-000037] denotes the representative points of the views; [Figure PCTCN2021136557-appb-000038] is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with entries [Figure PCTCN2021136557-appb-000039]; and m denotes the number of views.
Further, iteratively solving the established bipartite-graph-based late fusion multi-view clustering objective function in the solving module is specifically:
Formula (3) is solved by a three-step alternating method, the solving module specifically comprising:
a first fixing module, configured to fix γ and [Figure PCTCN2021136557-appb-000040], and optimize Z.
Let the i-th row of Z be $z_i$; the subproblem is expressed as:
Figure PCTCN2021136557-appb-000041
where [Figure PCTCN2021136557-appb-000042] is the i-th row of the matrix [Figure PCTCN2021136557-appb-000043];
a second fixing module, configured to fix γ and Z, and optimize [Figure PCTCN2021136557-appb-000044]. Setting the partial derivative of the objective function with respect to $A_p$ to zero gives the closed-form solution [Figure PCTCN2021136557-appb-000045];
a third fixing module, configured to fix [Figure PCTCN2021136557-appb-000046] and Z, and optimize γ. The objective function is converted into a quadratic programming problem with linear constraints, expressed as:
Figure PCTCN2021136557-appb-000047
where [Figure PCTCN2021136557-appb-000048].
Further, in the solving module formula (3) is solved by the three-step alternating method, whose termination condition is expressed as:
$(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
Compared with the prior art, the present application proposes a novel late fusion multi-view clustering machine learning method based on a bipartite graph. The method comprises modules for obtaining the base clustering partitions and computing the view diversity regularization term, for optimizing the objective function to obtain the bipartite graph, and for clustering with the bipartite graph. By optimizing the representative points, the present application lets them not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering effect improves. Experimental results on six public data sets show that the present application outperforms existing methods.
Description of drawings
FIG. 1 is a flowchart of the late fusion multi-view clustering machine learning method based on a bipartite graph provided in Embodiment 1;
FIG. 2 is a schematic diagram of the sensitivity to the parameter λ provided in Embodiment 2;
FIG. 3 is a schematic diagram of the influence of different numbers of representative points s on the clustering effect provided in Embodiment 2;
FIG. 4 is a schematic diagram of how the clustering performance and the objective function value change as the number of iterations increases, provided in Embodiment 2;
FIG. 5 is a structural diagram of the late fusion multi-view clustering machine learning system based on a bipartite graph provided in Embodiment 3.
Detailed description
The embodiments of the present application are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present application from the contents disclosed in this specification. The present application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the present application. It should be noted that, where no conflict arises, the following embodiments and the features in them may be combined with one another.
To address the existing defects, the present application provides a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
Embodiment 1
The late fusion multi-view clustering machine learning method based on a bipartite graph provided in this embodiment, as shown in FIG. 1, comprises:
S11. Acquiring a clustering task and target data samples;
S12. Running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and computing the view diversity regularization term;
S13. Selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function;
S14. Iteratively solving the established bipartite-graph-based late fusion multi-view clustering objective function to obtain the bipartite graph after view fusion;
S15. Performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
This embodiment proposes a new method that clusters by learning multi-view information through late fusion, using representative points to represent each view. Compared with anchor points, which are not updated during optimization, representative points serve multi-view clustering better. In addition, using a bipartite graph for graph learning within a late fusion algorithm reduces the computational and storage complexity.
In step S12, kernel k-means clustering is run on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and the view diversity regularization term is computed. Specifically:
The goal of kernel k-means clustering is to minimize the sum of squared errors with respect to the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
Figure PCTCN2021136557-appb-000049
where [Figure PCTCN2021136557-appb-000050] denotes a data set of n samples; [Figure PCTCN2021136557-appb-000051] denotes the feature map that projects a sample x into a reproducing kernel Hilbert space [Figure PCTCN2021136557-appb-000052]; [Figure PCTCN2021136557-appb-000053] denotes the number of samples belonging to the c-th cluster, $1 \le c \le k$; i denotes the sample index; and $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, $B_{ic} = 0$ otherwise.
Formula (1) can be rewritten as:
Figure PCTCN2021136557-appb-000054
where K denotes the kernel matrix with entries $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$, [Figure PCTCN2021136557-appb-000055], and [Figure PCTCN2021136557-appb-000056] denotes a vector whose elements are all 1; the superscript T denotes, by convention, the matrix transpose, and KBL denotes the matrix product of K, B and L.
Because the variable B in the above formula is discrete, it is difficult to optimize. Let
Figure PCTCN2021136557-appb-000057
and relax the discrete constraint into a real-valued orthogonality constraint, namely $H^{\top}H = I_k$; formula (2) then becomes:
Figure PCTCN2021136557-appb-000058
where $I_k$ denotes the k-dimensional identity matrix.
Its closed-form solution consists of the eigenvectors of the kernel matrix K corresponding to its k largest eigenvalues, which can be obtained by eigendecomposition of K.
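The closed-form solution described above can be sketched with NumPy. The function name `relaxed_kernel_kmeans`, the linear kernel and the random toy data are assumptions for illustration; the application does not prescribe an implementation.

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Return H (n x k): eigenvectors of the symmetric kernel matrix K
    that correspond to its k largest eigenvalues."""
    K = (K + K.T) / 2.0                    # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues come back in ascending order
    return eigvecs[:, -k:][:, ::-1]        # keep the top k, largest first

# Hypothetical toy view: a linear kernel on six random 3-d samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
K = X @ X.T
H = relaxed_kernel_kmeans(K, k=2)
```

The returned H satisfies the orthogonality constraint $H^{\top}H = I_k$ and plays the role of one view's base partition in the later steps.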
In step S13, the representative points of each view are selected by random initialization, and a bipartite-graph-based late fusion multi-view clustering objective function is established.
The bipartite-graph-based late fusion multi-view clustering objective function is expressed as:
Figure PCTCN2021136557-appb-000059
s.t. $Z\mathbf{1}_s = \mathbf{1}_n$, $Z \ge 0$, $\gamma^{\top}\mathbf{1}_m = 1$, $\gamma \ge 0$
where [Figure PCTCN2021136557-appb-000060] denotes the base partitions of the views obtained by kernel k-means clustering; [Figure PCTCN2021136557-appb-000061] denotes the representative points of the views; [Figure PCTCN2021136557-appb-000062] is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with entries [Figure PCTCN2021136557-appb-000063]; and m denotes the number of views.
In step S14, the established bipartite-graph-based late fusion multi-view clustering objective function is solved iteratively to obtain the bipartite graph after view fusion. Specifically:
Formula (3) is solved by a three-step alternating method, specifically:
A1. Fix γ and [Figure PCTCN2021136557-appb-000064], and optimize Z.
Let the i-th row of Z be $z_i$. Z can be optimized row by row, each row being an optimization problem over the simplex, expressed as:
Figure PCTCN2021136557-appb-000065
where [Figure PCTCN2021136557-appb-000066] is the i-th row of the matrix [Figure PCTCN2021136557-appb-000067].
A2. Fix γ and Z, and optimize [Figure PCTCN2021136557-appb-000068]. Setting the partial derivative of the objective function with respect to $A_p$ to zero gives the closed-form solution [Figure PCTCN2021136557-appb-000069].
A3. Fix [Figure PCTCN2021136557-appb-000070] and Z, and optimize γ. The objective function is converted into a quadratic programming problem with linear constraints, expressed as:
Figure PCTCN2021136557-appb-000071
where [Figure PCTCN2021136557-appb-000072].
The termination condition of the above three-step alternating method is expressed as:
$(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
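Step A1 constrains each row of Z to the simplex ($z_i \ge 0$, entries summing to 1), and γ lies on the same kind of set. The exact row subproblem is in a formula image that was not recovered. If it takes the form of a least-squares fit of $z_i$ to some target vector (an assumption, not confirmed by the text), the standard sort-based Euclidean projection onto the simplex would solve it. The function name and the sample vector below are illustrative only.

```python
def project_to_simplex(v):
    """Euclidean projection of the list v onto {z : z >= 0, sum(z) = 1},
    using the standard sort-and-threshold algorithm."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:          # holds for j = 1..rho, so the last hit gives theta
            theta = t
    return [max(x - theta, 0.0) for x in v]

# Hypothetical target row; it does not come from the application.
z = project_to_simplex([0.5, 0.2, 1.3])
```

The result is non-negative and sums to one, so it satisfies the row constraints $Z\mathbf{1}_s = \mathbf{1}_n$ and $Z \ge 0$ by construction.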
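The alternating scheme and its stopping rule can be outlined as a generic loop. This is a minimal sketch: the three update callables and the objective are hypothetical placeholders standing in for steps A1 to A3 and formula (3), whose exact forms are in formula images not reproduced here. The toy problem at the end only exercises the stopping rule.

```python
def alternate_until_converged(update_Z, update_A, update_gamma,
                              objective, state, eps=1e-3, max_iter=100):
    """Cycle through the three updates until the relative decrease
    (obj_prev - obj_curr) / obj_curr drops to eps or below.
    Assumes the objective value is never zero."""
    history = [objective(state)]
    for _ in range(max_iter):
        state = update_Z(state)        # stands in for A1: gamma and representatives fixed
        state = update_A(state)        # stands in for A2: gamma and Z fixed
        state = update_gamma(state)    # stands in for A3: representatives and Z fixed
        history.append(objective(state))
        prev, curr = history[-2], history[-1]
        if (prev - curr) / curr <= eps:
            break
    return state, history

# Toy stand-in: minimize x**2 + 1 by repeatedly halving x.
final, history = alternate_until_converged(
    update_Z=lambda x: 0.5 * x,
    update_A=lambda x: x,
    update_gamma=lambda x: x,
    objective=lambda x: x * x + 1.0,
    state=4.0,
)
```

Keeping the objective history also makes it straightforward to plot the objective value against the iteration count.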
In step S15, spectral clustering is performed on the obtained bipartite graph to obtain the clustering result.
The spectral clustering of the bipartite graph Z proceeds as follows:
Let [Figure PCTCN2021136557-appb-000073], where $\Lambda = \operatorname{diag}(Z^{\top}\mathbf{1}_n)$. Perform eigenvalue decomposition on [Figure PCTCN2021136557-appb-000074], and let $\Sigma_k$ and $V_k$ denote the diagonal matrix of its k largest eigenvalues and the matrix of the corresponding eigenvectors, respectively. Let [Figure PCTCN2021136557-appb-000075]; running standard k-means clustering on the rows of F yields the final clustering result.
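The three formula images in this step were not recovered. A common construction that matches the surrounding description is $\hat{Z} = Z\Lambda^{-1/2}$, an eigendecomposition of the s×s matrix $\hat{Z}^{\top}\hat{Z}$, and $F = \hat{Z} V_k \Sigma_k^{-1/2}$. The NumPy sketch below assumes exactly that construction; it is not a transcription of the application's formulas, and the function name and toy graph are illustrative.

```python
import numpy as np

def bipartite_spectral_embedding(Z, k):
    """Embed n samples from an n x s bipartite graph Z, solving only
    an s x s eigenproblem (assumed normalization: Z Lambda^{-1/2})."""
    col_degree = Z.sum(axis=0)                    # diagonal of Lambda = diag(Z^T 1_n)
    Z_hat = Z / np.sqrt(col_degree)               # scales column j by 1 / sqrt(degree_j)
    eigvals, eigvecs = np.linalg.eigh(Z_hat.T @ Z_hat)
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest eigenvalues
    sigma_k, V_k = eigvals[top], eigvecs[:, top]
    return Z_hat @ V_k / np.sqrt(sigma_k)         # F = Z_hat V_k Sigma_k^{-1/2}

# Hypothetical graph: 8 samples, 4 representative points, rows summing to 1.
rng = np.random.default_rng(0)
Z = rng.random((8, 4))
Z /= Z.sum(axis=1, keepdims=True)
F = bipartite_spectral_embedding(Z, k=2)
```

The rows of F would then be passed to any standard k-means routine. Working with the s×s matrix instead of an n×n affinity matrix is where the savings in computation and storage come from when s is much smaller than n.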
Compared with the prior art, this embodiment proposes a novel late fusion multi-view clustering machine learning method based on a bipartite graph. The method comprises modules for obtaining the base clustering partitions and computing the view diversity regularization term, for optimizing the objective function to obtain the bipartite graph, and for clustering with the bipartite graph. By optimizing the representative points, this embodiment lets them not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering effect improves.
Embodiment 2
The bipartite-graph-based late fusion multi-view clustering machine learning method provided in this embodiment differs from Embodiment 1 as follows:
This embodiment tests the clustering performance of the method of the present application on six standard MKL datasets: Oxford Flower17, Oxford Flower102, Protein fold prediction, UCI-Digital, Columbia Consumer Video (CCV), and Caltech102. Details of the datasets are given in Table 1.
Table 1
Dataset      Samples  Kernels  Clusters
Flower17     1360     7        17
Flower102    8189     4        102
ProteinFold  694      12       27
Digit        2000     3        10
CCV          6773     3        20
Caltech102   1530     25       102
For ProteinFold, this embodiment generates 12 base kernel matrices: the first 10 feature sets use a second-order polynomial kernel, and the last two use a cosine inner-product kernel. For CCV, three base kernels are generated by applying a Gaussian kernel to the SIFT, STIP, and MFCC features, with the width of each Gaussian kernel set to the mean of the pairwise sample distances. The kernel matrices of the other datasets can be downloaded from the Internet.
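The construction of a Gaussian base kernel whose width equals the mean pairwise distance can be sketched as follows. The text does not state the exact parameterization, so the form $\exp(-d^2/(2\sigma^2))$ and the choice to average over distinct pairs only are assumptions; the function name is hypothetical.

```python
import numpy as np


def gaussian_kernel_mean_width(X):
    """Build an n x n Gaussian kernel from an (n, d) feature matrix X (n >= 2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    dists = np.sqrt(sq_dists)
    n = X.shape[0]
    # ASSUMPTION: the mean is taken over the n(n-1)/2 distinct sample pairs.
    sigma = dists[np.triu_indices(n, k=1)].mean()
    # ASSUMPTION: kernel parameterized as exp(-d^2 / (2 sigma^2)).
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```

Under this reading, one such kernel would be built per feature type (SIFT, STIP, MFCC), giving the three base kernels of the CCV dataset.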
The comparison algorithms in this experiment are average multiple kernel k-means (A-MKKM), single best kernel k-means (SB-MKKM), multiple kernel k-means (MKKM), robust multiple kernel k-means (RMKKM), multiple kernel k-means with matrix-induced regularization (MKKM-MR), optimal neighborhood kernel clustering (ONKC), and multi-view clustering via late fusion alignment maximization (MVC-LFA). In all experiments, all base kernels are first centered and regularized. For all datasets, the number of classes is assumed to be known and is used as the number of clusters. The parameters of RMKKM, MKKM-MR, ONKC, and MVC-LFA are selected by grid search. The regularization parameter of the method of this embodiment is also selected by grid search over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, and the number of representative points is set to s = 8k, where k is the number of clusters.
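For illustration, the search grid and the default number of representative points can be generated as below. The exponent step of 3 is read off the first two grid entries; the function names are hypothetical.

```python
def lambda_grid(lo_exp=-15, hi_exp=15, step=3):
    """Candidate regularization values 2^-15, 2^-12, ..., 2^15."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1, step)]


def num_representative_points(k, multiplier=8):
    """Number of representative points s = 8k for k clusters."""
    return multiplier * k
```

For Flower17 (k = 17), this gives s = 136 representative points and 11 candidate values of the regularization parameter.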
This experiment uses the common metrics clustering accuracy (ACC), normalized mutual information (NMI), and purity to measure the clustering performance of each method. To reduce the randomness caused by k-means, every method is run 50 times with random initialization and the best result is reported.
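Of the three metrics, purity is the simplest to state in code: each cluster is credited with its most frequent ground-truth class, and the credited counts are divided by the number of samples. The following is a minimal sketch with a hypothetical function name; ACC additionally requires an optimal one-to-one matching between clusters and classes and is not shown.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    members = {}
    for y, c in zip(true_labels, cluster_labels):
        members.setdefault(c, []).append(y)
    majority_total = sum(
        Counter(ys).most_common(1)[0][1] for ys in members.values()
    )
    return majority_total / len(true_labels)
```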
Table 2
Figure PCTCN2021136557-appb-000076
Figure PCTCN2021136557-appb-000077
Table 2 shows the clustering performance of the proposed method and the comparison algorithms on all datasets. The following can be observed from the table: 1. The proposed algorithm outperforms all comparison algorithms under all three evaluation metrics. 2. ONKC is an important baseline among multiple kernel algorithms, and the proposed algorithm exceeds its ACC on the six datasets by 7.14%, 10.22%, 3.17%, 3.45%, 6.07%, and 10.2%, respectively. 3. MVC-LFA is a late fusion algorithm that usually performs better than most other multi-view algorithms, and the proposed algorithm exceeds it on average by 7.58%, 7.07%, and 7.34% under the three clustering metrics, respectively.
In addition, we compare against anchor points that are not updated during optimization: the anchor points are selected by k-means clustering or by random sampling, substituted into the objective, and kept fixed while the algorithm runs. To avoid the influence of randomness, we repeat this experiment 50 times and average all results. The results are shown in Table 3.
Table 3
Figure PCTCN2021136557-appb-000078
Figure PCTCN2021136557-appb-000079
Table 3 shows that representative points selected by k-means or selected at random perform much worse than the representative points learned by the proposed method. Updating the representative points during the optimization is therefore effective.
This embodiment introduces the regularization parameter λ to balance bipartite graph learning against the diversity regularization term. Figure 2 plots the NMI as λ varies over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, with the best-performing comparison algorithm on each dataset as the reference. The figure shows that: 1) the best NMI is always obtained when the two terms are properly balanced; 2) on most datasets, the proposed algorithm outperforms the best comparison algorithm regardless of the value of λ.
Another important parameter in this embodiment is the number of representative points s. We select s from [2k, 4k, ..., 14k], where k is the number of clusters, and run the experiments; the results are shown in Figure 3. The clustering performance generally improves as s increases. A larger s, however, also increases the computational cost; to balance clustering performance against complexity, s = 8k can be chosen empirically.
This embodiment also reports how the objective function value and the clustering performance change at each iteration, as shown in Figure 4. The objective function value decreases monotonically and usually converges within 25 iterations. As the objective decreases, the clustering performance fluctuates but rises overall, which shows that the algorithm keeps improving the clustering performance during training.
Embodiment 3
This embodiment provides a bipartite-graph-based late fusion multi-view clustering machine learning system, as shown in Figure 5, comprising:
an acquisition module 11, configured to acquire a clustering task and target data samples;
a running module 12, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and to compute the view diversity regularization term;
an establishing module 13, configured to select the representative points of each view by random initialization and to establish the bipartite-graph-based late fusion multi-view clustering objective function;
a solving module 14, configured to solve the established bipartite-graph-based late fusion multi-view clustering objective function in a loop to obtain the bipartite graph after view fusion;
a clustering module 15, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
Further, the kernel k-means clustering run in the running module is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
Figure PCTCN2021136557-appb-000080
where
Figure PCTCN2021136557-appb-000081
denotes a dataset consisting of n samples;
Figure PCTCN2021136557-appb-000082
denotes the feature map that projects a sample x into a reproducing kernel Hilbert space
Figure PCTCN2021136557-appb-000083
Figure PCTCN2021136557-appb-000084
denotes the number of samples belonging to the c-th cluster, 1 ≤ c ≤ k; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise. Formula (1) can be rewritten as:
Figure PCTCN2021136557-appb-000085
where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$,
Figure PCTCN2021136557-appb-000086
Figure PCTCN2021136557-appb-000087
denotes the vector whose elements are all 1.
Let
Figure PCTCN2021136557-appb-000088
and relax the discrete constraint into a real-valued orthogonality constraint, i.e., $H^{\top}H = I_k$. Formula (2) then becomes:
Figure PCTCN2021136557-appb-000089
where $I_k$ denotes the k-dimensional identity matrix.
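Once the discrete constraint is relaxed to $H^{\top}H = I_k$, a base partition can be computed from an eigendecomposition. The relaxed formula itself is not reproduced in this text, so the sketch below assumes the usual form of relaxed kernel k-means, namely maximizing $\mathrm{Tr}(H^{\top}KH)$ subject to $H^{\top}H = I_k$, whose solution is given by the k leading eigenvectors of K. The function name is hypothetical.

```python
import numpy as np


def relaxed_kernel_kmeans(K, k):
    """Return an n x k matrix H with orthonormal columns for kernel matrix K."""
    # Symmetrize to guard against round-off before calling eigh.
    evals, evecs = np.linalg.eigh((K + K.T) / 2.0)
    # ASSUMPTION: the relaxed problem maximizes Tr(H^T K H), so the
    # eigenvectors of the k largest eigenvalues are selected.
    top = np.argsort(evals)[::-1][:k]
    return evecs[:, top]
```

Running this once per view kernel would give one base partition per view, which is the input that the late fusion stage expects.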
Further, the bipartite-graph-based late fusion multi-view clustering objective function in the establishing module is expressed as:
Figure PCTCN2021136557-appb-000090
s.t. $Z\mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top}\mathbf{1}_m = 1,\ \gamma \ge 0$
where
Figure PCTCN2021136557-appb-000091
denotes the base partitions of the views obtained by kernel k-means clustering;
Figure PCTCN2021136557-appb-000092
denotes the representative points of the views;
Figure PCTCN2021136557-appb-000093
is the bipartite graph after view fusion; n, k, and s denote the numbers of samples, clusters, and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with elements
Figure PCTCN2021136557-appb-000094
and m denotes the number of views.
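The constraints state that every row of Z and the vector γ lie on a probability simplex. A small feasibility check makes this concrete; the function name and the tolerance are hypothetical, and Z is passed as a list of rows.

```python
def is_feasible(Z, gamma, tol=1e-8):
    """Check Z 1_s = 1_n, Z >= 0, gamma^T 1_m = 1, gamma >= 0 up to tol."""
    rows_ok = all(
        abs(sum(row) - 1.0) <= tol and min(row) >= -tol for row in Z
    )
    gamma_ok = abs(sum(gamma) - 1.0) <= tol and min(gamma) >= -tol
    return rows_ok and gamma_ok
```

A check of this kind is useful as an assertion after each alternating update, since every step of the solver is expected to keep both variables feasible.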
Further, solving the established bipartite-graph-based late fusion multi-view clustering objective function in a loop in the solving module is specifically as follows:
Formula (3) is solved with a three-step alternating method, specifically using:
a first fixing module, configured to fix γ and
Figure PCTCN2021136557-appb-000095
and optimize Z;
letting $z_i$ denote the i-th row of Z, the subproblem is expressed as:
Figure PCTCN2021136557-appb-000096
where
Figure PCTCN2021136557-appb-000097
is the i-th row of the matrix
Figure PCTCN2021136557-appb-000098
a second fixing module, configured to fix γ and Z and optimize
Figure PCTCN2021136557-appb-000099
by setting the partial derivative of the objective function with respect to $A_p$ to zero, which yields the closed-form solution
Figure PCTCN2021136557-appb-000100
a third fixing module, configured to fix
Figure PCTCN2021136557-appb-000101
and Z and optimize γ, which turns the objective function into a quadratic programming problem with linear constraints, expressed as:
Figure PCTCN2021136557-appb-000102
where
Figure PCTCN2021136557-appb-000103
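Two of the three updates are constrained to a probability simplex, which the following Python sketch illustrates. The exact subproblems are given only in figures that are not reproduced here, so everything below is an assumption: the row update of Z is taken to reduce to a Euclidean projection onto the simplex, and the γ update is taken to be a convex quadratic program of the generic form $\min_{\gamma} \tfrac{1}{2}\gamma^{\top}Q\gamma + f^{\top}\gamma$ over the simplex, solved here by projected gradient descent instead of a dedicated QP solver. The names `project_to_simplex`, `solve_simplex_qp`, `Q`, and `f` are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds for j = 1..rho; keep the last such threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def solve_simplex_qp(Q, f, steps=500):
    """Minimize 0.5 * g^T Q g + f^T g over the simplex (Q symmetric PSD)."""
    m = len(f)
    gamma = [1.0 / m] * m
    # Step size 1/L, with L bounded by the largest absolute row sum of Q.
    lipschitz = max(sum(abs(q) for q in row) for row in Q) or 1.0
    for _ in range(steps):
        grad = [
            sum(Q[i][j] * gamma[j] for j in range(m)) + f[i] for i in range(m)
        ]
        gamma = project_to_simplex(
            [g - d / lipschitz for g, d in zip(gamma, grad)]
        )
    return gamma
```

The representative-point update is not sketched because its closed-form solution depends on the formula in the figure. In practice the γ step could equally be handed to any off-the-shelf quadratic programming solver; projected gradient is used here only to keep the example free of dependencies.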
Further, formula (3) is solved in the solving module with the three-step alternating method, whose termination condition is:
$(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
It should be noted that the bipartite-graph-based late fusion multi-view clustering machine learning system provided in this embodiment is similar to Embodiment 1, and the details are not repeated here.
Compared with the prior art, this embodiment comprises modules for obtaining the base clustering partitions and computing the view diversity regularization term, for optimizing the objective function to obtain the bipartite graph, and for clustering with the bipartite graph. Because the representative points are optimized, they not only represent the information of a single view but also serve view fusion better, so the learned bipartite graph fuses the information of all views more effectively and the clustering performance improves.
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the present application is determined by the scope of the appended claims.

Claims (10)

  1. A bipartite-graph-based late fusion multi-view clustering machine learning method, characterized by comprising:
    S1. acquiring a clustering task and target data samples;
    S2. running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and computing the view diversity regularization term;
    S3. selecting the representative points of each view by random initialization, and establishing the bipartite-graph-based late fusion multi-view clustering objective function;
    S4. solving the established bipartite-graph-based late fusion multi-view clustering objective function in a loop to obtain the bipartite graph after view fusion;
    S5. performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
  2. The bipartite-graph-based late fusion multi-view clustering machine learning method according to claim 1, characterized in that the kernel k-means clustering run in step S2 is specifically:
    the goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
    Figure PCTCN2021136557-appb-100001
    where
    Figure PCTCN2021136557-appb-100002
    denotes a dataset consisting of n samples;
    Figure PCTCN2021136557-appb-100003
    denotes the feature map that projects a sample x into a reproducing kernel Hilbert space
    Figure PCTCN2021136557-appb-100004
    Figure PCTCN2021136557-appb-100005
    denotes the number of samples belonging to the c-th cluster, 1 ≤ c ≤ k; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise;
    formula (1) is rewritten as:
    Figure PCTCN2021136557-appb-100006
    where K denotes the kernel matrix, whose elements are
    Figure PCTCN2021136557-appb-100007
    Figure PCTCN2021136557-appb-100008
    denotes the vector whose elements are all 1;
    let
    Figure PCTCN2021136557-appb-100009
    and relax the discrete constraint into a real-valued orthogonality constraint, i.e., $H^{\top}H = I_k$; formula (2) then becomes:
    Figure PCTCN2021136557-appb-100010
    where $I_k$ denotes the k-dimensional identity matrix.
  3. The bipartite-graph-based late fusion multi-view clustering machine learning method according to claim 2, characterized in that the bipartite-graph-based late fusion multi-view clustering objective function in step S3 is expressed as:
    Figure PCTCN2021136557-appb-100011
    s.t. $Z\mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top}\mathbf{1}_m = 1,\ \gamma \ge 0$
    where
    Figure PCTCN2021136557-appb-100012
    denotes the base partitions of the views obtained by kernel k-means clustering;
    Figure PCTCN2021136557-appb-100013
    denotes the representative points of the views;
    Figure PCTCN2021136557-appb-100014
    is the bipartite graph after view fusion; n, k, and s denote the numbers of samples, clusters, and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with elements
    Figure PCTCN2021136557-appb-100015
    and m denotes the number of views.
  4. The bipartite-graph-based late fusion multi-view clustering machine learning method according to claim 3, characterized in that solving the established bipartite-graph-based late fusion multi-view clustering objective function in a loop in step S4 is specifically:
    solving formula (3) with a three-step alternating method, specifically:
    A1. fixing γ and
    Figure PCTCN2021136557-appb-100016
    and optimizing Z;
    letting $z_i$ denote the i-th row of Z, the subproblem is expressed as:
    Figure PCTCN2021136557-appb-100017
    where
    Figure PCTCN2021136557-appb-100018
    is the i-th row of the matrix
    Figure PCTCN2021136557-appb-100019
    A2. fixing γ and Z, and optimizing
    Figure PCTCN2021136557-appb-100020
    by setting the partial derivative of the objective function with respect to $A_p$ to zero, which yields the closed-form solution
    Figure PCTCN2021136557-appb-100021
    A3. fixing
    Figure PCTCN2021136557-appb-100022
    and Z, and optimizing γ, which turns the objective function into a quadratic programming problem with linear constraints, expressed as:
    Figure PCTCN2021136557-appb-100023
    where
    Figure PCTCN2021136557-appb-100024
  5. The bipartite-graph-based late fusion multi-view clustering machine learning method according to claim 4, characterized in that formula (3) is solved in step S4 with the three-step alternating method, whose termination condition is:
    $(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
    where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
  6. A bipartite-graph-based late fusion multi-view clustering machine learning system, characterized by comprising:
    an acquisition module, configured to acquire a clustering task and target data samples;
    a running module, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain base partitions, and to compute the view diversity regularization term;
    an establishing module, configured to select the representative points of each view by random initialization and to establish the bipartite-graph-based late fusion multi-view clustering objective function;
    a solving module, configured to solve the established bipartite-graph-based late fusion multi-view clustering objective function in a loop to obtain the bipartite graph after view fusion;
    a clustering module, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
  7. The bipartite-graph-based late fusion multi-view clustering machine learning system according to claim 6, characterized in that the kernel k-means clustering run in the running module is specifically:
    the goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
    Figure PCTCN2021136557-appb-100025
    where
    Figure PCTCN2021136557-appb-100026
    denotes a dataset consisting of n samples;
    Figure PCTCN2021136557-appb-100027
    denotes the feature map that projects a sample x into a reproducing kernel Hilbert space
    Figure PCTCN2021136557-appb-100028
    Figure PCTCN2021136557-appb-100029
    denotes the number of samples belonging to the c-th cluster, 1 ≤ c ≤ k; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise;
    formula (1) is rewritten as:
    Figure PCTCN2021136557-appb-100030
    where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$,
    Figure PCTCN2021136557-appb-100031
    Figure PCTCN2021136557-appb-100032
    denotes the vector whose elements are all 1;
    let
    Figure PCTCN2021136557-appb-100033
    and relax the discrete constraint into a real-valued orthogonality constraint, i.e., $H^{\top}H = I_k$; formula (2) then becomes:
    Figure PCTCN2021136557-appb-100034
    where $I_k$ denotes the k-dimensional identity matrix.
  8. The bipartite-graph-based late fusion multi-view clustering machine learning system according to claim 7, characterized in that the bipartite-graph-based late fusion multi-view clustering objective function in the establishing module is expressed as:
    Figure PCTCN2021136557-appb-100035
    s.t. $Z\mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top}\mathbf{1}_m = 1,\ \gamma \ge 0$
    where
    Figure PCTCN2021136557-appb-100036
    denotes the base partitions of the views obtained by kernel k-means clustering;
    Figure PCTCN2021136557-appb-100037
    denotes the representative points of the views;
    Figure PCTCN2021136557-appb-100038
    is the bipartite graph after view fusion; n, k, and s denote the numbers of samples, clusters, and representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, with elements
    Figure PCTCN2021136557-appb-100039
    and m denotes the number of views.
  9. The bipartite-graph-based late fusion multi-view clustering machine learning system according to claim 8, characterized in that solving the established bipartite-graph-based late fusion multi-view clustering objective function in a loop in the solving module is specifically:
    solving formula (3) with a three-step alternating method, specifically using:
    a first fixing module, configured to fix γ and
    Figure PCTCN2021136557-appb-100040
    and optimize Z;
    letting $z_i$ denote the i-th row of Z, the subproblem is expressed as:
    Figure PCTCN2021136557-appb-100041
    where
    Figure PCTCN2021136557-appb-100042
    is the i-th row of the matrix
    Figure PCTCN2021136557-appb-100043
    a second fixing module, configured to fix γ and Z and optimize
    Figure PCTCN2021136557-appb-100044
    by setting the partial derivative of the objective function with respect to $A_p$ to zero, which yields the closed-form solution
    Figure PCTCN2021136557-appb-100045
    a third fixing module, configured to fix
    Figure PCTCN2021136557-appb-100046
    and Z and optimize γ, which turns the objective function into a quadratic programming problem with linear constraints, expressed as:
    Figure PCTCN2021136557-appb-100047
    where
    Figure PCTCN2021136557-appb-100048
  10. The bipartite-graph-based late fusion multi-view clustering machine learning system according to claim -, characterized in that formula (3) is solved in the solving module with the three-step alternating method, whose termination condition is:
    $(\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}) / \mathrm{obj}^{(t)} \le \varepsilon$
    where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
PCT/CN2021/136557 2021-02-09 2021-12-08 Late fusion multi-view clustering machine learning method and system based on bipartite graph WO2022170840A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110173493.9A CN112990265A (en) 2021-02-09 2021-02-09 Post-fusion multi-view clustering machine learning method and system based on bipartite graph
CN202110173493.9 2021-02-09

Publications (1)

Publication Number Publication Date
WO2022170840A1 true WO2022170840A1 (en) 2022-08-18

Family

ID=76347689

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136557 WO2022170840A1 (en) 2021-02-09 2021-12-08 Late fusion multi-view clustering machine learning method and system based on bipartite graph

Country Status (4)

Country Link
CN (1) CN112990265A (en)
LU (1) LU502853B1 (en)
WO (1) WO2022170840A1 (en)
ZA (1) ZA202207736B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292162A (en) * 2023-11-27 2023-12-26 烟台大学 Target tracking method, system, equipment and medium for multi-view image clustering

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990265A (en) * 2021-02-09 2021-06-18 浙江师范大学 Post-fusion multi-view clustering machine learning method and system based on bipartite graph
CN113627462A (en) * 2021-06-24 2021-11-09 浙江师范大学 Medical data clustering method and system based on matrix decomposition and multi-partition alignment
CN113627237A (en) * 2021-06-24 2021-11-09 浙江师范大学 Late-stage fusion face image clustering method and system based on local maximum alignment
CN113610103A (en) * 2021-06-24 2021-11-05 浙江师范大学 Medical data clustering method and system based on unified anchor point and subspace learning
CN113837218A (en) * 2021-08-17 2021-12-24 浙江师范大学 Text clustering method and system based on one-step post-fusion multi-view
CN116152269A (en) * 2021-11-19 2023-05-23 华为技术有限公司 Bipartite graph construction method, bipartite graph display method and bipartite graph construction device
CN117009838B (en) * 2023-09-27 2024-01-26 江西师范大学 Multi-scale fusion contrast learning multi-view clustering method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210019325A1 (en) * 2019-07-15 2021-01-21 Microsoft Technology Licensing, Llc Graph embedding already-collected but not yet connected data
CN112132224A (en) * 2020-09-28 2020-12-25 广东工业大学 Rapid spectrum embedding clustering method based on graph learning
CN112287974A (en) * 2020-09-28 2021-01-29 北京工业大学 Multi-view K multi-mean image clustering method based on self-adaptive weight
CN112990265A (en) * 2021-02-09 2021-06-18 浙江师范大学 Post-fusion multi-view clustering machine learning method and system based on bipartite graph

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"M.S. Dissertation", 1 November 2016, TIANJIN UNIVERSITY, CN, article JINCI YI: "Research and Application of Clustering Algorithms for Large Scale Data Sets", pages: 1 - 61, XP055958991 *
LIU XINWANG; ZHU XINZHONG; LI MIAOMIAO; WANG LEI; ZHU EN; LIU TONGLIANG; KLOFT MARIUS; SHEN DINGGANG; YIN JIANPING; GAO WEN: "Multiple Kernel k-Means with Incomplete Kernels", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 42, no. 5, 11 January 2019 (2019-01-11), pages 1191 - 1204, XP011780949, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2019.2892416 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292162A (en) * 2023-11-27 2023-12-26 烟台大学 Target tracking method, system, equipment and medium for multi-view image clustering
CN117292162B (en) * 2023-11-27 2024-03-08 烟台大学 Target tracking method, system, equipment and medium for multi-view image clustering

Also Published As

Publication number Publication date
ZA202207736B (en) 2022-07-27
CN112990265A (en) 2021-06-18
LU502853B1 (en) 2023-01-30

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21925487; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21925487; Country of ref document: EP; Kind code of ref document: A1