WO2022170840A1 - Late fusion multi-view clustering machine learning method and system based on bipartite graph - Google Patents
- Publication number
- WO2022170840A1 (application PCT/CN2021/136557)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- view
- clustering
- bipartite graph
- kernel
- fusion multi
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Abstract
Disclosed is a late fusion multi-view clustering machine learning method based on a bipartite graph. The method comprises: S11, acquiring a clustering task and target data samples; S12, running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and computing the view-diversity regularization term; S13, selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function; S14, solving the established objective function iteratively to obtain the bipartite graph after view fusion; and S15, performing spectral clustering on the obtained bipartite graph to obtain the clustering result. With the present application, the optimized representative points not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views, thereby improving clustering performance.
Description
The present application relates to the technical fields of computer vision and pattern recognition, and in particular to a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
With the development of information acquisition technology, information from different views of the same data sample is easy to obtain. Data that carries information from multiple views is called multi-view data. Multi-view clustering algorithms have been developed to cluster such data.
According to when the views are fused, existing multi-view clustering algorithms fall roughly into two categories. (1) Early fusion multi-view clustering algorithms. Early fusion means that the representations of multiple views are fused into a unified representation before clustering; a clustering algorithm is then run on that representation to obtain the final clustering result. Classic examples are multiple kernel clustering, multi-view spectral clustering and multi-view subspace clustering. (2) Late fusion multi-view clustering algorithms. In contrast to early fusion, late fusion first obtains a base partition from each single view and then derives an optimal clustering result from these base partitions. All ensemble clustering algorithms can be regarded as late fusion methods. Examples include: using the base partitions to construct a co-association matrix for each view, that is, an n×n 0-1 matrix indicating whether each pair of samples is assigned to the same cluster, and learning a unified representation from these matrices through low-rank and sparse matrix decomposition; constructing the co-association matrix of each view, defining a measure of how difficult each sample is to learn, and using self-paced learning to cluster the samples from easy to hard; maximizing the inner product between the consensus partition and a linear combination of the base partitions; and using late fusion to handle incomplete multi-view clustering.
Although the above algorithms achieve good results: (1) most early fusion multi-view clustering algorithms are very expensive in both space and time, which prevents their use on large-scale datasets; (2) existing late fusion multi-view clustering assumes that the optimal cluster indicator matrix can be found by maximizing its inner product with a linear combination of the base cluster indicator matrices, which oversimplifies the search space of the optimal cluster indicator matrix.
SUMMARY OF THE INVENTION
In view of the defects of the prior art, the purpose of the present application is to provide a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
To achieve the above purpose, the present application adopts the following technical solutions:
A late fusion multi-view clustering machine learning method based on a bipartite graph, comprising:
S1. acquiring a clustering task and target data samples;
S2. running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and computing the view-diversity regularization term;
S3. selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function;
S4. solving the established objective function iteratively to obtain the bipartite graph after view fusion;
S5. performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
Further, the kernel k-means clustering run in step S2 is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset of n samples; $\phi(\cdot): \mathcal{X} \to \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples in the c-th cluster and $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ its centroid, $1 \le c \le k$; i is the sample index; $B_{ic} = 1$ if the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise.
Formula (1) can be rewritten as:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\!\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
where K is the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$; $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$; and $\mathbf{1}_\ell$ denotes the vector of length $\ell$ whose elements are all 1.
Let $H = B L^{\frac{1}{2}}$ and relax the discrete constraint to a real-valued orthogonality constraint, that is, $H^{\top} H = I_k$; formula (2) then becomes:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\!\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
where $I_k$ denotes the k-dimensional identity matrix.
Further, the bipartite-graph-based late fusion multi-view clustering objective function in step S3 is expressed as a problem over $Z$, $\{A_p\}_{p=1}^{m}$ and $\gamma$, subject to:
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\; Z \ge 0,\; \gamma^{\top} \mathbf{1}_m = 1,\; \gamma \ge 0$$
where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ is the regularization parameter; γ holds the combination coefficients of the views; M is the view-diversity regularization term; and m is the number of views.
Further, solving the established bipartite-graph-based late fusion multi-view clustering objective function iteratively in step S4 is specifically:
Formula (3) is solved with a three-step alternating method, specifically:
A1. Fix $\{A_p\}_{p=1}^{m}$ and γ, and optimize Z. Let $z_i$ denote the i-th row of Z; the problem is expressed row by row, each row satisfying $z_i \mathbf{1}_s = 1$, $z_i \ge 0$.
A2. Fix γ and Z, and optimize $\{A_p\}_{p=1}^{m}$. Setting the partial derivative of the objective function with respect to $A_p$ to zero gives a closed-form solution.
A3. Fix $\{A_p\}_{p=1}^{m}$ and Z, and optimize γ. The objective function is transformed into a quadratic programming problem with linear constraints.
Further, formula (3) is solved in step S4 with the three-step alternating method, whose termination condition is expressed as:
$$\frac{\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}}{\mathrm{obj}^{(t)}} \le \varepsilon$$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
Further, a late fusion multi-view clustering machine learning system based on a bipartite graph is also provided, comprising:
an acquisition module, configured to acquire a clustering task and target data samples;
a running module, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and to compute the view-diversity regularization term;
an establishing module, configured to select the representative points of each view by random initialization and to establish a bipartite-graph-based late fusion multi-view clustering objective function;
a solving module, configured to solve the established objective function iteratively to obtain the bipartite graph after view fusion;
a clustering module, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
Further, the kernel k-means clustering run in the running module is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset of n samples; $\phi(\cdot): \mathcal{X} \to \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples in the c-th cluster and $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ its centroid, $1 \le c \le k$; i is the sample index; $B_{ic} = 1$ if the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise.
Formula (1) can be rewritten as:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\!\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
where K is the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$; $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$; and $\mathbf{1}_\ell$ denotes the vector of length $\ell$ whose elements are all 1.
Let $H = B L^{\frac{1}{2}}$ and relax the discrete constraint to a real-valued orthogonality constraint, that is, $H^{\top} H = I_k$; formula (2) then becomes:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\!\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
where $I_k$ denotes the k-dimensional identity matrix.
Further, the bipartite-graph-based late fusion multi-view clustering objective function in the establishing module is expressed as a problem over $Z$, $\{A_p\}_{p=1}^{m}$ and $\gamma$, subject to:
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\; Z \ge 0,\; \gamma^{\top} \mathbf{1}_m = 1,\; \gamma \ge 0$$
where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ is the regularization parameter; γ holds the combination coefficients of the views; M is the view-diversity regularization term; and m is the number of views.
Further, solving the established bipartite-graph-based late fusion multi-view clustering objective function iteratively in the solving module is specifically:
Formula (3) is solved with a three-step alternating method, the solving module comprising:
a first fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and γ and optimize Z; let $z_i$ denote the i-th row of Z, the problem being expressed row by row with each row satisfying $z_i \mathbf{1}_s = 1$, $z_i \ge 0$;
a second fixing module, configured to fix γ and Z and optimize $\{A_p\}_{p=1}^{m}$; setting the partial derivative of the objective function with respect to $A_p$ to zero gives a closed-form solution;
a third fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and Z and optimize γ; the objective function is transformed into a quadratic programming problem with linear constraints.
Further, formula (3) is solved in the solving module with the three-step alternating method, whose termination condition is expressed as:
$$\frac{\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}}{\mathrm{obj}^{(t)}} \le \varepsilon$$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
Compared with the prior art, the present application proposes a novel bipartite-graph-based late fusion multi-view clustering machine learning method, comprising modules for obtaining the base partitions and computing the view-diversity regularization term, optimizing the objective function to obtain the bipartite graph, and clustering with the bipartite graph. By optimizing the representative points, the present application ensures that they not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views and clustering performance improves. Experimental results on six public datasets show that the present application outperforms existing methods.
FIG. 1 is a flowchart of the bipartite-graph-based late fusion multi-view clustering machine learning method provided by Embodiment 1;
FIG. 2 is a schematic diagram of the sensitivity to the parameter λ provided by Embodiment 2;
FIG. 3 is a schematic diagram of the effect of the number of representative points s on clustering performance provided by Embodiment 2;
FIG. 4 is a schematic diagram of how clustering performance and the objective function value change as the number of iterations increases, provided by Embodiment 2;
FIG. 5 is a structural diagram of the bipartite-graph-based late fusion multi-view clustering machine learning system provided by Embodiment 3.
The embodiments of the present application are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present application from the contents disclosed in this specification. The present application can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the present application. It should be noted that the following embodiments and the features in them may be combined with each other where no conflict arises.
In view of the existing defects, the present application provides a late fusion multi-view clustering machine learning method and system based on a bipartite graph.
Embodiment 1
The bipartite-graph-based late fusion multi-view clustering machine learning method provided by this embodiment, as shown in FIG. 1, comprises:
S11. acquiring a clustering task and target data samples;
S12. running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and computing the view-diversity regularization term;
S13. selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function;
S14. solving the established objective function iteratively to obtain the bipartite graph after view fusion;
S15. performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
This embodiment proposes a new method that clusters by learning multi-view information through late fusion, built on representative points of the views. Compared with anchors that are not updated during optimization, representative points serve multi-view clustering better. In addition, using a bipartite graph for graph learning within a late fusion algorithm reduces computational and storage complexity.
In step S12, kernel k-means clustering is run on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and the view-diversity regularization term is computed. Specifically:
The goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \qquad (1)$$
where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset of n samples; $\phi(\cdot): \mathcal{X} \to \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples in the c-th cluster and $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic}\,\phi(x_i)$ its centroid, $1 \le c \le k$; i is the sample index; $B_{ic} = 1$ if the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise.
Formula (1) can be rewritten as:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\!\left(L^{\frac{1}{2}} B^{\top} K B L^{\frac{1}{2}}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \qquad (2)$$
where K is the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$; $L = \operatorname{diag}([n_1^{-1}, \ldots, n_k^{-1}])$; $\mathbf{1}_\ell$ denotes the vector of length $\ell$ whose elements are all 1; the superscript $\top$ denotes matrix transpose, as is conventional; and a product such as $KBL$ is the matrix product of K, B and L.
Because the variable B in the above formula is discrete, it is difficult to optimize. Let $H = B L^{\frac{1}{2}}$ and relax the discrete constraint to a real-valued orthogonality constraint, that is, $H^{\top} H = I_k$; formula (2) then becomes:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\!\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
where $I_k$ denotes the k-dimensional identity matrix.
Its closed-form solution consists of the eigenvectors of the kernel matrix K corresponding to its k largest eigenvalues, which can be obtained by eigendecomposition of K.
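The closed-form solution of the relaxed problem can be sketched in a few lines of NumPy. This is a minimal illustration, not the applicant's implementation: the function name `relaxed_kernel_kmeans` and the toy linear kernel are assumptions introduced here.

```python
import numpy as np

def relaxed_kernel_kmeans(K: np.ndarray, k: int) -> np.ndarray:
    """Return H (n x k): eigenvectors of K for its k largest eigenvalues.

    Hypothetical helper illustrating the relaxed kernel k-means solution.
    """
    K = (K + K.T) / 2.0                    # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    return eigvecs[:, -k:][:, ::-1]        # top-k eigenvectors, largest first

# Toy data: a linear kernel on random samples (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
K = X @ X.T
H = relaxed_kernel_kmeans(K, 3)

# Value of the relaxed objective Tr(K (I_n - H H^T)).
objective = np.trace(K @ (np.eye(K.shape[0]) - H @ H.T))
```

Running this per view on each base kernel gives one relaxed base partition per view.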
In step S13, the representative points of each view are selected by random initialization, and a bipartite-graph-based late fusion multi-view clustering objective function is established.
The bipartite-graph-based late fusion multi-view clustering objective function is expressed as a problem over $Z$, $\{A_p\}_{p=1}^{m}$ and $\gamma$, subject to:
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\; Z \ge 0,\; \gamma^{\top} \mathbf{1}_m = 1,\; \gamma \ge 0$$
where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the numbers of samples, clusters and representative points, respectively; λ is the regularization parameter; γ holds the combination coefficients of the views; M is the view-diversity regularization term; and m is the number of views.
In step S14, the established bipartite-graph-based late fusion multi-view clustering objective function is solved iteratively to obtain the bipartite graph after view fusion. Specifically:
Formula (3) is solved with a three-step alternating method:
A1. Fix $\{A_p\}_{p=1}^{m}$ and γ, and optimize Z. Let $z_i$ denote the i-th row of Z; Z can be optimized row by row, each row being an optimization problem over the simplex $z_i \mathbf{1}_s = 1$, $z_i \ge 0$.
A2. Fix γ and Z, and optimize $\{A_p\}_{p=1}^{m}$. Setting the partial derivative of the objective function with respect to $A_p$ to zero gives a closed-form solution.
A3. Fix $\{A_p\}_{p=1}^{m}$ and Z, and optimize γ. The objective function is transformed into a quadratic programming problem with linear constraints.
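The row-wise step constrains each row of Z to the probability simplex. The text does not say which solver is used for that subproblem, so the sketch below only shows a common building block for simplex-constrained problems: the Euclidean projection onto the simplex, using the standard sort-based algorithm. The function name `project_to_simplex` is an assumption introduced here, and this is not claimed to be the solver of the application.

```python
import numpy as np

def project_to_simplex(v) -> np.ndarray:
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1}.

    Standard sort-based algorithm; shown as a generic building block only.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                   # entries in descending order
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    cond = u + (1.0 - css) / j > 0         # entries that stay positive
    rho = j[cond][-1]                      # size of the support
    tau = (css[rho - 1] - 1.0) / rho       # shift enforcing sum(z) = 1
    return np.maximum(v - tau, 0.0)

z = project_to_simplex([0.9, 0.6, -0.2])
```

A projected-gradient update for one row of Z would alternate a gradient step on that row with this projection.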
The termination condition of the above three-step alternating method is expressed as:
$$\frac{\mathrm{obj}^{(t-1)} - \mathrm{obj}^{(t)}}{\mathrm{obj}^{(t)}} \le \varepsilon$$
where $\mathrm{obj}^{(t-1)}$ and $\mathrm{obj}^{(t)}$ denote the values of formula (3) at iterations t-1 and t, respectively, and ε denotes the preset precision.
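The stopping rule can be sketched as a generic loop around one round of the alternating updates. Everything except the relative-decrease test is a placeholder: `run_alternating`, `toy_step` and the default `eps` are assumptions introduced here, and the sketch assumes the objective value stays positive so that the ratio is well defined.

```python
from typing import Any, Callable, List, Tuple

def run_alternating(step: Callable[[Any], Tuple[Any, float]],
                    state: Any,
                    eps: float = 1e-4,
                    max_iter: int = 100) -> Tuple[Any, List[float]]:
    """Iterate `step` until (obj_prev - obj) / obj <= eps.

    `step` stands for one full round of the alternating updates and must
    return the new state together with the current objective value (> 0).
    """
    history: List[float] = []
    prev = None
    for _ in range(max_iter):
        state, obj = step(state)
        history.append(obj)
        if prev is not None and (prev - obj) / obj <= eps:
            break
        prev = obj
    return state, history

# Toy stand-in for one round: the objective 1 + x^2 shrinks as x is halved.
def toy_step(x: float) -> Tuple[float, float]:
    x = x / 2.0
    return x, 1.0 + x * x

final_x, history = run_alternating(toy_step, 4.0, eps=1e-4)
```

The `max_iter` cap is a safeguard added for the sketch; the source states only the relative-decrease condition.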
In step S15, spectral clustering is performed on the obtained bipartite graph to obtain the clustering result.
The process of spectral clustering on the bipartite graph Z is as follows:
Let $\hat{Z} = Z \Lambda^{-\frac{1}{2}}$, where $\Lambda = \operatorname{diag}(Z^{\top} \mathbf{1}_n)$. Perform eigenvalue decomposition on $\hat{Z}^{\top} \hat{Z}$, and let $\Sigma_k$ and $V_k$ be the diagonal matrix of its k largest eigenvalues and the matrix of the corresponding eigenvectors, respectively. Let $F = \hat{Z} V_k \Sigma_k^{-\frac{1}{2}}$. Running standard k-means clustering on the rows of F gives the final clustering result.
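The spectral step can be sketched as follows. The intermediate quantities used here (a column-normalized graph, the small s×s matrix it induces, and the rescaled eigenvectors) follow the usual anchor-graph construction and are an assumption where the extracted text lost its formulas; the function name is hypothetical. The sketch assumes every column of Z has positive sum and that the k largest eigenvalues are positive.

```python
import numpy as np

def bipartite_spectral_embedding(Z: np.ndarray, k: int) -> np.ndarray:
    """Embed n samples using an n x s bipartite graph Z (rows sum to 1)."""
    col_deg = Z.sum(axis=0)                 # diagonal of Lambda = diag(Z^T 1_n)
    Z_hat = Z / np.sqrt(col_deg)            # Z Lambda^{-1/2}
    S = Z_hat.T @ Z_hat                     # small s x s matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    sigma_k = eigvals[idx]
    V_k = eigvecs[:, idx]
    return Z_hat @ V_k / np.sqrt(sigma_k)   # Z_hat V_k Sigma_k^{-1/2}

# Toy bipartite graph: 6 samples, 4 representative points.
rng = np.random.default_rng(0)
Z = rng.random((6, 4))
Z /= Z.sum(axis=1, keepdims=True)           # enforce Z 1_s = 1_n
F = bipartite_spectral_embedding(Z, k=2)
```

Only an s×s eigenproblem is solved, which is where the saving over an n×n graph comes from. The final step, standard k-means on the rows of `F`, is omitted here.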
Compared with the prior art, this embodiment proposes a novel bipartite-graph-based late fusion multi-view clustering machine learning method, comprising modules for obtaining the base partitions and computing the view-diversity regularization term, optimizing the objective function to obtain the bipartite graph, and clustering with the bipartite graph. By optimizing the representative points, this embodiment ensures that they not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views and clustering performance improves.
Embodiment 2
The bipartite-graph-based late fusion multi-view clustering machine learning method provided by this embodiment differs from Embodiment 1 as follows:
This embodiment tests the clustering performance of the method of the present application on six MKL benchmark datasets: Oxford Flower17, Oxford Flower102, Protein fold prediction, UCI-Digital, Columbia Consumer Video (CCV) and Caltech102. Information about the datasets is given in Table 1.
Table 1
Dataset | Samples | Kernels | Clusters |
Flower17 | 1360 | 7 | 17 |
Flower102 | 8189 | 4 | 102 |
ProteinFold | 694 | 12 | 27 |
Digit | 2000 | 3 | 10 |
CCV | 6773 | 3 | 20 |
Caltech102 | 1530 | 25 | 102 |
For ProteinFold, this embodiment generates 12 base kernel matrices: the first 10 feature sets use a second-order polynomial kernel, and the last two use a cosine inner-product kernel. For CCV, three base kernels are generated by applying a Gaussian kernel to the SIFT, STIP and MFCC features, with the width of each Gaussian kernel set to the mean of the pairwise sample distances. The kernel matrices of the other datasets can be downloaded from the Internet.
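A Gaussian kernel whose width is the mean pairwise distance can be sketched as below. The exact parameterization, here $\exp(-d_{ij}^2 / (2\sigma^2))$, is an assumption, since the text only states how the width is chosen; the function name is hypothetical.

```python
import numpy as np

def gaussian_kernel_mean_width(X: np.ndarray) -> np.ndarray:
    """Gaussian kernel with sigma set to the mean pairwise distance.

    Assumed form: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    n = X.shape[0]
    sigma = d[np.triu_indices(n, k=1)].mean()   # mean over distinct pairs
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy features standing in for one descriptor type.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
K_gauss = gaussian_kernel_mean_width(features)
```

Applying the function once per feature type yields one base kernel per view.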
The comparison algorithms in this experiment are average multiple kernel clustering (A-MKKM), best single-view kernel k-means clustering (SB-MKKM), multiple kernel k-means clustering (MKKM), robust multiple kernel clustering (RMKKM), multiple kernel k-means clustering with matrix-induced regularization (MKKM-MR), optimal neighborhood multiple kernel clustering (ONKC), and late-fusion-based alignment maximization multi-view clustering (MVC-LFA). In all experiments, all base kernels are first centered and normalized. For all datasets, the number of classes is assumed to be known and is used as the number of clusters. The parameters of RMKKM, MKKM-MR, ONKC and MVC-LFA are selected by grid search. The regularization parameter of the method of this embodiment is also selected by grid search over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, and the number of representative points is set to s = 8k, where k is the number of clusters.
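For concreteness, the search grid and the choice of s can be written out as below. This is a small sketch: the names `lambda_grid` and `num_representatives` are introduced here for illustration, and the step of 3 in the exponent is read off the listed values $2^{-15}, 2^{-12}, \ldots, 2^{15}$.

```python
# Exponents -15, -12, ..., 15 give 11 candidate values for lambda.
lambda_grid = [2.0 ** e for e in range(-15, 16, 3)]

def num_representatives(k: int, factor: int = 8) -> int:
    """Number of representative points s = factor * k (s = 8k in the text)."""
    return factor * k

# Example: Flower17 has k = 17 clusters.
s_flower17 = num_representatives(17)
```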
This experiment uses the widely used clustering accuracy (ACC), normalized mutual information (NMI) and purity to report the clustering performance of each method. To reduce the randomness introduced by k-means, every method is randomly initialized and repeated 50 times, and the best result is reported.
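Of the three metrics, purity is the simplest to state: each predicted cluster is credited with its most frequent true class, and the credited counts are divided by the number of samples. The sketch below uses that standard definition; the function name is an assumption, and the labels are toy values, not experimental data.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def purity(labels_true: Sequence[Hashable],
           labels_pred: Sequence[Hashable]) -> float:
    """Fraction of samples that fall in the majority class of their cluster."""
    clusters: Dict[Hashable, List[Hashable]] = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_true)

# Toy labels: one sample of class 0 is placed in the cluster of class 1.
toy_purity = purity([0, 0, 1, 1, 1], [0, 1, 1, 1, 1])
```

ACC additionally requires a one-to-one matching between clusters and classes, which is usually computed with an assignment solver and is not shown here.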
Table 2
Table 2 reports the clustering performance of the proposed method and the comparison algorithms on all datasets. The table shows that: 1. The proposed algorithm outperforms all comparison algorithms under all three evaluation criteria. 2. ONKC is an important baseline among multiple kernel algorithms, and the proposed algorithm exceeds its ACC on the six datasets by 7.14%, 10.22%, 3.17%, 3.45%, 6.07% and 10.2%, respectively. 3. MVC-LFA is a late fusion algorithm that usually performs better than most other multi-view algorithms, and the proposed algorithm exceeds it by an average of 7.58%, 7.07% and 7.34% on the three clustering metrics, respectively.
In addition, we compared against anchors that are not updated during optimization: the anchors are selected by k-means clustering or by random sampling, substituted into the objective, and kept fixed while the algorithm runs. To limit the effect of algorithmic randomness, the experiment was repeated 50 times and all results were averaged. The results are shown in Table 3.
Table 3
Table 3 shows that anchors selected by k-means or at random perform much worse than the proposed representative point method. Updating the representative points during optimization is therefore effective.
This embodiment introduces a regularization parameter λ to balance the weights of the bipartite graph learning term and the diversity regularization term. Fig. 2 plots the NMI as λ varies over $[2^{-15}, 2^{-12}, \ldots, 2^{15}]$, with the best-performing comparison algorithm on each dataset as the reference. The figure shows that: 1) the best NMI is always obtained when the two terms are properly balanced; 2) on most datasets the proposed algorithm outperforms the best comparison algorithm regardless of the value of λ.
This embodiment has another important parameter, the number of representative points s. We select s from $[2k, 4k, \ldots, 14k]$, where k is the number of clusters, and run the experiments; the results are shown in Fig. 3. As s increases, the clustering performance shows an overall upward trend. A larger s, however, inevitably brings higher computational cost, so to balance clustering performance against complexity, the number of representative points can be chosen empirically as $s = 8k$.
This embodiment also reports how the objective function value and the clustering performance change at each iteration, as shown in Fig. 4. The objective function value decreases monotonically and usually converges within 25 iterations. As the objective decreases, the clustering performance fluctuates but trends upward overall, which indicates that the algorithm keeps improving clustering performance during training.
Embodiment 3
This embodiment provides a late fusion multi-view clustering machine learning system based on a bipartite graph, as shown in Fig. 5, including:
an acquisition module 11, configured to acquire a clustering task and target data samples;
an operation module 12, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and to calculate the diversity regularization term of the views;
an establishment module 13, configured to select the representative points of each view by random initialization and to establish the bipartite-graph-based late fusion multi-view clustering objective function;
a solving module 14, configured to solve the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner to obtain the bipartite graph after view fusion;
a clustering module 15, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
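The last module clusters the samples from the fused bipartite graph. The text does not spell out how the spectral clustering is carried out, so the following is only a sketch of one common approach for sample-to-anchor graphs: normalize the graph by its row and column degrees and take the leading left singular vectors as the sample embedding, after which k-means is run on the rows. The function name and the toy matrix are assumptions for illustration.

```python
import numpy as np


def bipartite_spectral_embedding(Z, k, eps=1e-12):
    """Embed n samples with the top-k left singular vectors of
    D_r^{-1/2} Z D_c^{-1/2}, where Z (n x s) links samples to representative points."""
    Z = np.asarray(Z, dtype=float)
    d_r = np.maximum(Z.sum(axis=1), eps)   # sample degrees (all 1 when Z 1_s = 1_n)
    d_c = np.maximum(Z.sum(axis=0), eps)   # representative-point degrees
    Z_norm = Z / np.sqrt(d_r)[:, None] / np.sqrt(d_c)[None, :]
    U, _, _ = np.linalg.svd(Z_norm, full_matrices=False)  # singular values descending
    return U[:, :k]


# Toy graph: samples 0-2 connect only to anchors 0-1, samples 3-5 only to anchors 2-3.
Z_toy = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.5, 0.5],
])
U_toy = bipartite_spectral_embedding(Z_toy, k=2)
```

Working with the n x s matrix rather than an n x n similarity matrix keeps the decomposition cheap when s is much smaller than n. In the toy graph the two groups are disconnected, so samples of the same group receive identical embedding rows.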
Further, running kernel k-means clustering in the operation module is specifically as follows:
The goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
$$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, with cluster mean $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$, $1 \le c \le k$; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise. Formula (1) can be rewritten as:
$$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left(L^{1/2} B^{\top} K B L^{1/2}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}\left([n_1^{-1}, \ldots, n_k^{-1}]\right)$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector whose elements are all 1.
Let $H = B L^{1/2}$ and relax the discrete constraint into a real-valued orthogonality constraint, i.e. $H^{\top} H = I_k$; formula (2) then becomes:
$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
where $I_k$ denotes the k-dimensional identity matrix.
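Minimizing $\operatorname{Tr}(K(I_n - HH^{\top}))$ under $H^{\top}H = I_k$ is equivalent to maximizing $\operatorname{Tr}(H^{\top}KH)$, which is solved by the eigenvectors of K for its k largest eigenvalues. The sketch below illustrates that standard result; the helper names, the linear kernel and the toy data are assumptions, not part of the patent.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space: (I - 11^T/n) K (I - 11^T/n)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J


def relaxed_kernel_kmeans(K, k):
    """Return H (n x k, H^T H = I_k) minimizing Tr(K (I_n - H H^T)) and the objective value."""
    K = (K + K.T) / 2                      # symmetrize against round-off
    _, eigvecs = np.linalg.eigh(K)         # eigenvalues in ascending order
    H = eigvecs[:, -k:]                    # eigenvectors of the k largest eigenvalues
    objective = np.trace(K) - np.trace(H.T @ K @ H)
    return H, objective


# Toy data: two well-separated 2-D blobs, linear kernel as a stand-in for a real base kernel.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
K_toy = center_kernel(X @ X.T)
H_toy, obj_toy = relaxed_kernel_kmeans(K_toy, k=2)
```

Running this once per view yields one relaxed partition matrix per view, which is the kind of input a late fusion step operates on.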
Further, the bipartite-graph-based late fusion multi-view clustering objective function in the establishment module is expressed as:
[formula (3): objective not reproduced in the source text]
$$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top} \mathbf{1}_m = 1,\ \gamma \ge 0$$
where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the number of samples, the number of clusters and the number of representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, whose elements are given by [formula not reproduced in the source text]; and m denotes the number of views.
Further, solving the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner in the solving module is specifically as follows:
Formula (3) is solved with a three-step alternating method, specifically including:
a first fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and γ and optimize Z; letting $z_i$ denote the i-th row of Z, the subproblem is expressed as: [formula not reproduced in the source text]
a second fixing module, configured to fix γ and Z and optimize $\{A_p\}_{p=1}^{m}$; setting the partial derivative of the objective function with respect to $A_p$ to zero yields a closed-form solution: [formula not reproduced in the source text]
a third fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and Z and optimize γ; the objective function is transformed into a quadratic programming problem with linear constraints, expressed as: [formula not reproduced in the source text]
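The third step is a quadratic program over the constraint set $\gamma^{\top}\mathbf{1}_m = 1$, $\gamma \ge 0$. Since the exact coefficients are not reproduced here, the sketch below solves a generic problem $\min_{\gamma} \gamma^{\top} Q \gamma + c^{\top}\gamma$ on that set by projected gradient descent. `Q` and `c` are placeholders for whatever coefficients the fixed representative points and bipartite graph produce, and projected gradient is a swapped-in solver, not necessarily the one the authors used.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {g : g >= 0, sum(g) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def solve_simplex_qp(Q, c, iters=500):
    """Minimize g^T Q g + c^T g over the simplex; Q is assumed positive semidefinite."""
    Q = (Q + Q.T) / 2
    m = len(c)
    gamma = np.full(m, 1.0 / m)                             # start from uniform weights
    lipschitz = 2.0 * max(np.linalg.eigvalsh(Q)[-1], 1e-12)  # gradient Lipschitz constant
    for _ in range(iters):
        grad = 2.0 * Q @ gamma + c
        gamma = project_to_simplex(gamma - grad / lipschitz)
    return gamma


gamma_toy = solve_simplex_qp(np.diag([1.0, 3.0]), np.zeros(2))
```

Any off-the-shelf QP solver would serve equally well; the projection-based version is shown only because it has no dependency beyond NumPy.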
Further, the solving module solves formula (3) with the three-step alternating method, whose termination condition is expressed as:
$$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
where $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
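The stopping rule can be wrapped around any alternating update loop. In this sketch `step` is a hypothetical callable that performs one round of the three updates and returns the new objective value; the default tolerance and iteration cap are assumptions, and the objective is assumed to stay positive.

```python
def has_converged(obj_prev, obj_curr, eps=1e-4):
    """Relative decrease (obj^(t-1) - obj^(t)) / obj^(t) <= eps; assumes obj_curr > 0."""
    return (obj_prev - obj_curr) / obj_curr <= eps


def run_until_converged(step, obj0, eps=1e-4, max_iter=100):
    """Call step() until the relative decrease of the objective falls below eps.

    Returns the number of rounds performed and the last objective value."""
    obj_prev = obj0
    for t in range(1, max_iter + 1):
        obj_curr = step()
        if has_converged(obj_prev, obj_curr, eps):
            return t, obj_curr
        obj_prev = obj_curr
    return max_iter, obj_prev


# Usage with a canned sequence of objective values standing in for real updates.
values = iter([50.0, 30.0, 29.9999])
rounds, final_obj = run_until_converged(lambda: next(values), obj0=100.0)
```

Using a relative rather than absolute decrease makes the same tolerance usable across datasets whose objective values differ in scale.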
It should be noted that the bipartite-graph-based late fusion multi-view clustering machine learning system provided in this embodiment is similar to Embodiment 1, and details are not repeated here.
Compared with the prior art, this embodiment includes modules for obtaining the base clustering partitions and computing the graph diversity regularization term, for optimizing the objective function to obtain the bipartite graph, and for clustering with the bipartite graph. By optimizing the representative points, this embodiment ensures that the optimized representative points not only represent the information of a single view but also better serve view fusion, so that the learned bipartite graph better fuses the information of all views and the clustering performance improves.
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the present application is determined by the scope of the appended claims.
Claims (10)
- A late fusion multi-view clustering machine learning method based on a bipartite graph, characterized by comprising:
  S1. acquiring a clustering task and target data samples;
  S2. running kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and calculating the diversity regularization term of the views;
  S3. selecting the representative points of each view by random initialization, and establishing a bipartite-graph-based late fusion multi-view clustering objective function;
  S4. solving the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner to obtain the bipartite graph after view fusion;
  S5. performing spectral clustering on the obtained bipartite graph to obtain the clustering result.
- The late fusion multi-view clustering machine learning method based on a bipartite graph according to claim 1, characterized in that running kernel k-means clustering in step S2 is specifically as follows:
  the goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
  $$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
  where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, with cluster mean $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$, $1 \le c \le k$; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise;
  formula (1) is rewritten as:
  $$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left(L^{1/2} B^{\top} K B L^{1/2}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
  where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}\left([n_1^{-1}, \ldots, n_k^{-1}]\right)$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector whose elements are all 1;
  letting $H = B L^{1/2}$ and relaxing the discrete constraint into a real-valued orthogonality constraint, i.e. $H^{\top} H = I_k$, formula (2) becomes:
  $$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
  where $I_k$ denotes the k-dimensional identity matrix.
- The late fusion multi-view clustering machine learning method based on a bipartite graph according to claim 2, characterized in that the bipartite-graph-based late fusion multi-view clustering objective function in step S3 is expressed as:
  [formula (3): objective not reproduced in the source text]
  $$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top} \mathbf{1}_m = 1,\ \gamma \ge 0$$
  where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the number of samples, the number of clusters and the number of representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, whose elements are given by [formula not reproduced in the source text]; and m denotes the number of views.
- The late fusion multi-view clustering machine learning method based on a bipartite graph according to claim 3, characterized in that solving the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner in step S4 is specifically as follows:
  formula (3) is solved with a three-step alternating method, specifically:
  A1. fixing $\{A_p\}_{p=1}^{m}$ and γ and optimizing Z; letting $z_i$ denote the i-th row of Z, the subproblem is expressed as: [formula not reproduced in the source text]
  A2. fixing γ and Z and optimizing $\{A_p\}_{p=1}^{m}$; setting the partial derivative of the objective function with respect to $A_p$ to zero yields a closed-form solution: [formula not reproduced in the source text]
  A3. fixing $\{A_p\}_{p=1}^{m}$ and Z and optimizing γ; the objective function is transformed into a quadratic programming problem with linear constraints, expressed as: [formula not reproduced in the source text]
- The late fusion multi-view clustering machine learning method based on a bipartite graph according to claim 4, characterized in that in step S4 formula (3) is solved with the three-step alternating method, whose termination condition is expressed as:
  $$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
  where $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
- A late fusion multi-view clustering machine learning system based on a bipartite graph, characterized by comprising:
  an acquisition module, configured to acquire a clustering task and target data samples;
  an operation module, configured to run kernel k-means clustering on each view corresponding to the acquired clustering task and target data samples to obtain the base partitions, and to calculate the diversity regularization term of the views;
  an establishment module, configured to select the representative points of each view by random initialization and to establish a bipartite-graph-based late fusion multi-view clustering objective function;
  a solving module, configured to solve the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner to obtain the bipartite graph after view fusion;
  a clustering module, configured to perform spectral clustering on the obtained bipartite graph to obtain the clustering result.
- The late fusion multi-view clustering machine learning system based on a bipartite graph according to claim 6, characterized in that running kernel k-means clustering in the operation module is specifically as follows:
  the goal of kernel k-means clustering is to minimize the sum of squared errors based on the partition matrix $B \in \{0,1\}^{n \times k}$, expressed as:
  $$\min_{B \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{c=1}^{k} B_{ic} \left\| \phi(x_i) - \mu_c \right\|_2^2 \quad \text{s.t.} \quad \sum_{c=1}^{k} B_{ic} = 1 \tag{1}$$
  where $\{x_i\}_{i=1}^{n} \subseteq \mathcal{X}$ denotes a dataset consisting of n samples; $\phi(\cdot): x \in \mathcal{X} \mapsto \mathcal{H}$ denotes the feature map that projects a sample x into a reproducing kernel Hilbert space $\mathcal{H}$; $n_c = \sum_{i=1}^{n} B_{ic}$ denotes the number of samples belonging to the c-th cluster, with cluster mean $\mu_c = \frac{1}{n_c} \sum_{i=1}^{n} B_{ic} \phi(x_i)$, $1 \le c \le k$; i denotes the sample index; $B_{ic} = 1$ when the i-th sample belongs to the c-th cluster, and $B_{ic} = 0$ otherwise;
  formula (1) is rewritten as:
  $$\min_{B \in \{0,1\}^{n \times k}} \operatorname{Tr}(K) - \operatorname{Tr}\left(L^{1/2} B^{\top} K B L^{1/2}\right) \quad \text{s.t.} \quad B \mathbf{1}_k = \mathbf{1}_n \tag{2}$$
  where K denotes the kernel matrix with elements $K_{ij} = \phi(x_i)^{\top} \phi(x_j)$, $L = \operatorname{diag}\left([n_1^{-1}, \ldots, n_k^{-1}]\right)$, and $\mathbf{1}_{\ell} \in \mathbb{R}^{\ell}$ denotes the vector whose elements are all 1;
  letting $H = B L^{1/2}$ and relaxing the discrete constraint into a real-valued orthogonality constraint, i.e. $H^{\top} H = I_k$, formula (2) becomes:
  $$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(K \left(I_n - H H^{\top}\right)\right) \quad \text{s.t.} \quad H^{\top} H = I_k$$
  where $I_k$ denotes the k-dimensional identity matrix.
- The late fusion multi-view clustering machine learning system based on a bipartite graph according to claim 7, characterized in that the bipartite-graph-based late fusion multi-view clustering objective function in the establishment module is expressed as:
  [formula (3): objective not reproduced in the source text]
  $$\text{s.t.} \quad Z \mathbf{1}_s = \mathbf{1}_n,\ Z \ge 0,\ \gamma^{\top} \mathbf{1}_m = 1,\ \gamma \ge 0$$
  where $\{H_p\}_{p=1}^{m}$ denotes the base partitions of the views obtained by kernel k-means clustering; $\{A_p\}_{p=1}^{m}$ denotes the representative points of the views; $Z \in \mathbb{R}^{n \times s}$ is the bipartite graph after view fusion; n, k and s denote the number of samples, the number of clusters and the number of representative points, respectively; λ denotes the regularization parameter; γ denotes the combination coefficients of the views; M denotes the view diversity regularization term, whose elements are given by [formula not reproduced in the source text]; and m denotes the number of views.
- The late fusion multi-view clustering machine learning system based on a bipartite graph according to claim 8, characterized in that solving the established bipartite-graph-based late fusion multi-view clustering objective function in a cyclic manner in the solving module is specifically as follows:
  formula (3) is solved with a three-step alternating method, specifically including:
  a first fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and γ and optimize Z; letting $z_i$ denote the i-th row of Z, the subproblem is expressed as: [formula not reproduced in the source text]
  a second fixing module, configured to fix γ and Z and optimize $\{A_p\}_{p=1}^{m}$; setting the partial derivative of the objective function with respect to $A_p$ to zero yields a closed-form solution: [formula not reproduced in the source text]
  a third fixing module, configured to fix $\{A_p\}_{p=1}^{m}$ and Z and optimize γ; the objective function is transformed into a quadratic programming problem with linear constraints, expressed as: [formula not reproduced in the source text]
- The late fusion multi-view clustering machine learning system based on a bipartite graph according to claim -, characterized in that the solving module solves formula (3) with the three-step alternating method, whose termination condition is expressed as:
  $$\frac{obj^{(t-1)} - obj^{(t)}}{obj^{(t)}} \le \varepsilon$$
  where $obj^{(t-1)}$ and $obj^{(t)}$ denote the values of formula (3) at the (t-1)-th and t-th iterations, respectively, and ε denotes the preset precision.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110173493.9A CN112990265A (en) | 2021-02-09 | 2021-02-09 | Post-fusion multi-view clustering machine learning method and system based on bipartite graph |
CN202110173493.9 | 2021-02-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022170840A1 true WO2022170840A1 (en) | 2022-08-18 |
Family
ID=76347689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/136557 WO2022170840A1 (en) | 2021-02-09 | 2021-12-08 | Late fusion multi-view clustering machine learning method and system based on bipartite graph |
Country Status (4)
Country | Link |
---|---|
CN (1) | CN112990265A (en) |
LU (1) | LU502853B1 (en) |
WO (1) | WO2022170840A1 (en) |
ZA (1) | ZA202207736B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117292162A (en) * | 2023-11-27 | 2023-12-26 | 烟台大学 | Target tracking method, system, equipment and medium for multi-view image clustering |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990265A (en) * | 2021-02-09 | 2021-06-18 | 浙江师范大学 | Post-fusion multi-view clustering machine learning method and system based on bipartite graph |
CN113627462A (en) * | 2021-06-24 | 2021-11-09 | 浙江师范大学 | Medical data clustering method and system based on matrix decomposition and multi-partition alignment |
CN113627237A (en) * | 2021-06-24 | 2021-11-09 | 浙江师范大学 | Late-stage fusion face image clustering method and system based on local maximum alignment |
CN113610103A (en) * | 2021-06-24 | 2021-11-05 | 浙江师范大学 | Medical data clustering method and system based on unified anchor point and subspace learning |
CN113837218A (en) * | 2021-08-17 | 2021-12-24 | 浙江师范大学 | Text clustering method and system based on one-step post-fusion multi-view |
CN116152269A (en) * | 2021-11-19 | 2023-05-23 | 华为技术有限公司 | Bipartite graph construction method, bipartite graph display method and bipartite graph construction device |
CN117009838B (en) * | 2023-09-27 | 2024-01-26 | 江西师范大学 | Multi-scale fusion contrast learning multi-view clustering method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132224A (en) * | 2020-09-28 | 2020-12-25 | 广东工业大学 | Rapid spectrum embedding clustering method based on graph learning |
US20210019325A1 (en) * | 2019-07-15 | 2021-01-21 | Microsoft Technology Licensing, Llc | Graph embedding already-collected but not yet connected data |
CN112287974A (en) * | 2020-09-28 | 2021-01-29 | 北京工业大学 | Multi-view K multi-mean image clustering method based on self-adaptive weight |
CN112990265A (en) * | 2021-02-09 | 2021-06-18 | 浙江师范大学 | Post-fusion multi-view clustering machine learning method and system based on bipartite graph |
-
2021
- 2021-02-09 CN CN202110173493.9A patent/CN112990265A/en active Pending
- 2021-12-08 WO PCT/CN2021/136557 patent/WO2022170840A1/en active Application Filing
- 2021-12-08 LU LU502853A patent/LU502853B1/en active IP Right Grant
-
2022
- 2022-07-12 ZA ZA2022/07736A patent/ZA202207736B/en unknown
Non-Patent Citations (2)
Title |
---|
"M.S. Dissertation", 1 November 2016, TIANJIN UNIVERSITY, CN, article JINCI YI: "Research and Application of Clustering Algorithms for Large Scale Data Sets", pages: 1 - 61, XP055958991 * |
LIU XINWANG; ZHU XINZHONG; LI MIAOMIAO; WANG LEI; ZHU EN; LIU TONGLIANG; KLOFT MARIUS; SHEN DINGGANG; YIN JIANPING; GAO WEN: "Multiple Kernel kk-Means with Incomplete Kernels", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY., USA, vol. 42, no. 5, 11 January 2019 (2019-01-11), USA , pages 1191 - 1204, XP011780949, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2019.2892416 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117292162A (en) * | 2023-11-27 | 2023-12-26 | 烟台大学 | Target tracking method, system, equipment and medium for multi-view image clustering |
CN117292162B (en) * | 2023-11-27 | 2024-03-08 | 烟台大学 | Target tracking method, system, equipment and medium for multi-view image clustering |
Also Published As
Publication number | Publication date |
---|---|
ZA202207736B (en) | 2022-07-27 |
CN112990265A (en) | 2021-06-18 |
LU502853B1 (en) | 2023-01-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21925487; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21925487; Country of ref document: EP; Kind code of ref document: A1 |