CN109002854A - Based on hidden expression and adaptive multiple view Subspace clustering method - Google Patents

Publication number
CN109002854A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810801776.1A
Other languages
Chinese (zh)
Inventor
王秀美
张越美
高新波
张天真
李洁
邓成
田春娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201810801776.1A
Publication of CN109002854A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention proposes a multi-view subspace clustering method based on latent representation and adaptive weighting, which mainly addresses the low clustering accuracy of existing multi-view clustering methods. The implementation steps are: (1) obtain the multi-view data matrix of the original data set; (2) compute the Laplacian matrix of the multi-view data matrix; (3) construct the objective function of multi-view subspace clustering based on latent representation and adaptive weighting; (4) optimize the objective function; (5) initialize the variables in the optimized objective function; (6) alternately iterate over the variables in the optimized objective function; (7) compute the value of the multi-view self-representation coefficient matrix in the optimized objective function; (8) cluster the original data set. By using a latent representation and adaptive weighting, the invention makes full use of the information in multiple views and effectively improves the accuracy of multi-view clustering. It can be used in fields such as image segmentation, business analysis and biological classification.

Description

Multi-view Subspace Clustering Method Based on Latent Representation and Adaptation

Technical field

The invention belongs to the technical field of computer vision and pattern recognition and relates to a multi-view subspace clustering method, in particular to a multi-view subspace clustering method based on latent representation and adaptation, which can be used for image segmentation, business analysis, biological classification and similar tasks.

Background

In recent years, with the rapid development of computer information technology, emerging technologies have been changing human society and the volume of data has grown explosively, so acquiring and analyzing data has become increasingly important. Data mining is the process of discovering hidden information and extracting knowledge from large amounts of data. Clustering, an important data mining method, partitions a collection of physical or abstract objects into clusters of similar objects, so that objects in the same cluster are highly similar and objects in different clusters are dissimilar.

A traditional data set is represented by a single kind of feature and is called a single-view data set. A single-view data set, however, does not capture all the information in the original data. To address this, existing techniques represent a data set with several kinds of features, which gives a multi-view data set. For example, the same picture can be described by features such as SIFT and HOG; the same news report can be expressed in different languages; for web page data, the text and the link information can serve as two different views. Traditional clustering methods cannot fully exploit the information in multiple views, so their results on such data sets are often unsatisfactory. Multi-view clustering was therefore proposed. By exploiting both the consistency and the differences among views, it often yields more accurate clustering results.

Multi-view clustering algorithms fall into two categories: those based on K-means and those based on spectral clustering. In the K-means-based algorithms the initial points are chosen at random and the clustering result depends strongly on that choice, so the results of these methods are unstable.

Multi-view clustering algorithms based on spectral clustering preserve the local geometric structure among the samples of the data set and therefore tend to give more stable clustering results, which is why many such algorithms have appeared.

Because the samples in a data set lie in specific low-dimensional subspaces, many subspace clustering (SC) algorithms have appeared in recent years; subspace clustering is a branch of spectral clustering. These algorithms use the property that any sample can be written as a linear combination of the samples in its own subspace, and factorize the data matrix into the product of the data matrix itself and a view self-representation coefficient matrix. The clustering result is then obtained from the self-representation coefficient matrix. Because that matrix is easy to interpret and has a clear physical meaning, subspace clustering has become a basic tool for data clustering and is widely used in both single-view and multi-view clustering.
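The self-representation property described above can be illustrated with a small sketch. It is not the method of either paper cited in this section: it assumes a ridge-regularized least-squares model, $\min_Z \|X - XZ\|_F^2 + \lambda\|Z\|_F^2$, chosen only because it has a closed-form solution. The function name `ridge_self_representation`, the parameter `lam` and the toy data are hypothetical.

```python
import numpy as np

def ridge_self_representation(X, lam=1e-3):
    """Closed-form minimizer of ||X - X Z||_F^2 + lam * ||Z||_F^2.

    X holds one sample per column, shape (d, n); the result Z has shape (n, n).
    The ridge model is an illustrative assumption, not the patent's objective.
    """
    G = X.T @ X                                   # Gram matrix of the samples
    n = G.shape[0]
    # Stationarity condition: (X^T X + lam * I) Z = X^T X
    return np.linalg.solve(G + lam * np.eye(n), G)

# Toy data: three samples on each of two orthogonal lines in R^3.
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 1.0, 1.0])
X = np.column_stack([c * b1 for c in (1.0, 2.0, -1.5)] +
                    [c * b2 for c in (0.5, -1.0, 2.0)])
Z = ridge_self_representation(X)

# Each sample is reconstructed only from samples of its own subspace,
# so the block of Z that couples the two subspaces vanishes.
cross_block = np.abs(Z[:3, 3:]).max()
```

In this toy case the coefficient matrix is block diagonal, which is the structure that spectral clustering later exploits.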

For example, Hongchang Gao, Feiping Nie, Xuelong Li and Heng Huang published a paper titled "Multi-View Subspace Clustering" at the 2015 IEEE Conference on Computer Vision and Pattern Recognition, disclosing a multi-view subspace clustering method. The method factorizes each view matrix of the multi-view data set to obtain the corresponding view self-representation coefficient matrix. To obtain a consistent clustering result, it builds the similarity matrix of the original data set from the self-representation coefficient matrices of all views and applies spectral clustering. To reduce the effect of noisy data on clustering accuracy, it takes the difference between each view data matrix and the product of that view data matrix with its self-representation coefficient matrix as a noise matrix, and constrains that noise matrix. This approach, however, removes only specific types of noise and is not robust to general noise.

Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu and others therefore published a paper titled "Latent Multi-view Subspace Clustering" at the 2017 IEEE Conference on Computer Vision and Pattern Recognition, disclosing a latent multi-view subspace clustering method. To reduce the effect of noise in the original data set on clustering accuracy, the method assumes that all views of the original data set originate from one shared representation, called the multi-view latent representation, and that each view matrix is the product of the multi-view latent representation and the basis matrix of that view. The multi-view latent representation is a consistent representation of the original data set obtained after removing noise from the multi-view matrices. While learning the latent representation, the method also requires it to be self-representable, which yields the self-representation coefficient matrix of the multi-view data set; spectral clustering then gives a consistent clustering result. However, the method treats every view as carrying the same amount of information, whereas in fact the views of the original data set differ in information content. It also does not require the multi-view latent representation to preserve the local geometric structure within each view. Both omissions reduce the accuracy of the clustering result on the original data set.

Summary of the invention

The purpose of the present invention is to overcome the shortcomings of the prior art described above by providing a multi-view subspace clustering method based on latent representation and adaptation that improves the clustering accuracy on multi-view data sets.

The technical idea of the invention is to learn the multi-view latent representation of a multi-view data set adaptively. While the latent representation is learned, graph regularization makes it preserve the local geometric structure within each view matrix; the latent representation is then factorized to obtain the multi-view self-representation coefficient matrix, and spectral clustering produces the clustering result. The implementation steps are as follows:

(1) Obtain the multi-view data matrix $\{X^{(v)}\}_{v=1}^{m}$ of the original data set:

Different types of feature data are extracted from the images in the original data set. Feature data of the same type form one view matrix, and the view matrices together form the multi-view data matrix of the original data set, where $X^{(v)}$ denotes the $v$-th view matrix, $v = 1, 2, \dots, m$, and $m \ge 2$ is the number of view matrices;

(2) Compute the Laplacian matrix of the multi-view data matrix;

(3) Construct the objective function $J$ of multi-view subspace clustering based on latent representation and adaptation:

(3a) Factorize the multi-view data matrix into a multi-view latent representation matrix $H$ and a multi-view basis matrix; constrain the basis matrix $P^{(v)}$ of $X^{(v)}$ by $O$, where $O$ denotes $P^{(v)} P^{(v)T} = I$; and take the difference between the multi-view data matrix and the product of the multi-view basis matrix and $H$ as the error reconstruction term, where $I$ denotes the identity matrix and $(\cdot)^T$ denotes the matrix transpose;

(3b) Compute the measure of this error reconstruction term as its squared Frobenius norm, $\|\cdot\|_F^2$, and give the measure an adaptive weight defined through a weight parameter, where $\gamma$ is a tuning parameter, $\gamma \ge 0$;

(3c) Factorize the multi-view latent representation matrix $H$ into the product of $H$ itself and a multi-view self-representation coefficient matrix $Z$; take the difference between $H$ and $HZ$ as the error reconstruction term $E_r = H - HZ$; compute its measure $\|E_r\|_{2,1}$ and give it the weight $\lambda_1$, where $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$ norm of a matrix;

(3d) Construct the low-rank constraint term $\|Z\|_*$ on the multi-view self-representation coefficient matrix $Z$ and give it the weight $\lambda_2$, where $\|\cdot\|_*$ denotes the nuclear norm of a matrix;

(3e) Use the Laplacian matrix to construct a similarity constraint term within the multi-view data matrix and give it the weight $\lambda_3$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix;

(3f) Form the weighted sum of the measure from (3b), $\|E_r\|_{2,1}$, $\|Z\|_*$ and the similarity constraint term to obtain the objective function $J$ of multi-view subspace clustering based on latent representation and adaptation;

(4) Optimize the objective function $J$:

The objective function $J$ is optimized with the alternating direction method of multipliers (ADMM). $A$ is introduced as an auxiliary matrix variable for $Z$ under the constraint $A = Z$. One Lagrangian multiplier is assigned to the reconstruction constraint of the multi-view data matrix, $Q_1$ is the Lagrangian multiplier of $E_r = H - HZ$, and $Q_2$ is the Lagrangian multiplier of $A = Z$. This gives the optimized objective function $J'$, where $\langle\cdot,\cdot\rangle$ denotes the matrix inner product and $\mu$ denotes the regularization coefficient;

(5) Initialize the variables in the optimized objective function $J'$:

All elements of $Z$, $E_r$, $Q_1$, $Q_2$ and $A$ in $J'$ are initialized to 0, all elements of $H$ are initialized to random numbers in the interval (0, 1), and $\mu$ is initialized to 0.001;

(6) Alternately iterate over the variables in the optimized objective function $J'$:

The variables of $J'$, including $H$, $Z$, $E_r$, $A$, $Q_1$, $Q_2$ and $\mu$, are updated alternately, which gives the iterative update expressions $H_S$, $Z_S$, $E_{rS}$, $A_S$, $Q_{1S}$, $Q_{2S}$ and $\mu_S$ of the respective variables;

(7) Compute the value of the variable $Z$ in the optimized objective function $J'$:

(7a) Set the maximum number of iterations for the optimized objective function $J'$;

(7b) Update each variable of $J'$ with its iterative update expression and stop when the number of iterations reaches the set maximum, which gives the updated multi-view self-representation coefficient matrix;

(8) Cluster the original data set:

(8a) Compute the similarity matrix $S$ of the original data set;

(8b) Compute the clustering result of the original data set:

(8b1) Sum each row of the similarity matrix $S$ to obtain the vector $t$, place $t$ on the diagonal of a matrix to obtain the degree matrix $D$ of $S$, and compute the Laplacian matrix $L$ of $S$;

(8b2) Perform an eigenvalue decomposition of the Laplacian matrix $L$ to obtain the set of eigenvalues $E$ and the matrix $T$ formed by the eigenvectors corresponding to the eigenvalues in $E$;

(8b3) Sort the eigenvalues in $E$ in ascending order to obtain the set $E'$, take the first $K$ eigenvalues of $E'$ to form the set $E_K$, and select from $T$ the eigenvectors corresponding to the eigenvalues in $E_K$ to form the eigenvector matrix $T'$; then treat each row of $T'$, after normalization, as a sample data point, where $K$ denotes the number of classes of the sample data points in $T'$, $2 \le K < N$, and $N$ denotes the number of sample data points in the original data set;

(8b4) Randomly select $K$ sample data points of $T'$ and take each as the initial cluster center of one class, which gives the set $R$ of $K$ cluster centers;

(8b5) Compute the Euclidean distance from every sample data point in $T'$ to every cluster center in $R$ and assign each sample data point to the class of its nearest cluster center; then take the mean of the sample data points in the $k$-th class as the new cluster center of that class, $k = 1, 2, \dots, K$, which updates $R$;

(8b6) Repeat step (8b5) until the set of cluster centers $R$ no longer changes, which gives the clustering result of the original data set.
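Step (8) can be sketched in a few lines of NumPy. The formulas for $S$ and $L$ are not legible in the source, so this sketch assumes the symmetrization $S = |Z| + |Z|^T$ and the unnormalized Laplacian $L = D - S$; both are common choices but remain assumptions here. The function names and the optional `init_idx` argument (added only to make a run reproducible) are hypothetical.

```python
import numpy as np

def similarity_from_Z(Z):
    """Assumed form S = |Z| + |Z|^T, with |.| the elementwise absolute value."""
    absZ = np.abs(Z)
    return absZ + absZ.T

def spectral_clustering(S, K, seed=0, max_iter=100, init_idx=None):
    """Steps (8b1)-(8b6): spectral embedding followed by K-means."""
    D = np.diag(S.sum(axis=1))              # (8b1) degree matrix from row sums
    L = D - S                               # assumed unnormalized Laplacian
    _, evecs = np.linalg.eigh(L)            # (8b2) eigenvalues come back ascending
    T = evecs[:, :K]                        # (8b3) eigenvectors of the K smallest
    T = T / np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)
    if init_idx is None:                    # (8b4) random initial centers
        rng = np.random.default_rng(seed)
        init_idx = rng.choice(T.shape[0], size=K, replace=False)
    centers = T[list(init_idx)].copy()
    labels = np.zeros(T.shape[0], dtype=int)
    for _ in range(max_iter):               # (8b5)-(8b6)
        dists = np.linalg.norm(T[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)       # nearest center for each row of T'
        new_centers = np.array([
            T[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):   # centers no longer change
            break
        centers = new_centers
    return labels

# Toy coefficient matrix with two blocks of three mutually similar samples.
Z_demo = np.kron(np.eye(2), np.ones((3, 3)))
labels_demo = spectral_clustering(similarity_from_Z(Z_demo), K=2, init_idx=(0, 3))
```

Because K-means depends on its initial centers, a practical run would typically repeat the clustering with several seeds; the source itself specifies a single random initialization.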

Compared with the prior art, the present invention has the following advantages:

When constructing the objective function, the invention factorizes the multi-view data matrix to obtain the multi-view latent representation and, in an adaptive manner, uses a parameter to measure how important each view is for obtaining the latent representation matrix. It thereby learns automatically the weight of the error reconstruction term that arises when the multi-view data matrix is represented by the multi-view basis matrix and the latent representation matrix. At the same time, it constructs a similarity constraint term within the multi-view data matrix so that the latent representation preserves the local geometric structure within each view. This makes full use of the information in every view and yields a more accurate similarity matrix of the original data set, which effectively improves the accuracy of multi-view clustering compared with the prior art.

Brief description of the drawings

Fig. 1 is the implementation flowchart of the present invention;

Figs. 2 to 4 compare the simulated clustering accuracy of the present invention with that of the existing latent multi-view subspace clustering method on the BBCSport, MSRC-v1 and Caltech101-7 data sets, respectively.

Detailed description

The present invention is described in further detail below with reference to the drawings and specific embodiments.

Referring to Fig. 1, a multi-view subspace clustering method based on latent representation and adaptation comprises the following steps:

Step 1) Obtain the multi-view data matrix $\{X^{(v)}\}_{v=1}^{m}$ of the original data set:

Since every image has several types of features, different types of features can be extracted from each image. Different types of feature data are extracted from the images in the original data set; feature data of the same type form one view matrix, and the view matrices together form the multi-view data matrix of the original data set, where $X^{(v)}$ denotes the $v$-th view matrix, $v = 1, 2, \dots, m$, and $m \ge 2$ is the number of view matrices.
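As a rough illustration of the data layout in Step 1, the sketch below stacks per-image feature vectors into one matrix per view, with one sample per column as step (2a) later assumes. The function name `build_multiview` and the random stand-in features are hypothetical; the source does not prescribe any particular feature extractor or data structure.

```python
import numpy as np

def build_multiview(features_by_view):
    """features_by_view[v] is a list of N one-dimensional feature vectors for view v.

    Returns a list of m arrays; array v has shape (d_v, N), one sample per column.
    """
    if len(features_by_view) < 2:
        raise ValueError("at least two views are required (m >= 2)")
    views = [np.column_stack(vectors) for vectors in features_by_view]
    n_samples = views[0].shape[1]
    if any(X.shape[1] != n_samples for X in views):
        raise ValueError("every view must describe the same N samples")
    return views

rng = np.random.default_rng(0)
n_images = 5
view_a = [rng.random(8) for _ in range(n_images)]   # stand-in for one feature type
view_b = [rng.random(3) for _ in range(n_images)]   # stand-in for another feature type
views = build_multiview([view_a, view_b])
shapes = [X.shape for X in views]
```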

Step 2) Compute the Laplacian matrix of the multi-view data matrix. The implementation steps are:

(2a) Treat each column of the $v$-th view matrix $X^{(v)}$ as a sample point and compute the Euclidean distance between every pair of sample points, $i, j = 1, 2, \dots, N$, where $N$ denotes the number of sample points in the original data set;

(2b) Compute the affinity matrix of the multi-view data matrix, where $W^{(v)}$ denotes the affinity matrix of the $v$-th view matrix, $W^{(v)}_{ij}$ denotes the element in row $i$ and column $j$ of $W^{(v)}$, and $\sigma$ denotes the bandwidth parameter of the Gaussian kernel;

(2c) Sum each row of the affinity matrix to obtain the vector $r$, place $r$ on the diagonal of a matrix to obtain the diagonal multi-view degree matrix, and compute the Laplacian matrix of the multi-view data matrix.
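Steps (2a) to (2c) can be sketched for a single view as follows. The exact kernel and Laplacian formulas are not legible in the source, so the sketch assumes the Gaussian kernel $W_{ij} = \exp(-d_{ij}^2 / (2\sigma^2))$ and the unnormalized Laplacian $L = D - W$; the function name `view_laplacian` is hypothetical.

```python
import numpy as np

def view_laplacian(Xv, sigma=1.0):
    """Xv: (d, N) view matrix with one sample per column. Returns (W, D, L)."""
    # (2a) squared Euclidean distances between all pairs of columns
    sq = np.sum(Xv ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xv.T @ Xv)
    d2 = np.maximum(d2, 0.0)                 # clip tiny negatives from rounding
    # (2b) Gaussian-kernel affinity (assumed form of the kernel)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # (2c) degree matrix from row sums, then the assumed Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    return W, D, L

rng = np.random.default_rng(0)
X_demo = rng.random((4, 6))                  # 6 samples with 4 features each
W_demo, D_demo, L_demo = view_laplacian(X_demo, sigma=0.5)
```

Applying the function to every $X^{(v)}$ would give one Laplacian per view, which is how the per-view quantities in this step are read here.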

Step 3) Construct the objective function $J$ of multi-view subspace clustering based on latent representation and adaptation:

(3a) The invention assumes that the multi-view data matrix is the product of one shared multi-view latent representation matrix $H$ and the multi-view basis matrix. The multi-view data matrix is therefore factorized into $H$ and the multi-view basis matrix, and the basis matrix $P^{(v)}$ of the $v$-th view matrix $X^{(v)}$ is constrained by $O$, where $O$ denotes $P^{(v)} P^{(v)T} = I$. To capture the discrepancy between the multi-view data matrix and this product, their difference is taken as the error reconstruction term, where $I$ denotes the identity matrix and $(\cdot)^T$ denotes the matrix transpose;

(3b) To quantify that discrepancy, the measure of the error reconstruction term is computed as its squared Frobenius norm, $\|\cdot\|_F^2$, and this measure is given an adaptive weight defined through a weight parameter, where $\gamma$ is a tuning parameter, $\gamma \ge 0$. Because the amount of information contained in the view matrix $X^{(v)}$ cannot be known in advance, the weight is learned automatically through the adaptive weighting, and $\gamma$ adjusts the distribution of the weights;

(3c) Because $H$ is the multi-view latent representation matrix, it is a comprehensive representation of the multi-view data matrix. To obtain the matrix of similarities between the samples, $H$ is factorized into the product of $H$ itself and the multi-view self-representation coefficient matrix $Z$. The discrepancy between $H$ and $HZ$ is captured by the error reconstruction term $E_r = H - HZ$, and its size is measured by $\|E_r\|_{2,1}$ with weight $\lambda_1$, where $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$ norm of a matrix. The $\ell_{2,1}$ norm of $E_r$ is used so that the multi-view latent representation can remove general types of noise, which makes the objective function of the invention more robust to general noise;

(3d) To obtain a unique $Z$ that has the low-rank structure of the similarity matrix of the original data set, the low-rank constraint term $\|Z\|_*$ is constructed and given the weight $\lambda_2$, where $\|\cdot\|_*$ denotes the nuclear norm of a matrix;

(3e) So that $H$ preserves the local geometric structure of the samples in the multi-view data matrix, a similarity constraint term within the multi-view data matrix is constructed and given the weight $\lambda_3$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix;

(3f) Form the weighted sum of the measure from (3b), $\|E_r\|_{2,1}$, $\|Z\|_*$ and the similarity constraint term to obtain the objective function $J$ of multi-view subspace clustering based on latent representation and adaptation.
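The three matrix measures named in Step 3 can be computed as below. This is only a sketch of the norms themselves, not of the objective $J$, whose formula is not legible in the source. The column-wise convention for the $\ell_{2,1}$ norm is an assumption, chosen because the update of $E_r$ in Step 6 proceeds column by column; the function names are hypothetical.

```python
import numpy as np

def frobenius_sq(M):
    """Squared Frobenius norm: the sum of squared entries."""
    return float(np.sum(M * M))

def l21_norm(M):
    """l2,1 norm, taken here as the sum of the Euclidean norms of the columns."""
    return float(np.linalg.norm(M, axis=0).sum())

def nuclear_norm(M):
    """Nuclear norm: the sum of the singular values."""
    return float(np.linalg.svd(M, compute_uv=False).sum())

E_example = np.array([[3.0, 0.0],
                      [4.0, 0.0]])          # one nonzero column of length 5
Z_example = np.diag([2.0, 3.0])             # singular values 2 and 3
values = (frobenius_sq(E_example), l21_norm(E_example), nuclear_norm(Z_example))
```

The $\ell_{2,1}$ norm penalizes whole columns at once, which is why it suits sample-specific corruption, while the nuclear norm is the usual convex surrogate for rank.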

Step 4) Optimize the objective function $J$:

Because $J$ contains constraints on its variables, closed-form expressions for the solutions of its variables cannot be obtained directly, so $J$ is optimized as follows:

The objective function $J$ is optimized with the alternating direction method of multipliers (ADMM). $A$ is introduced as an auxiliary matrix variable for $Z$ under the constraint $A = Z$. One Lagrangian multiplier is assigned to the reconstruction constraint of the multi-view data matrix, $Q_1$ is the Lagrangian multiplier of $E_r = H - HZ$, and $Q_2$ is the Lagrangian multiplier of $A = Z$. This gives the optimized objective function $J'$, where $\langle\cdot,\cdot\rangle$ denotes the matrix inner product and $\mu$ denotes the regularization coefficient.

Step 5) Initialize the variables in the optimized objective function $J'$:

All elements of $Z$, $E_r$, $Q_1$, $Q_2$ and $A$ in $J'$ are initialized to 0, all elements of $H$ are initialized to random numbers in the interval (0, 1), and $\mu$ is initialized to 0.001. The variables are initialized so that the optimization algorithm for $J'$ can run iteratively.
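The initialization in Step 5 might look like the sketch below. Only the variables whose names survive in the source are included. The shapes are assumptions: $H$ is taken as $k \times N$ for a latent dimension $k$, so $Z$, $A$ and $Q_2$ are $N \times N$ and $E_r$, $Q_1$ are $k \times N$. The function name `init_variables` is hypothetical.

```python
import numpy as np

def init_variables(k, n, seed=0):
    """Initial values for the named variables of J' (shapes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        # uniform() samples the half-open interval [0, 1); an exact 0 is
        # possible in principle but practically never drawn
        "H": rng.uniform(0.0, 1.0, size=(k, n)),
        "Z": np.zeros((n, n)),
        "A": np.zeros((n, n)),
        "E_r": np.zeros((k, n)),
        "Q1": np.zeros((k, n)),
        "Q2": np.zeros((n, n)),
        "mu": 1e-3,
    }

state = init_variables(k=4, n=10)
```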

Step 6) Alternately iterate over the variables in the optimized objective function $J'$:

The variables of $J'$, including $H$, $Z$, $E_r$, $A$, $Q_1$, $Q_2$ and $\mu$, are updated alternately, which gives the iterative update expressions $H_S$, $Z_S$, $E_{rS}$, $A_S$, $Q_{1S}$, $Q_{2S}$ and $\mu_S$ of the respective variables. The implementation steps are:

(I) Iteratively update the multi-view latent representation matrix $H$ as $\mathrm{lyap}(P, Q, -T')$, where $\mathrm{lyap}(\cdot)$ denotes the solution of the Sylvester equation;

(II) Iteratively update the multi-view self-representation coefficient matrix $Z$ as $(H^T H + I)^{-1}\left(A + H^T H - H^T E_r + (Q_2 + H^T Q_1)/\mu\right)$;

(III) Iteratively update the multi-view basis matrix from a singular value decomposition, using the set of matrices formed by the left singular vectors and the set of matrices formed by the right singular vectors;

(IV) Iteratively update the error reconstruction term of the multi-view data matrix;

(V) Iteratively update the $j$-th column of the error reconstruction term $E_r$, where $B_{:,j}$ denotes the $j$-th column of the matrix $B$;

(VI) Iteratively update the auxiliary matrix variable $A$ by singular value thresholding, where $U$ and $V$ denote the matrices of left and right singular vectors, $\Sigma$ denotes the diagonal matrix of singular values, and $S_\delta(X) = \max(X - \delta, 0) + \min(X + \delta, 0)$ is the shrinkage operator, with $\max(\cdot,\cdot)$ and $\min(\cdot,\cdot)$ denoting the larger and the smaller of two numbers;

(VII) Iteratively update the adaptive weight parameters;

(VIII) Iteratively update $Q_1$ as $Q_1 + \mu(H - HZ - E_r)$, update the multiplier of the multi-view reconstruction constraint with its own expression, update $Q_2$ as $Q_2 + \mu(A - Z)$, and update $\mu$ as $\rho\mu$, where $\rho$ denotes the adjustment parameter of $\mu$, $\rho \ge 1$.
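The updates whose expressions survive in the source, (II), the shrinkage operator in (VI) and the multiplier updates in (VIII), can be sketched as follows. The argument of the thresholding in `update_A`, $Z - Q_2/\mu$ with threshold $\lambda_2/\mu$, is not legible in the source; it is derived here from the multiplier convention $Q_2 \leftarrow Q_2 + \mu(A - Z)$ and should be read as an assumption. The function names and the default `rho=1.2` are hypothetical.

```python
import numpy as np

def update_Z(H, A, E_r, Q1, Q2, mu):
    """Step (II): Z = (H^T H + I)^{-1} (A + H^T H - H^T E_r + (Q2 + H^T Q1) / mu)."""
    n = H.shape[1]
    lhs = H.T @ H + np.eye(n)
    rhs = A + H.T @ H - H.T @ E_r + (Q2 + H.T @ Q1) / mu
    return np.linalg.solve(lhs, rhs)         # solve instead of forming the inverse

def shrink(X, delta):
    """Shrinkage operator S_delta(X) = max(X - delta, 0) + min(X + delta, 0)."""
    return np.maximum(X - delta, 0.0) + np.minimum(X + delta, 0.0)

def svt(M, delta):
    """Singular value thresholding: U S_delta(Sigma) V^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, delta)) @ Vt

def update_A(Z, Q2, mu, lam2):
    """Step (VI) under the assumed argument Z - Q2/mu and threshold lam2/mu."""
    return svt(Z - Q2 / mu, lam2 / mu)

def update_multipliers(H, Z, A, E_r, Q1, Q2, mu, rho=1.2):
    """Step (VIII) for the two named multipliers and the coefficient mu."""
    Q1_new = Q1 + mu * (H - H @ Z - E_r)
    Q2_new = Q2 + mu * (A - Z)
    return Q1_new, Q2_new, rho * mu
```

Solving the linear system in `update_Z` is numerically preferable to computing the explicit inverse; $H^T H + I$ is always positive definite, so the system has a unique solution.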

Step 7) Compute the value of the variable $Z$ in the optimized objective function $J'$:

(7a) Set the maximum number of iterations for the optimized objective function $J'$;

(7b) Update each variable of $J'$ with its iterative update expression and stop when the number of iterations reaches the set maximum. Because each variable of $J'$ can be computed from the other variables of $J'$, the variables keep being updated until the maximum number of iterations is reached, at which point the updated variables of $J'$ are obtained. Because $Z$ reflects the similarity between the data points of the original data set, the updated multi-view self-representation coefficient matrix is taken as the result of this step.

步骤8)对原始数据集进行聚类:Step 8) Cluster the original dataset:

(8a)由于体现原始数据集中的样本之间的相似性关系,因此可以用计算原始数据集的相似度矩阵S,计算公式为:(8a) due to Reflects the similarity relationship between samples in the original data set, so it can be used Calculate the similarity matrix S of the original data set, the calculation formula is:

其中,|·|表示对矩阵的每个元素取绝对值后组成的矩阵;Among them, |·| represents the matrix formed after taking the absolute value of each element of the matrix;
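The formula for $S$ is missing from this text. A common choice in subspace clustering, and one that matches the symbols mentioned here and in claim 4 (element-wise absolute value and transpose), is $S = |Z| + |Z^T|$. The sketch below assumes that form; the name `Z_hat` for the updated coefficient matrix is hypothetical.

```python
import numpy as np

def similarity_matrix(Z_hat):
    """Symmetric similarity matrix S = |Z_hat| + |Z_hat^T| (assumed form)."""
    Z_abs = np.abs(np.asarray(Z_hat, dtype=float))  # element-wise absolute value
    return Z_abs + Z_abs.T

# Tiny example: an asymmetric coefficient matrix for two samples.
Z_hat = np.array([[0.0, -0.5],
                  [0.2,  0.0]])
S = similarity_matrix(Z_hat)
```

Symmetrizing in this way makes $S$ usable as the weight matrix of an undirected graph, which the spectral clustering steps in (8b) require.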

(8b) Calculate the clustering result of the original data set:

(8b1) Sum each row of the similarity matrix $S$ to obtain the vector $t$, place $t$ on the diagonal to obtain the degree matrix $D$ of $S$, and calculate the Laplacian matrix $L$ of $S$;

(8b2) Perform eigenvalue decomposition on the Laplacian matrix $L$ to obtain the set $E$ of eigenvalues of $L$ and the set $T$ of eigenvectors corresponding to the eigenvalues in $E$. Sort the eigenvalues in $E$ in ascending order to obtain the sorted eigenvalue set $E'$, take the first $K$ eigenvalues $E_K$ of $E'$, form the eigenvector matrix $T'$ from the eigenvectors in $T$ that correspond to $E_K$, normalize each row of $T'$, and treat each row of $T'$ as a sample data point;

(8b3) Randomly select $K$ sample data points of the eigenvector matrix $T'$ as the initial cluster centers $R$ of the $K$ classes;

(8b4) Randomly select $K$ sample data points in $T'$ and take each of them as the initial cluster center of one class, obtaining a cluster center set $R$ composed of $K$ cluster centers;

(8b5) Calculate the Euclidean distance from each sample data point in $T'$ to each cluster center in $R$, assign each sample data point to the class of the cluster center with the smallest Euclidean distance to it, and take the mean of the sample data points belonging to the k-th class as the cluster center of the k-th class. This gives the cluster centers of the $K$ classes and updates $R$, where $k = 1, 2, \dots, K$;

(8b6) Repeat step (8b5) until the cluster center set $R$ no longer changes, which gives the clustering result of the original data set.
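Steps (8b1) to (8b6) describe spectral clustering followed by K-means. The sketch below is one possible NumPy rendering under stated assumptions: the Laplacian formula is not reproduced in the text, so the unnormalized form $L = D - S$ is assumed, and the `seed` and `max_iter` parameters and the handling of empty clusters are illustrative additions rather than part of the described method.

```python
import numpy as np

def spectral_embedding(S, K):
    """Steps (8b1)-(8b2): row-normalized eigenvectors of the K smallest eigenvalues."""
    S = np.asarray(S, dtype=float)
    t = S.sum(axis=1)                       # row sums of the similarity matrix
    D = np.diag(t)                          # degree matrix
    L = D - S                               # assumed unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # eigh returns eigenvalues in ascending order
    T_prime = eigvecs[:, :K]                # eigenvectors of the K smallest eigenvalues
    norms = np.linalg.norm(T_prime, axis=1, keepdims=True)
    return T_prime / np.maximum(norms, 1e-12)  # normalize each row

def kmeans(T_prime, K, seed=0, max_iter=100):
    """Steps (8b4)-(8b6): K-means with randomly chosen initial centers."""
    T_prime = np.asarray(T_prime, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick K distinct rows as the initial cluster center set R.
    R = T_prime[rng.choice(len(T_prime), size=K, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every sample to every center.
        dists = np.linalg.norm(T_prime[:, None, :] - R[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New center = mean of assigned samples; keep the old center if a class is empty.
        new_R = np.array([
            T_prime[labels == k].mean(axis=0) if np.any(labels == k) else R[k]
            for k in range(K)
        ])
        if np.allclose(new_R, R):           # centers no longer change
            break
        R = new_R
    return labels, R
```

Given a similarity matrix `S` and a class count `K`, the clustering result would be obtained as `labels, R = kmeans(spectral_embedding(S, K), K)`. Because the initial centers are random, practical implementations often repeat K-means several times and keep the best run; the text does not state whether the authors did so.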

The technical effects of the present invention are further described below with reference to simulation experiments.

1. Simulation conditions and content:

Simulation conditions:

The simulation experiments were run on a computer with an Intel(R) Core i7-7700 3.60 GHz central processing unit, 32 GB of memory, and the Windows 7 operating system. The simulation software was MATLAB R2016b.

The simulation experiments use the BBCSport, MSRC-v1, and Caltech101-7 data sets.

Simulation content:

Simulation 1

The clustering accuracy of the present invention is compared with that of the existing latent multi-view subspace clustering method on the BBCSport data set. The results are shown in FIG. 2.

Simulation 2

The clustering accuracy of the present invention is compared with that of the existing latent multi-view subspace clustering method on the MSRC-v1 data set. The results are shown in FIG. 3.

Simulation 3

The clustering accuracy of the present invention is compared with that of the existing latent multi-view subspace clustering method on the Caltech101-7 data set. The results are shown in FIG. 4.

2. Simulation result analysis:

Referring to FIG. 2, on the BBCSport data set with 60, 90, 120, 150, 180, and 210 test samples, the clustering accuracy of the present invention on the multi-view data set is significantly higher than that of the prior art; the smallest improvement over the prior art, 6.0%, occurs at 180 test samples. Referring to FIG. 3, on the MSRC-v1 data set with 28, 56, 84, 112, 140, 168, and 196 test samples, the clustering accuracy of the present invention is higher than that of the prior art in every case; the smallest improvement, 2.0%, occurs at 28 test samples. Referring to FIG. 4, on the Caltech101-7 data set with 35, 70, 105, 140, 175, and 210 test samples, the clustering accuracy of the present invention is significantly higher than that of the prior art; the smallest improvement, 2.3%, occurs at 70 test samples.

The simulation results in FIGS. 2 to 4 show that, across different data sets and different numbers of test samples, the clustering accuracy of the present invention on multi-view data sets is significantly higher than that of the prior art. The reason is that, compared with the prior art, the present invention obtains a common latent representation of the multi-view data in an adaptive manner and uses graph regularization so that the latent representation preserves the local geometric structure of the samples in the original data set. The similarity matrix of the original data set therefore has a more accurate structure, which effectively improves the accuracy of multi-view clustering.

Claims (4)

1. An adaptive multi-view subspace clustering method based on latent representation, characterized by comprising the following implementation steps:
(1) obtaining a multi-view data matrix of an original data set:
extracting different types of feature data from the plurality of images contained in the original data set, wherein feature data of the same type form one view matrix and the plurality of view matrices form the multi-view data matrix of the original data set, wherein $X^{(v)}$ denotes the v-th view matrix, $v = 1, 2, \dots, m$, $m$ denotes the number of view matrices, and $m \ge 2$;
(2) computing the Laplacian matrix of the multi-view data matrix;
(3) constructing the objective function $J$ of adaptive multi-view subspace clustering based on latent representation:
(3a) decomposing the multi-view data matrix into a multi-view latent representation matrix $H$ and a multi-view basis matrix, setting the constraint on the basis matrix $P^{(v)}$ of $X^{(v)}$ to $P^{(v)}P^{(v)T} = I$, and taking the difference between the data matrix and the product of the basis matrix and $H$ as the multi-view error reconstruction term, wherein $I$ denotes the identity matrix and $(\cdot)^T$ denotes the matrix transpose;
(3b) computing the measure of the multi-view error reconstruction term and setting an adaptive weight as its weight parameter, wherein $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix, $\gamma$ denotes an adjustment parameter, and $\gamma \ge 0$;
(3c) decomposing the multi-view latent representation matrix $H$ into the product of $H$ and a multi-view self-representation coefficient matrix $Z$, taking the difference between $H$ and the product of $H$ and $Z$ as the error reconstruction term $E_r$, $E_r = H - HZ$, computing the measure $\|E_r\|_{2,1}$ of $E_r$, and setting the weight of $\|E_r\|_{2,1}$ to $\lambda_1$, wherein $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$ norm of a matrix;
(3d) constructing the low-rank constraint term $\|Z\|_*$ of the multi-view self-representation coefficient matrix $Z$ and setting the weight of $\|Z\|_*$ to $\lambda_2$, wherein $\|\cdot\|_*$ denotes the nuclear norm of a matrix;
(3e) using the Laplacian matrix to construct the similarity constraint term of the multi-view data matrix and setting its weight to $\lambda_3$, wherein $\mathrm{tr}(\cdot)$ denotes the trace of a matrix;
(3f) taking the weighted sum of the measure of the multi-view error reconstruction term, $\|E_r\|_{2,1}$, $\|Z\|_*$, and the similarity constraint term to obtain the objective function $J$ of adaptive multi-view subspace clustering based on latent representation;
(4) optimizing the objective function $J$:
optimizing the objective function $J$ by the alternating direction method of multipliers, taking $A$ as an auxiliary matrix variable of $Z$ with the constraint $A = Z$, and introducing a Lagrangian multiplier for the multi-view reconstruction constraint, $Q_1$ as the Lagrangian multiplier of $E_r = H - HZ$, and $Q_2$ as the Lagrangian multiplier of $A = Z$, to obtain the optimized objective function $J'$,
wherein $\langle \cdot, \cdot \rangle$ denotes the inner product of matrices and $\mu$ denotes the regularization coefficient;
(5) initializing the variables in the optimized objective function $J'$:
initializing all elements of $Z$, $E_r$, $Q_1$, $Q_2$, and $A$ in $J'$ to 0, initializing all elements of $H$ to random numbers in (0, 1), and initializing $\mu$ to 0.001;
(6) alternately iterating the variables in the optimized objective function $J'$:
alternately iterating the variables in $J'$, including $H$, $Z$, $E_r$, $A$, $Q_1$, $Q_2$, and $\mu$, to obtain the iterative update expression of each variable;
(7) calculating the value of the variable $Z$ in the optimized objective function $J'$:
(7a) setting the maximum number of iterations of the optimized objective function $J'$;
(7b) iteratively updating each variable in $J'$ using its iterative update expression, and stopping when the number of iterations equals the set maximum number of iterations, to obtain the updated multi-view self-representation coefficient matrix;
(8) clustering the original data set:
(8a) calculating the similarity matrix $S$ of the original data set;
(8b) calculating the clustering result of the original data set:
(8b1) summing each row of the similarity matrix $S$ to obtain a vector $t$, placing $t$ on the diagonal to obtain the degree matrix $D$ of $S$, and calculating the Laplacian matrix $L$ of $S$;
(8b2) performing eigenvalue decomposition on the Laplacian matrix $L$ to obtain the eigenvalue set $E$ and the matrix $T$ composed of the eigenvectors corresponding to the eigenvalues in $E$;
(8b3) sorting the eigenvalues in $E$ in ascending order to obtain the eigenvalue set $E'$, taking the first $K$ eigenvalues of $E'$ to form the set $E_K$, selecting from $T$ the eigenvectors corresponding to the eigenvalues in $E_K$ to form the eigenvector matrix $T'$, and taking the normalized result of each row of $T'$ as a sample data point, wherein $K$ denotes the number of classes of the sample data points in $T'$, $2 \le K < N$, and $N$ denotes the number of sample data points in the original data set;
(8b4) randomly selecting $K$ sample data points in $T'$ and taking each of them as the initial cluster center of one class, to obtain a cluster center set $R$ composed of $K$ cluster centers;
(8b5) calculating the Euclidean distance from each sample data point in $T'$ to each cluster center in $R$, assigning each sample data point to the class of the cluster center with the smallest Euclidean distance to it, and taking the mean of the sample data points belonging to the k-th class as the cluster center of the k-th class, to obtain the cluster centers of the $K$ classes and update $R$, wherein $k = 1, 2, \dots, K$;
(8b6) repeating step (8b5) until the cluster center set $R$ no longer changes, to obtain the clustering result of the original data set.
2. The adaptive multi-view subspace clustering method based on latent representation according to claim 1, wherein computing the Laplacian matrix of the multi-view data matrix in step (2) comprises the following implementation steps:
(2a) taking each column of the v-th view matrix $X^{(v)}$ as a sample point and calculating the Euclidean distance between any two sample points, wherein $i, j = 1, 2, \dots, N$, $N$ denotes the number of sample data points in the original data set, $v = 1, 2, \dots, m$, $m$ denotes the number of view matrices, and $m \ge 2$;
(2b) computing the affinity matrix of the multi-view data matrix, wherein $W^{(v)}$ denotes the affinity matrix of the v-th view matrix, its entries are indexed by row $i$ and column $j$, and $\sigma$ denotes the bandwidth parameter of the Gaussian kernel;
(2c) summing each row of the affinity matrix to obtain a vector $r$, placing $r$ on the diagonal to obtain the degree matrix of the multi-view data, and computing the Laplacian matrix of the multi-view data matrix.
3. The adaptive multi-view subspace clustering method based on latent representation according to claim 1, wherein alternately iterating the variables in the optimized objective function $J'$ in step (6) comprises the following steps:
(I) iteratively updating the multi-view latent representation matrix $H$ using lyap(P, Q, -T'), wherein lyap(·) denotes solving a Sylvester equation, and the quantities involved are the multi-view data matrix, the multi-view basis matrix, the multi-view error reconstruction term, the error reconstruction term $E_r$, the multi-view self-representation coefficient matrix $Z$, the Laplacian matrix, the weight $\lambda_3$ of the similarity constraint term, the Lagrangian multipliers $Q_1$ and $Q_2$, the identity matrix $I$, the regularization coefficient $\mu$, and the matrix transpose $(\cdot)^T$;
(II) iteratively updating the multi-view self-representation coefficient matrix $Z$, wherein $A$ denotes the auxiliary matrix variable of $Z$;
(III) iteratively updating the multi-view basis matrix from the matrices of left singular vectors and right singular vectors of a singular value decomposition in which a Lagrangian multiplier appears;
(IV) iteratively updating the multi-view error reconstruction term, wherein the update involves the adaptive weight parameter and the adjustment parameter $\gamma$;
(V) iteratively updating the j-th column of the error reconstruction term $E_r$, wherein $B_{:,j}$ denotes the j-th column of the matrix $B$, $\lambda_1$ denotes the weight of the measure $\|E_r\|_{2,1}$ of $E_r$, and $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$ norm of a matrix;
(VI) iteratively updating the auxiliary matrix variable $A$, wherein $U$ and $V$ denote the left and right singular vectors of the decomposed matrix, $\Sigma$ denotes the diagonal matrix of singular values, $S_\delta(X) = \max(X - \delta, 0) + \min(X + \delta, 0)$ denotes the shrinkage operator, $\max(\cdot,\cdot)$ denotes the larger of two numbers, $\min(\cdot,\cdot)$ denotes the smaller of two numbers, $\lambda_2$ denotes the weight of the low-rank constraint term $\|Z\|_*$ of $Z$, and $\|\cdot\|_*$ denotes the nuclear norm of a matrix;
(VII) iteratively updating …;
(VIII) iteratively updating $Q_1$ using $Q_1 = Q_1 + \mu(H - HZ - E_r)$, iteratively updating the Lagrangian multiplier of the view reconstruction constraint, iteratively updating $Q_2$ using $Q_2 = Q_2 + \mu(A - Z)$, and iteratively updating $\mu$ using $\mu = \rho\mu$, wherein $\rho$ denotes the adjustment parameter of $\mu$ and $\rho \ge 1$.
4. The adaptive multi-view subspace clustering method based on latent representation according to claim 1, wherein the similarity matrix $S$ of the original data set is calculated in step (8a) from the updated multi-view self-representation coefficient matrix, wherein $|\cdot|$ denotes the matrix formed by the absolute values of the elements of a matrix and $(\cdot)^T$ denotes the matrix transpose.
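Steps (2a) to (2c) of claim 2 build a Gaussian-kernel affinity matrix and a graph Laplacian for each view. The exact kernel expression and the way the per-view matrices are combined are not reproduced in this text, so the sketch below handles a single view and assumes the common form $W_{ij} = \exp(-d_{ij}^2 / (2\sigma^2))$ with the unnormalized Laplacian $D - W$. The function name and the kernel scaling are assumptions.

```python
import numpy as np

def view_laplacian(X, sigma):
    """Gaussian-kernel affinity W and Laplacian D - W for one view.

    X has one sample per column, as in step (2a).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=0)                      # squared norm of each column
    # Squared Euclidean distance between every pair of columns (clipped at 0).
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))             # assumed Gaussian kernel form
    r = W.sum(axis=1)                                # row sums
    D = np.diag(r)                                   # degree matrix
    return W, D - W

# Three 2-D samples stored as columns.
X = np.array([[0.0, 1.0, 3.0],
              [0.0, 0.0, 4.0]])
W, L = view_laplacian(X, sigma=1.0)
```

Each row of the resulting Laplacian sums to zero, which is a quick sanity check on any implementation of this step.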

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810801776.1A CN109002854A (en) 2018-07-20 2018-07-20 Based on hidden expression and adaptive multiple view Subspace clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810801776.1A CN109002854A (en) 2018-07-20 2018-07-20 Based on hidden expression and adaptive multiple view Subspace clustering method

Publications (1)

Publication Number Publication Date
CN109002854A true CN109002854A (en) 2018-12-14

Family

ID=64596652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810801776.1A Pending CN109002854A (en) 2018-07-20 2018-07-20 Based on hidden expression and adaptive multiple view Subspace clustering method

Country Status (1)

Country Link
CN (1) CN109002854A (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784374A (en) * 2018-12-21 2019-05-21 西北工业大学 Multi-angle of view clustering method based on adaptive neighbor point
CN109978006A (en) * 2019-02-25 2019-07-05 北京邮电大学 Clustering method and device
CN109978006B (en) * 2019-02-25 2021-02-19 北京邮电大学 Face image clustering method and device
CN109993214A (en) * 2019-03-08 2019-07-09 华南理工大学 A Multi-View Clustering Method Based on Laplace Regularization and Rank Constraints
CN109993214B (en) * 2019-03-08 2021-06-08 华南理工大学 Multi-view clustering method based on Laplace regularization and rank constraint
CN110135520A (en) * 2019-05-27 2019-08-16 哈尔滨工业大学(深圳) Incomplete multi-view clustering method, device, system and storage medium based on graph completion and adaptive view weight assignment
CN110135520B (en) * 2019-05-27 2024-11-15 哈尔滨工业大学(深圳) Incomplete multi-view clustering method, device, system and storage medium based on graph completion and adaptive view weight allocation
CN110543916A (en) * 2019-09-06 2019-12-06 天津大学 A classification method and system for missing multi-view data
CN111461178A (en) * 2020-03-11 2020-07-28 深圳大学 Data processing method, system and device
CN111461178B (en) * 2020-03-11 2023-03-28 深圳大学 Data processing method, system and device
CN111401468B (en) * 2020-03-26 2023-03-24 上海海事大学 Weight self-updating multi-view spectral clustering method based on shared neighbor
CN111401468A (en) * 2020-03-26 2020-07-10 上海海事大学 A Weighted Self-Updating Multi-View Spectral Clustering Method Based on Shared Neighbors
CN112035626A (en) * 2020-07-06 2020-12-04 北海淇诚信息科技有限公司 A method, apparatus and electronic device for rapid identification of large-scale intent
CN112148911A (en) * 2020-08-19 2020-12-29 江苏大学 Image clustering method of multi-view intrinsic low-rank structure
CN112148911B (en) * 2020-08-19 2024-03-19 江苏大学 Image clustering method of multi-view intrinsic low-rank structure
CN113139556B (en) * 2021-04-22 2023-06-23 扬州大学 Manifold multi-view image clustering method and system based on adaptive composition
CN113139556A (en) * 2021-04-22 2021-07-20 扬州大学 Manifold multi-view image clustering method and system based on self-adaptive composition
CN113239983A (en) * 2021-04-25 2021-08-10 浙江师范大学 Missing multi-view subspace clustering method and system based on high-order association preservation
CN113159213A (en) * 2021-04-30 2021-07-23 中国工商银行股份有限公司 Service distribution method, device and equipment
CN113269203A (en) * 2021-05-17 2021-08-17 电子科技大学 Subspace feature extraction method for multi-rotor unmanned aerial vehicle recognition
CN113269203B (en) * 2021-05-17 2022-03-25 电子科技大学 A subspace feature extraction method for multi-rotor UAV identification
WO2022267954A1 (en) * 2021-06-24 2022-12-29 浙江师范大学 Spectral clustering method and system based on unified anchor and subspace learning
WO2022267956A1 (en) * 2021-06-24 2022-12-29 浙江师范大学 Multi-view clustering method and system based on matrix decomposition and multi-partition alignment
CN113569973B (en) * 2021-08-04 2024-04-19 咪咕文化科技有限公司 Multi-view clustering method, device, electronic device and computer-readable storage medium
CN113569973A (en) * 2021-08-04 2021-10-29 咪咕文化科技有限公司 Multi-view clustering method, apparatus, electronic device, and computer-readable storage medium
CN115208631A (en) * 2022-06-15 2022-10-18 华东理工大学 Network intrusion detection system introducing sample geometry and multi-view information
CN115392350A (en) * 2022-08-01 2022-11-25 广东工业大学 Incomplete multi-view clustering method and system based on co-regularization spectral clustering
CN115392350B (en) * 2022-08-01 2025-06-10 广东工业大学 An incomplete multi-view clustering method and system based on co-regularized spectral clustering
CN119577488A (en) * 2025-02-08 2025-03-07 鹏城实验室 Clustering method, device, equipment and storage medium for multiple view data
CN119577488B (en) * 2025-02-08 2025-05-20 鹏城实验室 Clustering method, device, equipment and storage medium for multiple view data

Similar Documents

Publication Publication Date Title
CN109002854A (en) Based on hidden expression and adaptive multiple view Subspace clustering method
WO2022001159A1 (en) Latent low-rank projection learning based unsupervised feature extraction method for hyperspectral image
Zhai et al. Hyperspectral image clustering: Current achievements and future lines
CN108776812A (en) Multiple view clustering method based on Non-negative Matrix Factorization and various-consistency
CN112116017B (en) Image data dimension reduction method based on kernel preservation
CN107563442B (en) Hyperspectral Image Classification Method Based on Sparse Low Rank Regular Graph Quantized Embedding
CN109063757A (en) It is diagonally indicated based on block and the multifarious multiple view Subspace clustering method of view
CN107292341B (en) An adaptive multi-view clustering method based on pairwise co-regularization and NMF
De Backer et al. Non-linear dimensionality reduction techniques for unsupervised feature extraction
CN109522956A (en) A kind of low-rank differentiation proper subspace learning method
CN112861929B (en) An Image Classification Method Based on Semi-Supervised Weighted Transfer Discriminant Analysis
CN107451545B (en) Face recognition method based on multi-channel discriminative non-negative matrix factorization under soft labels
CN111310666A (en) High-resolution image ground feature identification and segmentation method based on texture features
CN104680179B (en) Method of Data with Adding Windows based on neighborhood similarity
CN109359525B (en) Polarized SAR image classification method based on sparse low-rank discrimination spectral clustering
CN109543723B (en) Robust image clustering method
CN115564996A (en) A Hyperspectral Remote Sensing Image Classification Method Based on Joint Attention Network
CN110020599A (en) A kind of facial image clustering method of sparse enhanced type low-rank constraint
CN109241813B (en) Non-constrained face image dimension reduction method based on discrimination sparse preservation embedding
CN109784360B (en) Image clustering method based on depth multi-view subspace ensemble learning
CN111027582B (en) Semi-supervised feature subspace learning method and device based on low-rank graph learning
CN114037931B (en) A multi-view discrimination method with adaptive weights
CN110364264A (en) Feature Dimensionality Reduction Method for Medical Datasets Based on Subspace Learning
CN109766385A (en) Multi-view projection clustering method based on self-learning weights
CN113554082B (en) Multi-view subspace clustering method for self-weighted fusion of local and global information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181214