WO2022166361A1 - Deep clustering method and system based on cross-modal fusion - Google Patents

Deep clustering method and system based on cross-modal fusion Download PDF

Info

Publication number
WO2022166361A1
WO2022166361A1 PCT/CN2021/135894 CN2021135894W WO2022166361A1 WO 2022166361 A1 WO2022166361 A1 WO 2022166361A1 CN 2021135894 W CN2021135894 W CN 2021135894W WO 2022166361 A1 WO2022166361 A1 WO 2022166361A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
encoder
modal
autoencoder
information
Prior art date
Application number
PCT/CN2021/135894
Other languages
English (en)
French (fr)
Inventor
朱信忠
徐慧英
涂文轩
赵建民
Original Assignee
浙江师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江师范大学
Publication of WO2022166361A1
Priority to ZA2023/08290A priority Critical patent/ZA202308290B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • the present application relates to the technical field of unsupervised deep clustering, and in particular, to a deep clustering method and system based on cross-modal fusion.
  • Deep clustering aims to train a neural network in an unsupervised manner to learn discriminative feature representations to partition data into disjoint subsets.
  • in deep clustering methods, two key factors, the optimization objective and the manner of feature extraction, largely determine the performance of the clustering method.
  • current deep clustering methods have the following two problems: 1) there is a lack of a cross-modal dynamic information fusion and processing mechanism, and simply fusing or concatenating the information of the two modalities leads to insufficient information interaction; 2) in existing work, the generation process of the target distribution hardly considers utilizing the information of the two modalities, which makes the training of the network insufficiently comprehensive and accurate. As a result, the interaction between the structural information and the attribute information of the data is hindered, and the performance of deep clustering methods cannot be improved.
  • the purpose of this application is to provide a deep clustering method and system based on cross-modal fusion in view of the defects of the prior art.
  • a deep clustering system based on cross-modal fusion, including an autoencoder, a graph autoencoder, a cross-modal information fusion module and a joint optimization objective module;
  • the graph autoencoder is connected with the autoencoder, and the cross-modal information fusion module is connected with the autoencoder and the graph autoencoder, respectively;
  • the joint optimization objective module is connected with the autoencoder, the graph autoencoder and the cross-modal information fusion module, respectively;
  • the autoencoder is used to extract features from the attribute information of the graph data and reconstruct the original attribute matrix;
  • the graph autoencoder is used to extract features from the structural information of the graph data and reconstruct the original adjacency matrix and the weighted attribute matrix;
  • the cross-modal information fusion module is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
  • the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
  • the graph autoencoder performs feature extraction on the structural information of the graph data and reconstructs the original adjacency matrix and the weighted attribute matrix; its encoder and decoder take the form Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)}) and \hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)});
  • Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively;
  • \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer;
  • Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
  • N denotes the number of samples
  • d denotes the attribute dimension
  • L_w denotes the reconstruction loss of the weighted attribute matrix, L_w = \frac{1}{2N}\|\widetilde{A}X - \hat{Z}\|_F^2
  • L_a denotes the reconstruction loss of the adjacency matrix, L_a = \frac{1}{2N}\|\widetilde{A} - \hat{A}\|_F^2
  • the cross-modal information fusion module includes a cross-modal dynamic fusion mechanism and a triplet self-supervision strategy
  • the cross-modal dynamic fusion mechanism is used to perform deep interaction between the modal information of the autoencoder and that of the graph autoencoder, generating a consensus latent embedding;
  • the triplet self-supervision strategy is used to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution as well as a target distribution.
  • the cross-modal dynamic fusion mechanism specifically includes:
  • a combination module, used to linearly combine the latent embeddings of the autoencoder and the graph autoencoder to obtain the initial fusion embedding, expressed as Z_I = \alpha Z_{AE} + (1-\alpha) Z_{IGAE};
  • \alpha denotes the learnable coefficient matrix
  • Z_{AE} denotes the latent embedding of the autoencoder
  • Z_{IGAE} denotes the latent embedding of the graph autoencoder
  • Z_I \in R^{N \times d'} denotes the initial fusion embedding
  • d' denotes the dimension of the latent embedding.
  • a processing module, used to enhance the initial fusion embedding through a graph-convolution-like operation, expressed as Z_L = \widetilde{A} Z_I, where Z_L \in R^{N \times d'} denotes the latent embedding after local structure enhancement;
  • a recombination module, used to recombine the initial fusion embedding based on the self-correlation learning mechanism, expressed as Z_G = S Z_L;
  • Z_G denotes the information obtained by recombining Z_L;
  • S denotes the self-correlation matrix;
  • a propagation module, used to propagate information through the fusion mechanism via a skip connection, expressed as \widetilde{Z} = \beta Z_G + Z_L;
  • \beta denotes the scale parameter; \widetilde{Z} denotes the fused clustering embedding
  • the triplet self-supervised strategy generates the soft assignment distribution and the target distribution, expressed as:
q_{ij} = \frac{(1 + \|\tilde{z}_i - u_j\|^2 / v)^{-\frac{v+1}{2}}}{\sum_{j'} (1 + \|\tilde{z}_i - u_{j'}\|^2 / v)^{-\frac{v+1}{2}}}  (10)
p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} (q_{ij'}^2 / \sum_i q_{ij'})}  (11)
  • u_j denotes the j-th pre-computed cluster center
  • v denotes the degrees of freedom of the Student's t-distribution
  • q_{ij} denotes the probability of assigning the i-th sample to the j-th center, i.e., the soft assignment distribution
  • p_{ij} denotes the probability that the i-th sample belongs to the j-th cluster center, i.e., the target distribution
  • j' indexes the j'-th cluster center.
  • after the target distribution is generated by the triplet self-supervised strategy, the method further includes:
  • improving the representational power of each part through the triplet clustering loss, where the triplet clustering loss is expressed as L_{KL} = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{(q_{ij} + q'_{ij} + q''_{ij})/3}
  • synchronously guiding the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module in the joint optimization objective module specifically includes computing the loss L = L_{AE} + L_{IGAE} + \lambda L_{KL};
  • L_{AE} denotes the mean-square-error reconstruction loss of the autoencoder
  • \lambda denotes a predefined hyperparameter
  • a deep clustering method based on cross-modal fusion, including:
  • the autoencoder performs feature extraction on the attribute information of the graph data and reconstructs the original attribute matrix
  • the graph autoencoder performs feature extraction on the structural information of the graph data and reconstructs the original adjacency matrix and the weighted attribute matrix
  • the cross-modal information fusion module integrates the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and initializes cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
  • the joint optimization objective module synchronously guides the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
  • in step S2, feature extraction is performed on the structural information of the graph data and the original adjacency matrix and the weighted attribute matrix are reconstructed; the encoder and decoder of the graph autoencoder take the form Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)}) and \hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)});
  • Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively;
  • \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer;
  • Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
  • N denotes the number of samples
  • d denotes the attribute dimension
  • L_w denotes the reconstruction loss of the weighted attribute matrix, L_w = \frac{1}{2N}\|\widetilde{A}X - \hat{Z}\|_F^2
  • L_a denotes the reconstruction loss of the adjacency matrix, L_a = \frac{1}{2N}\|\widetilde{A} - \hat{A}\|_F^2
  • step S3 specifically includes:
  • the present application proposes a novel deep clustering method and system based on cross-modal information fusion.
  • the method includes an autoencoder module, a graph autoencoder module, a cross-modal information fusion module and a joint optimization objective.
  • a large number of ablation experiments show that the full fusion of structural information and attribute information in this application helps to encode more compact and more discriminative information, which in turn can generate more robust target distributions and provide more accurate guidance for network learning.
  • Experimental results on six public datasets demonstrate that the present application outperforms existing methods.
  • FIG. 1 is a structural diagram of a deep clustering system based on cross-modal fusion provided by Embodiment 1;
  • FIG. 2 is a schematic structural diagram of a cross-modal information fusion module provided in the second embodiment.
  • the present application provides a deep clustering method and system based on cross-modal fusion.
  • the core idea is to fully extract the node attribute information via the autoencoder and the structural information via the graph autoencoder, and to design a dynamic information fusion module that combines the two to achieve an accurate representation reconstruction process.
  • the present application carefully designs a structure-and-attribute information fusion module.
  • First, the two types of embedding features are fused at the local and global levels to obtain consensus representation information.
  • Second, the soft assignment distribution Q and the target distribution P are obtained by evaluating, with the Student's t-distribution, the similarity between samples and the pre-computed cluster centers.
  • Finally, a triplet self-supervision mechanism is designed, which utilizes the target distribution to simultaneously provide learning guidance for the autoencoder, the graph autoencoder and the information fusion part.
  • in addition, the deep fusion clustering network includes an improved graph autoencoder, whose structure is symmetric and which reconstructs the adjacency matrix synchronously from the latent variables and the decoder output variables. The present application not only solves the problem of insufficient multi-source information interaction in current deep clustering methods, but also solves the problem that the target distribution of self-optimization-based deep clustering methods is not robust enough.
  • a deep clustering system based on cross-modal fusion includes an autoencoder 11, a graph autoencoder 12, a cross-modal information fusion module 13 and a joint optimization objective module; the graph autoencoder 12 is connected with the autoencoder 11, and the cross-modal information fusion module 13 is connected with the autoencoder 11 and the graph autoencoder 12, respectively; the joint optimization objective module is connected with the autoencoder 11, the graph autoencoder 12 and the cross-modal information fusion module 13, respectively.
  • the autoencoder 11 is used to perform feature extraction on the attribute information of the graph data and reconstruct the original attribute matrix
  • the graph autoencoder 12 is used to perform feature extraction on the structural information of the graph data and reconstruct the original adjacency matrix and the weighted attribute matrix;
  • the cross-modal information fusion module 13 is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
  • the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
  • the autoencoder in this embodiment is a fusion-based autoencoder, whereas most existing generative encoders, whether autoencoders or graph autoencoders, use only their own latent embeddings to reconstruct the input.
  • in contrast, this embodiment proposes a compact representation based on the autoencoder and the graph autoencoder. Specifically, the two kinds of modal information from the autoencoder and the graph autoencoder are first integrated to generate a consensus latent embedding. Then, the embeddings of the autoencoder and the graph autoencoder are taken as consensus input to reconstruct the inputs of the two subnetworks. Unlike existing methods, the method proposed in this embodiment uses a carefully designed fusion module to fuse the structural information and the attribute information, and then reconstructs the inputs of the two subnetworks with the consensus latent embedding.
  • the structure of an autoencoder is usually symmetric, while the structure of a graph autoencoder is usually asymmetric.
  • graph autoencoders only utilize the latent embedding to reconstruct the adjacency matrix, which ignores the property that structure-based attribute information can be used to improve the generalization ability of the network.
  • to better utilize the adjacency information and the attribute information, this embodiment designs an improved graph autoencoder (IGAE).
  • the network needs to simultaneously reconstruct the weighted attribute matrix and the adjacency matrix, and its encoder and decoder are formally expressed as Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)}) and \hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)});
  • Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively;
  • \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer;
  • Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
  • N denotes the number of samples
  • d denotes the attribute dimension
  • L_w denotes the reconstruction loss of the weighted attribute matrix, L_w = \frac{1}{2N}\|\widetilde{A}X - \hat{Z}\|_F^2
  • L_a denotes the reconstruction loss of the adjacency matrix, L_a = \frac{1}{2N}\|\widetilde{A} - \hat{A}\|_F^2
  • by minimizing formulas (4) and (5), the proposed improved graph autoencoder simultaneously minimizes the reconstruction losses of the weighted attribute matrix and the adjacency matrix.
  • the modal information of the autoencoder and of the graph autoencoder is integrated to generate a consensus latent embedding, and cluster centers are initialized from the consensus latent embedding and pre-computation to generate the soft assignment distribution and the target distribution.
  • this embodiment proposes a structure-and-attribute information fusion module.
  • the module consists of two parts, namely a cross-modal dynamic fusion mechanism and a triplet self-supervision strategy.
  • the cross-modal dynamic fusion mechanism, starting from the local and global levels, completes the deep interaction of the latent embedding information of the two modalities and generates a more compact consensus latent embedding;
  • the triplet self-supervised strategy generates a more accurate soft assignment distribution Q and a more robust target distribution P on the basis of the consensus latent embedding and the pre-computed initialized cluster centers.
  • the cross-modal dynamic fusion mechanism proposed in this embodiment mainly includes four steps:
  • the combination module is used to linearly combine the latent embeddings of the autoencoder and the graph autoencoder to obtain the initial fusion embedding;
  • \alpha denotes the learnable coefficient matrix, which selectively evaluates the importance of the two kinds of modal information according to the properties of different datasets
  • Z_{AE} denotes the latent embedding of the autoencoder
  • Z_{IGAE} denotes the latent embedding of the graph autoencoder
  • Z_I \in R^{N \times d'} denotes the initial fusion embedding
  • d' denotes the dimension of the latent embedding.
  • \alpha is initialized to 0.5 and is adjusted automatically by stochastic gradient descent.
  • the processing module is used to enhance the initial fusion embedding through an operation based on graph convolution, Z_L = \widetilde{A} Z_I
  • Z_L \in R^{N \times d'} denotes the latent embedding after local structure enhancement.
  • the recombination module is used to recombine the initial fusion embedding based on the self-correlation learning mechanism
  • the self-correlation learning mechanism is introduced to model non-local relations in the initial information fusion space.
  • the normalized self-correlation matrix is first calculated as S_{ij} = \frac{e^{(Z_L Z_L^\top)_{ij}}}{\sum_{k} e^{(Z_L Z_L^\top)_{ik}}}
  • Z_L is recombined by computing the global correlations among samples, expressed as Z_G = S Z_L
  • Z_G denotes the information obtained by recombining Z_L;
  • S denotes the self-correlation matrix.
  • the propagation module is used to propagate information through the fusion mechanism via a skip connection, \widetilde{Z} = \beta Z_G + Z_L;
  • \beta denotes the scale parameter, which is initialized to 0 and whose weight gradient is made learnable when training the network; \widetilde{Z} denotes the fused clustering embedding.
  • the cross-modal dynamic fusion mechanism considers the correlation of samples from both the local and the global aspects. This module therefore helps fuse and calibrate the information of the autoencoder and the graph autoencoder, so as to learn a higher-quality consensus latent embedding.
  • to provide reliable guidance for training, this application uses the clustering embedding \widetilde{Z}, generated by fusing the autoencoder and the graph autoencoder, to generate the target distribution.
  • the soft assignment distribution and the target distribution generated by the triplet self-supervised strategy are expressed as:
q_{ij} = \frac{(1 + \|\tilde{z}_i - u_j\|^2 / v)^{-\frac{v+1}{2}}}{\sum_{j'} (1 + \|\tilde{z}_i - u_{j'}\|^2 / v)^{-\frac{v+1}{2}}}  (10)
p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} (q_{ij'}^2 / \sum_i q_{ij'})}  (11)
  • u_j denotes the j-th pre-computed cluster center
  • v denotes the degrees of freedom of the Student's t-distribution
  • q_{ij} denotes the probability of assigning the i-th sample to the j-th center, i.e., the soft assignment distribution
  • p_{ij} denotes the probability that the i-th sample belongs to the j-th cluster center, i.e., the target distribution
  • j' indexes the j'-th cluster center.
  • the similarity between the i-th sample \tilde{z}_i in the fused embedding space and the j-th pre-computed cluster center u_j is calculated using the Student's t-distribution as the kernel.
  • the soft assignment matrix Q \in R^{N \times K} reflects the probability distribution of all samples.
  • formula (11) is introduced to guide all samples to approach the cluster centers.
  • 0 \le p_{ij} \le 1 is an element of the generated target distribution P \in R^{N \times K}, representing the probability that the i-th sample belongs to the j-th cluster center.
  • the soft assignment distributions of the latent embeddings of the autoencoder and the improved graph autoencoder are calculated according to Eq. (10).
  • the soft assignment distributions of the autoencoder and the improved graph autoencoder are denoted Q' and Q''.
  • this embodiment designs a triplet clustering loss, expressed as L_{KL} = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{(q_{ij} + q'_{ij} + q''_{ij})/3}
  • the soft assignment distributions of the autoencoder and the improved graph autoencoder, together with the fused embedding, are aligned with the robust target distribution simultaneously. Since the target distribution is generated in an unsupervised mode, this application calls the loss function the triplet clustering loss, and its corresponding training mechanism the triplet self-supervised strategy.
  • the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module are guided synchronously.
  • the learning objective of the model mainly includes two parts, combined as L = L_{AE} + L_{IGAE} + \lambda L_{KL}:
  • L_{AE} denotes the mean square error (MSE) reconstruction loss of the autoencoder
  • \lambda denotes a predefined hyperparameter
  • the deep fusion clustering network proposed in this application reconstructs the inputs of the two subnetworks with a consensus latent embedding.
  • \lambda is a predefined hyperparameter that balances the importance of reconstruction and clustering.
  • this embodiment has the following beneficial effects:
  • this embodiment provides a structure-and-attribute information fusion module, which is used to enhance the interaction between attribute information and structural information.
  • the autoencoder and the graph autoencoder use the consensus latent embedding to reconstruct the original input, which helps improve the generalization ability of the latent embedding;
  • the triplet self-supervised learning mechanism unifies the autoencoder, the graph autoencoder and the fusion part into the same optimization framework, thereby improving the quality of the latent embedding and the performance of clustering.
  • this embodiment provides an improved graph autoencoder, which overcomes the limitation of existing encoding methods that only reconstruct structural information, and improves the generalization ability of the clustering framework by jointly reconstructing the structural information and the weighted attribute information.
  • the purpose of this embodiment is to provide a deep clustering system based on cross-modal information fusion, which fuses the sample nodes of the two modalities at the local and global levels. Subsequently, the soft assignment distribution Q and the target distribution P are computed in the fused embedding space by evaluating, via the Student's t-distribution, the similarity between the samples and the pre-computed cluster centers.
  • the adjacency matrix, the attribute matrix and the attribute matrix weighted by local information are reconstructed, and the fusion part is optimized simultaneously to train an end-to-end deep neural framework.
  • the K-means clustering algorithm is used to perform clustering in the weighted fused embedding space, achieving unsupervised clustering of deep graph information.
  • Embodiment 2: the difference between the deep clustering system based on cross-modal fusion provided in this embodiment and Embodiment 1 is as follows:
  • this embodiment is compared with existing methods on multiple datasets to verify the effectiveness of the present application.
  • REUT [Lewis, D.D.; Yang, Y.; Rose, T.G.; and Li, F. 2004. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research 5(2): 361–397];
  • USPS: This dataset is of the image type and contains 9298 single-channel images of size 16×16, evenly distributed across 10 categories.
  • HHAR: This dataset is of the record type and contains 10299 sensor records, each with 561-dimensional features, evenly distributed across 6 categories.
  • REUT: This dataset is of the text type and contains 10000 text samples, each with 2000-dimensional features, evenly distributed across 4 categories.
  • ACM: This dataset is of the graph type and contains 3025 graph nodes, each with 1870-dimensional features, evenly distributed across 3 categories.
  • DBLP: This dataset is of the graph type and contains 4058 graph nodes, each with 334-dimensional features, evenly distributed across 4 categories.
  • CITE: This dataset is of the graph type and contains 3327 graph nodes, each with 3703-dimensional features, evenly distributed across 6 categories.
  • the implementation environment of this embodiment is the PyTorch platform, and the training method includes the following four steps.
  • the Adam optimizer is uniformly used to optimize the model.
  • the model learning rate is set to 0.001 on the USPS and HHAR datasets, 0.0001 on the REUT, DBLP and CITE datasets, and 0.00005 on the ACM dataset.
  • the training batch size is set to 256, and an early-stopping strategy is adopted to avoid model overfitting.
  • the two balance factors γ and λ are set to 0.1 and 10, respectively.
  • for non-graph datasets, the number of neighbors of each sample is set to 5 when constructing the adjacency matrix.
  • This embodiment adopts four evaluation metrics recognized in the field of deep clustering algorithms: clustering accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI) and F1 score.
  • ACC: clustering accuracy
  • NMI: normalized mutual information
  • ARI: adjusted Rand index
  • F1: F1 score.
  • the Hungarian (Kuhn-Munkres) algorithm [Lovász, L.; and Plummer, M. 1986. Matching Theory] is used to match the cluster ID of each sample to its class ID.
  • Comparison experiments are conducted on 6 multi-type datasets against 10 benchmark algorithms. The comparison methods include the K-means algorithm, the autoencoder, deep embedded clustering, improved deep embedded clustering, the graph autoencoder, the graph variational autoencoder, the adversarially regularized graph autoencoder, deep attentional graph embedding clustering, and the structured deep clustering network.
  • 1) this embodiment fully integrates the attribute information and structural information of the raw data, and learns a complementary consensus embedding representation of the two kinds of modal information, thereby improving the quality of the latent embedding and the clustering effect; 2) existing graph-convolution-based clustering methods, such as the graph autoencoder, the graph variational autoencoder, the adversarially regularized graph autoencoder and deep attentional graph embedding clustering, do not fully mine the attribute information of the data itself, and suffer from the over-smoothing caused by continual aggregation.
  • This embodiment integrates the attribute-based representations of the autoencoder into a unified clustering framework, and performs interactive consensus-embedding learning on the graph structure and node attributes through the fusion module, thereby improving clustering performance; 3) compared with the two most advanced clustering methods, the structured deep clustering network and its variant, the present application achieves an overall performance improvement on six datasets. Taking the DBLP dataset as an example, the performance of this application is significantly better than SDCN and SDCN-Q; the four metrics of accuracy, normalized mutual information, adjusted Rand index (ARI) and F1 score are improved by 7.9%, 4.2%, 7.8% and 8.0%, respectively.


Abstract

The present application discloses a deep clustering system based on cross-modal fusion, including an autoencoder, a graph autoencoder, a cross-modal information fusion module and a joint optimization objective module. The autoencoder is used to extract features from the attribute information of graph data and reconstruct the original attribute matrix; the graph autoencoder is used to extract features from the structural information of graph data and reconstruct the original adjacency matrix and the weighted attribute matrix; the cross-modal information fusion module is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution; the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.

Description

Deep clustering method and system based on cross-modal fusion
Technical Field
The present application relates to the technical field of unsupervised deep clustering, and in particular to a deep clustering method and system based on cross-modal fusion.
Background Art
Deep clustering aims to train a neural network in an unsupervised manner to learn discriminative feature representations, so as to partition data into several disjoint subsets. In deep clustering methods, two key factors, the optimization objective and the manner of feature extraction, largely determine the performance of the clustering method.
However, current deep clustering methods have the following two problems: 1) they lack a cross-modal dynamic information fusion and processing mechanism, and simply fusing or concatenating the information of the two modalities leads to insufficient information interaction; 2) in existing work, the generation process of the target distribution hardly considers utilizing the information of the two modalities, which makes network training insufficiently comprehensive and accurate. As a result, the interaction between the structural information and the attribute information of the data is hindered, and the performance of deep clustering methods cannot be improved.
Summary of the Invention
The purpose of the present application is to provide, in view of the defects of the prior art, a deep clustering method and system based on cross-modal fusion.
To achieve the above purpose, the present application adopts the following technical solutions:
A deep clustering system based on cross-modal fusion, including an autoencoder, a graph autoencoder, a cross-modal information fusion module and a joint optimization objective module; the graph autoencoder is connected with the autoencoder, and the cross-modal information fusion module is connected with the autoencoder and the graph autoencoder, respectively; the joint optimization objective module is connected with the autoencoder, the graph autoencoder and the cross-modal information fusion module, respectively;
the autoencoder is used to extract features from the attribute information of graph data and reconstruct the original attribute matrix;
the graph autoencoder is used to extract features from the structural information of graph data and reconstruct the original adjacency matrix and the weighted attribute matrix;
the cross-modal information fusion module is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
Further, the feature extraction on the structural information of the graph data and the reconstruction of the original adjacency matrix and the weighted attribute matrix in the graph autoencoder are specifically:
The encoder and the decoder of the graph autoencoder take the form:
Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)})  (1)
\hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)})  (2)
where Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively; \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer; Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
The hybrid loss function L_{IGAE} minimized by the graph autoencoder is expressed as:
L_{IGAE} = L_w + \gamma L_a  (3)
where \gamma denotes a predefined hyperparameter that balances the weights of the two reconstruction loss functions; L_w and L_a are expressed as:
L_w = \frac{1}{2N} \|\widetilde{A}X - \hat{Z}\|_F^2  (4)
L_a = \frac{1}{2N} \|\widetilde{A} - \hat{A}\|_F^2  (5)
where \hat{Z} denotes the reconstructed weighted attribute matrix; \hat{A} denotes the reconstructed original adjacency matrix generated by the inner-product operation; N denotes the number of samples; d denotes the attribute dimension; L_w denotes the reconstruction loss of the weighted attribute matrix; L_a denotes the reconstruction loss of the adjacency matrix.
Further, the cross-modal information fusion module includes a cross-modal dynamic fusion mechanism and a triplet self-supervision strategy;
the cross-modal dynamic fusion mechanism is used to perform deep interaction of the latent embedding information between the modal information of the autoencoder and that of the graph autoencoder, generating a consensus latent embedding;
the triplet self-supervision strategy is used to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution as well as a target distribution.
Further, the cross-modal dynamic fusion mechanism specifically includes:
a combination module, used to linearly combine the latent embeddings of the autoencoder and the graph autoencoder to obtain the initial fusion embedding, expressed as:
Z_I = \alpha Z_{AE} + (1-\alpha) Z_{IGAE}  (6)
where \alpha denotes the learnable coefficient matrix; Z_{AE} denotes the latent embedding of the autoencoder; Z_{IGAE} denotes the latent embedding of the graph autoencoder; Z_I \in R^{N \times d'} denotes the initial fusion embedding; d' denotes the dimension of the latent embedding.
A processing module, used to enhance the initial fusion embedding through an operation based on graph convolution, expressed as:
Z_L = \widetilde{A} Z_I  (7)
where Z_L \in R^{N \times d'} denotes the latent embedding after local structure enhancement;
a recombination module, used to recombine the initial fusion embedding based on the self-correlation learning mechanism, expressed as:
Z_G = S Z_L  (8)
where Z_G denotes the information obtained by recombining Z_L; S denotes the self-correlation matrix;
a propagation module, used to propagate information through the fusion mechanism via a skip connection, expressed as:
\widetilde{Z} = \beta Z_G + Z_L  (9)
where \beta denotes the scale parameter; \widetilde{Z} denotes the fused clustering embedding.
Further, the soft assignment distribution and the target distribution generated in the triplet self-supervision strategy are expressed as:
q_{ij} = \frac{(1 + \|\tilde{z}_i - u_j\|^2 / v)^{-\frac{v+1}{2}}}{\sum_{j'} (1 + \|\tilde{z}_i - u_{j'}\|^2 / v)^{-\frac{v+1}{2}}}  (10)
p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} (q_{ij'}^2 / \sum_i q_{ij'})}  (11)
where \tilde{z}_i denotes the i-th sample of the fused clustering embedding; u_j denotes the j-th pre-computed cluster center; v denotes the degrees of freedom of the Student's t-distribution; q_{ij} denotes the probability of assigning the i-th sample to the j-th center, i.e., the soft assignment distribution; p_{ij} denotes the probability that the i-th sample belongs to the j-th cluster center, i.e., the target distribution; j' indexes the j'-th cluster center.
Further, after the target distribution is generated in the triplet self-supervision strategy, the method further includes:
improving the representational power of each part through a triplet clustering loss, where the triplet clustering loss is expressed as:
L_{KL} = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{(q_{ij} + q'_{ij} + q''_{ij})/3}  (12)
where L_{KL} denotes the triplet clustering loss.
Further, synchronously guiding the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module in the joint optimization objective module specifically includes:
computing the reconstruction losses of the autoencoder and the graph autoencoder, and computing the clustering loss relating the autoencoder and the graph autoencoder to the target distribution, expressed as:
L = L_{AE} + L_{IGAE} + \lambda L_{KL}  (13)
where L_{AE} denotes the mean-square-error reconstruction loss of the autoencoder; \lambda denotes a predefined hyperparameter.
Correspondingly, a deep clustering method based on cross-modal fusion is also provided, including:
S1. The autoencoder extracts features from the attribute information of graph data and reconstructs the original attribute matrix;
S2. The graph autoencoder extracts features from the structural information of graph data and reconstructs the original adjacency matrix and the weighted attribute matrix;
S3. The cross-modal information fusion module integrates the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and initializes cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
S4. The joint optimization objective module synchronously guides the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
Further, in step S2, the feature extraction on the structural information of the graph data and the reconstruction of the original adjacency matrix and the weighted attribute matrix are specifically:
The encoder and the decoder of the graph autoencoder take the form:
Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)})  (1)
\hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)})  (2)
where Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively; \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer; Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
The hybrid loss function L_{IGAE} minimized by the graph autoencoder is expressed as:
L_{IGAE} = L_w + \gamma L_a  (3)
where \gamma denotes a predefined hyperparameter that balances the weights of the two reconstruction loss functions; L_w and L_a are expressed as:
L_w = \frac{1}{2N} \|\widetilde{A}X - \hat{Z}\|_F^2  (4)
L_a = \frac{1}{2N} \|\widetilde{A} - \hat{A}\|_F^2  (5)
where \hat{Z} denotes the reconstructed weighted attribute matrix; \hat{A} denotes the reconstructed original adjacency matrix generated by the inner-product operation; N denotes the number of samples; d denotes the attribute dimension; L_w denotes the reconstruction loss of the weighted attribute matrix; L_a denotes the reconstruction loss of the adjacency matrix.
Further, step S3 specifically includes:
S31. Performing deep interaction of the latent embedding information between the modal information of the autoencoder and that of the graph autoencoder, generating a consensus latent embedding;
S32. Initializing cluster centers from the consensus latent embedding and pre-computation, and generating the soft assignment distribution as well as the target distribution.
Compared with the prior art, the present application proposes a novel deep clustering method and system based on cross-modal information fusion, including an autoencoder module, a graph autoencoder module, a cross-modal information fusion module and a joint optimization objective. Extensive ablation experiments show that fully fusing structural information and attribute information in the present application helps encode more compact and more discriminative information, which in turn generates more robust target distributions and provides more accurate guidance for network learning. Experimental results on six public datasets demonstrate that the performance of the present application is superior to that of existing methods.
Brief Description of the Drawings
FIG. 1 is a structural diagram of a deep clustering system based on cross-modal fusion provided by Embodiment 1;
FIG. 2 is a schematic structural diagram of the cross-modal information fusion module provided in Embodiment 2.
Detailed Description of the Embodiments
The following describes the implementations of the present application through specific examples; those skilled in the art can easily understand other advantages and effects of the present application from the contents disclosed in this specification. The present application can also be implemented or applied through other different specific embodiments, and the details in this specification can likewise be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present application. It should be noted that, in the absence of conflict, the following embodiments and the features therein can be combined with one another.
In view of the existing defects, the present application provides a deep clustering method and system based on cross-modal fusion. The core idea is to fully extract the node attribute information via the autoencoder and the structural information via the graph autoencoder, and to design a dynamic information fusion module that combines the two to achieve an accurate representation reconstruction process. Specifically, the present application carefully designs a structure-and-attribute information fusion module. First, the two types of embedding features are fused at the local and global levels to obtain consensus representation information. Second, the soft assignment distribution Q and the target distribution P are obtained by evaluating, with the Student's t-distribution, the similarity between samples and the pre-computed cluster centers. Finally, a triplet self-supervision mechanism is designed, which utilizes the target distribution to simultaneously provide learning guidance for the autoencoder, the graph autoencoder and the information fusion part. In addition, the deep fusion clustering network also includes an improved graph autoencoder, whose structure is symmetric and which reconstructs the adjacency matrix synchronously from the latent variables and the decoder output variables. The present application not only solves the problem of insufficient multi-source information interaction in current deep clustering methods, but also solves the problem that the target distribution of self-optimization-based deep clustering methods is not robust enough.
Embodiment 1
This embodiment provides a deep clustering system based on cross-modal fusion, including an autoencoder 11, a graph autoencoder 12, a cross-modal information fusion module 13 and a joint optimization objective module; the graph autoencoder 12 is connected with the autoencoder 11, the cross-modal information fusion module 13 is connected with the autoencoder 11 and the graph autoencoder 12, respectively; the joint optimization objective module is connected with the autoencoder 11, the graph autoencoder 12 and the cross-modal information fusion module 13, respectively.
The autoencoder 11 is used to extract features from the attribute information of graph data and reconstruct the original attribute matrix;
the graph autoencoder 12 is used to extract features from the structural information of graph data and reconstruct the original adjacency matrix and the weighted attribute matrix;
the cross-modal information fusion module 13 is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
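For concreteness, a minimal PyTorch sketch of how these four parts can be wired together in a single forward pass is given below. The class name, layer widths, activations and the inner-product adjacency decoder are illustrative assumptions, not the reference implementation of the present application.

```python
import torch
import torch.nn as nn

class CrossModalDeepClusterer(nn.Module):
    """Illustrative wiring of AE + graph AE + fusion (sizes are assumptions)."""
    def __init__(self, d, d_latent=20, n_clusters=4):
        super().__init__()
        # Autoencoder branch: extracts attribute information from X.
        self.ae_enc = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, d_latent))
        self.ae_dec = nn.Sequential(nn.Linear(d_latent, 256), nn.ReLU(), nn.Linear(256, d))
        # Graph-autoencoder branch: one GCN-style layer each way.
        self.gae_enc = nn.Linear(d, d_latent, bias=False)
        self.gae_dec = nn.Linear(d_latent, d, bias=False)
        # Fusion coefficient (Eq. 6) and pre-computed cluster centers.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.centers = nn.Parameter(torch.randn(n_clusters, d_latent))

    def forward(self, x, a_norm):
        z_ae = self.ae_enc(x)                              # attribute embedding
        z_igae = torch.relu(a_norm @ self.gae_enc(x))      # structure embedding
        z = self.alpha * z_ae + (1 - self.alpha) * z_igae  # consensus latent embedding
        x_hat = self.ae_dec(z)                             # reconstruct attribute matrix
        zx_hat = a_norm @ self.gae_dec(z)                  # reconstruct weighted attribute matrix
        a_hat = torch.sigmoid(z @ z.T)                     # reconstruct adjacency (inner product)
        return z, x_hat, zx_hat, a_hat
```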
In this embodiment, suppose there is an undirected graph G = {V, E} with K cluster centers, where V = {v_1, v_2, ..., v_N} and E are the node set and the edge set, respectively, and N denotes the number of samples. The graph is characterized by its attribute matrix X \in R^{N \times d} and its original adjacency matrix A = (a_{ij})_{N \times N} \in R^{N \times N}, where d denotes the attribute dimension, a_{ij} = 1 when (v_i, v_j) \in E, and a_{ij} = 0 otherwise.
The degree matrix of the undirected graph G is D = diag(d_1, d_2, ..., d_N) \in R^{N \times N} with d_i = \sum_{v_j \in V} a_{ij}. By computing \widehat{A} = A + I, the original adjacency matrix is expressed in the normalized form \widetilde{A} = \widehat{D}^{-1/2}(A + I)\widehat{D}^{-1/2}, where I \in R^{N \times N} indicates that every node in V is connected with a self-loop structure, and \widehat{D} is the degree matrix of A + I.
In the autoencoder 11, features are extracted from the attribute information of the graph data and the original attribute matrix is reconstructed.
The autoencoder of this embodiment is a fusion-based autoencoder, whereas in most existing generative encoders, whether autoencoders or graph autoencoders, only their own latent embeddings are used to reconstruct the input. In contrast, this embodiment proposes a compact representation based on the autoencoder and the graph autoencoder. Specifically, the two kinds of modal information from the autoencoder and the graph autoencoder are first integrated to generate a consensus latent embedding. Then, the embeddings of the autoencoder and the graph autoencoder are taken as consensus input to reconstruct the inputs of the two subnetworks. Unlike existing methods, the method proposed in this embodiment uses a carefully designed fusion module to fuse the structural information and the attribute information, and then reconstructs the inputs of the two subnetworks with the consensus latent embedding.
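A minimal sketch of such an autoencoder follows; the layer widths are assumptions, and the optional z_override argument mimics decoding from the consensus embedding instead of the autoencoder's own embedding.

```python
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Symmetric MLP autoencoder over the attribute matrix X (widths are illustrative)."""
    def __init__(self, d_in, d_latent=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, 500), nn.ReLU(),
            nn.Linear(500, d_latent))
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, 500), nn.ReLU(),
            nn.Linear(500, d_in))

    def forward(self, x, z_override=None):
        z = self.encoder(x)
        # In joint training, the consensus embedding can be decoded instead of z.
        x_hat = self.decoder(z if z_override is None else z_override)
        return z, x_hat
```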
In the graph autoencoder 12, features are extracted from the structural information of the graph data, and the original adjacency matrix and the weighted attribute matrix are reconstructed.
The structure of an autoencoder is usually symmetric, whereas the structure of a graph autoencoder is usually asymmetric. A graph autoencoder only uses the latent embedding to reconstruct the adjacency matrix, which ignores the property that structure-based attribute information can be used to improve the generalization ability of the network. To better utilize the adjacency information and the attribute information, this embodiment designs an improved graph autoencoder (IGAE). This network must simultaneously reconstruct the weighted attribute matrix and the adjacency matrix, and its encoder and decoder are formally expressed as:
Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)})  (1)
\hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)})  (2)
where Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively; \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer; Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer.
The hybrid loss function L_{IGAE} minimized by the graph autoencoder is expressed as:
L_{IGAE} = L_w + \gamma L_a  (3)
where \gamma denotes a predefined hyperparameter that balances the weights of the two reconstruction loss functions; L_w and L_a are expressed as:
L_w = \frac{1}{2N} \|\widetilde{A}X - \hat{Z}\|_F^2  (4)
L_a = \frac{1}{2N} \|\widetilde{A} - \hat{A}\|_F^2  (5)
where \hat{Z} denotes the reconstructed weighted attribute matrix; \hat{A} denotes the reconstructed original adjacency matrix generated by the inner-product operation; N denotes the number of samples; d denotes the attribute dimension; L_w denotes the reconstruction loss of the weighted attribute matrix; L_a denotes the reconstruction loss of the adjacency matrix.
By minimizing formula (4) and formula (5), the proposed improved graph autoencoder simultaneously minimizes the reconstruction losses of the weighted attribute matrix and the adjacency matrix.
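Under equations (1) through (5), one way the IGAE and its hybrid loss could be written in PyTorch is sketched below; the layer widths, the leaky-ReLU activation, the sigmoid inner-product decoder for the adjacency, and mse_loss (which rescales the Frobenius losses by a constant factor) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution layer: Z = sigma(A_tilde Z W), cf. Eqs. (1)-(2)."""
    def __init__(self, d_in, d_out, act=True):
        super().__init__()
        self.w = nn.Linear(d_in, d_out, bias=False)
        self.act = act

    def forward(self, z, a_norm):
        z = a_norm @ self.w(z)
        return F.leaky_relu(z) if self.act else z

class IGAE(nn.Module):
    """Improved graph autoencoder: reconstructs both A_tilde*X and the adjacency."""
    def __init__(self, d, d_latent=20):
        super().__init__()
        self.encoder = nn.ModuleList([GCNLayer(d, 256), GCNLayer(256, d_latent)])
        self.decoder = nn.ModuleList([GCNLayer(d_latent, 256), GCNLayer(256, d, act=False)])

    def forward(self, x, a_norm):
        z = x
        for layer in self.encoder:        # Eq. (1)
            z = layer(z, a_norm)
        z_hat = z
        for layer in self.decoder:        # Eq. (2)
            z_hat = layer(z_hat, a_norm)
        a_hat = torch.sigmoid(z @ z.T)    # adjacency rebuilt by inner product
        return z, z_hat, a_hat

def igae_loss(x, a_norm, z_hat, a_hat, gamma=0.1):
    l_w = F.mse_loss(z_hat, a_norm @ x)   # Eq. (4): weighted attribute loss
    l_a = F.mse_loss(a_hat, a_norm)       # Eq. (5): adjacency loss
    return l_w + gamma * l_a              # Eq. (3); gamma = 0.1 per the text
```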
In the cross-modal information fusion module 13, the modal information of the autoencoder and of the graph autoencoder is integrated to generate a consensus latent embedding, and cluster centers are initialized from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution.
To fully exploit the graph structure information and the node attribute information extracted by the autoencoder and the improved graph autoencoder, this embodiment proposes a structure-and-attribute information fusion module. As shown in FIG. 2, the module consists of two parts, namely a cross-modal dynamic fusion mechanism and a triplet self-supervision strategy.
The cross-modal dynamic fusion mechanism, starting from the local and global levels, completes the deep interaction of the latent embedding information of the two modalities and generates a more compact consensus latent embedding;
the triplet self-supervision strategy, on the basis of the consensus latent embedding and the pre-computed initialized cluster centers, generates a more accurate soft assignment distribution Q and a more robust target distribution P.
The cross-modal dynamic fusion mechanism proposed in this embodiment mainly includes four steps:
A combination module, used to linearly combine the latent embeddings of the autoencoder and the graph autoencoder to obtain the initial fusion embedding.
The latent embeddings of the autoencoder (Z_{AE} \in R^{N \times d'}) and of the improved graph autoencoder (Z_{IGAE} \in R^{N \times d'}) are linearly combined, expressed as:
Z_I = \alpha Z_{AE} + (1-\alpha) Z_{IGAE}  (6)
where \alpha denotes the learnable coefficient matrix, which selectively evaluates the importance of the two kinds of modal information according to the properties of different datasets; Z_I \in R^{N \times d'} denotes the initial fusion embedding; d' denotes the dimension of the latent embedding.
In this embodiment, \alpha is initialized to 0.5 and is adjusted automatically by stochastic gradient descent.
A processing module, used to enhance the initial fusion embedding through an operation based on graph convolution.
A graph-convolution-like operation (i.e., a message-passing operation) is designed to process the combined information. Through this operation, the local structural information of the data is modeled to enhance the initial fusion embedding Z_I \in R^{N \times d'}, expressed as:
Z_L = \widetilde{A} Z_I  (7)
where Z_L \in R^{N \times d'} denotes the latent embedding after local structure enhancement.
A recombination module, used to recombine the initial fusion embedding based on a self-correlation learning mechanism.
A self-correlation learning mechanism is introduced to model non-local relations in the initial information fusion space. Specifically, the normalized self-correlation matrix is first computed as:
S_{ij} = \frac{e^{(Z_L Z_L^\top)_{ij}}}{\sum_{k=1}^{N} e^{(Z_L Z_L^\top)_{ik}}}  (8)
Using S as the coefficient matrix, Z_L is recombined by computing the global correlations among samples, expressed as:
Z_G = S Z_L  (9)
where Z_G denotes the information obtained by recombining Z_L; S denotes the self-correlation matrix.
A propagation module, used to propagate information through the fusion mechanism via a skip connection.
A skip connection is designed to facilitate the propagation of information through the fusion mechanism, expressed as:
\widetilde{Z} = \beta Z_G + Z_L
where \beta denotes a scale parameter, which is initialized to 0 and whose weight gradient is made learnable when training the network; \widetilde{Z} denotes the fused clustering embedding.
The cross-modal dynamic fusion mechanism considers the correlation of samples from both the local and the global aspects. This module therefore helps fuse and calibrate the information of the autoencoder and the graph autoencoder, so as to learn a higher-quality consensus latent embedding.
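A compact sketch of the four steps, equations (6) through (9) plus the skip connection, is given below; treating α and β as learnable scalars (rather than full coefficient matrices) is a simplification.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Cross-modal dynamic fusion: combine, enhance locally, recombine globally, skip-connect."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # modality weight, initialized to 0.5
        self.beta = nn.Parameter(torch.zeros(1))      # scale parameter, initialized to 0

    def forward(self, z_ae, z_igae, a_norm):
        z_i = self.alpha * z_ae + (1 - self.alpha) * z_igae  # Eq. (6): linear combination
        z_l = a_norm @ z_i                                   # Eq. (7): local structure enhancement
        s = torch.softmax(z_l @ z_l.T, dim=1)                # Eq. (8): normalized self-correlation
        z_g = s @ z_l                                        # Eq. (9): global recombination
        return self.beta * z_g + z_l                         # skip connection -> fused embedding
```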
The triplet self-supervision strategy proposed in this embodiment is specifically as follows:
To provide reliable guidance for the training process of the clustering method, the present application uses the clustering embedding \widetilde{Z}, generated by fusing the autoencoder and the graph autoencoder, to generate the target distribution. The soft assignment distribution and the target distribution generated in the triplet self-supervision strategy are expressed as:
q_{ij} = \frac{(1 + \|\tilde{z}_i - u_j\|^2 / v)^{-\frac{v+1}{2}}}{\sum_{j'} (1 + \|\tilde{z}_i - u_{j'}\|^2 / v)^{-\frac{v+1}{2}}}  (10)
p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} (q_{ij'}^2 / \sum_i q_{ij'})}  (11)
where \tilde{z}_i denotes the i-th sample of the fused clustering embedding; u_j denotes the j-th pre-computed cluster center; v denotes the degrees of freedom of the Student's t-distribution; q_{ij} denotes the probability of assigning the i-th sample to the j-th center, i.e., the soft assignment distribution; p_{ij} denotes the probability that the i-th sample belongs to the j-th cluster center, i.e., the target distribution; j' indexes the j'-th cluster center.
In this embodiment, the Student's t-distribution is used as the kernel to compute the similarity between the i-th sample \tilde{z}_i in the fused embedding space and the j-th pre-computed cluster center u_j.
The soft assignment matrix Q \in R^{N \times K} reflects the probability distribution of all samples. To increase the confidence of the cluster assignments, formula (11) is introduced to guide all samples to approach the cluster centers. Specifically, 0 \le p_{ij} \le 1 is an element of the generated target distribution P \in R^{N \times K}, representing the probability that the i-th sample belongs to the j-th cluster center.
With the iteratively generated target distribution, the soft assignment distributions of the latent embeddings of the autoencoder and the improved graph autoencoder are computed according to formula (10). The soft assignment distributions of the autoencoder and the improved graph autoencoder are denoted Q' and Q''.
To train the network within a unified framework and improve the representational power of each part, this embodiment designs a triplet clustering loss, expressed as:
L_{KL} = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{(q_{ij} + q'_{ij} + q''_{ij})/3}  (12)
where L_{KL} denotes the triplet clustering loss.
The soft assignment distributions of the autoencoder and the improved graph autoencoder, as well as the fused embedding, are aligned with the robust target distribution simultaneously. Since the target distribution is generated in an unsupervised mode, the present application calls this loss function the triplet clustering loss, and its corresponding training mechanism the triplet self-supervision strategy.
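Assuming the formulas above, the soft assignment, the target distribution and the triplet clustering loss can be sketched as follows; averaging the three assignment distributions inside the KL term is our reading of formula (12).

```python
import torch
import torch.nn.functional as F

def soft_assignment(z, centers, v=1.0):
    """Eq. (10): Student's t kernel between samples and cluster centers."""
    dist2 = torch.cdist(z, centers) ** 2
    q = (1.0 + dist2 / v) ** (-(v + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Eq. (11): sharpen Q into the more confident target distribution P."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def triplet_kl_loss(p, q_fused, q_ae, q_igae):
    """Eq. (12): align the fused, AE and IGAE assignments with P simultaneously."""
    q_mix = (q_fused + q_ae + q_igae) / 3.0
    return F.kl_div(q_mix.log(), p, reduction='batchmean')
```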
In the joint optimization objective module, the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module are guided synchronously.
The learning objective of the model mainly consists of two parts:
1) the reconstruction losses of the autoencoder and the improved graph autoencoder;
2) the clustering loss related to the target distribution.
The loss is expressed as:
L = L_{AE} + L_{IGAE} + \lambda L_{KL}  (13)
where L_{AE} denotes the mean-square-error (MSE) reconstruction loss of the autoencoder; \lambda denotes a predefined hyperparameter.
Unlike the structured deep clustering network, the deep fusion clustering network proposed in the present application reconstructs the inputs of the two subnetworks with the consensus latent embedding. \lambda is a predefined hyperparameter that balances the importance of reconstruction and clustering.
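Combined, the joint objective of formula (13) might then be computed as in the short sketch below, reusing the helpers from the previous sketch; λ = 10 follows the parameter settings reported later.

```python
import torch.nn.functional as F

def total_loss(x, x_hat, l_igae, p, q_fused, q_ae, q_igae, lam=10.0):
    """Eq. (13): L = L_AE + L_IGAE + lambda * L_KL."""
    l_ae = F.mse_loss(x_hat, x)                        # MSE reconstruction loss of the AE
    l_kl = triplet_kl_loss(p, q_fused, q_ae, q_igae)   # triplet clustering loss (Eq. 12)
    return l_ae + l_igae + lam * l_kl
```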
Compared with the prior art, this embodiment has the following beneficial effects:
1. This embodiment provides a structure-and-attribute information fusion module, which is used to enhance the interaction between attribute information and structural information. First, the autoencoder and the graph autoencoder use the consensus latent embedding to reconstruct the original input, which helps improve the generalization ability of the latent embedding; second, by fusing the complementary information of the autoencoder and the graph autoencoder, the reliability of the generated target distribution is enhanced; finally, the triplet self-supervised learning mechanism unifies the autoencoder, the graph autoencoder and the fusion part into the same optimization framework, thereby improving the quality of the latent embedding and the clustering performance.
2. This embodiment provides an improved graph autoencoder, which overcomes the limitation of existing encoding methods that only reconstruct structural information, and improves the generalization ability of the clustering framework by jointly reconstructing the structural information and the weighted attribute information.
Aiming at the problems that current deep clustering methods do not fully consider the fusion of multiple kinds of modal information and that the generated target distribution is insufficiently robust, leading to suboptimal representation learning and inadequate clustering performance, this embodiment aims to provide a deep clustering system based on cross-modal information fusion, which fuses the sample nodes of the two modalities at the local and global levels. Subsequently, the soft assignment distribution Q and the target distribution P are computed in the fused embedding space by evaluating, via the Student's t-distribution, the similarity between the samples and the pre-computed cluster centers. Then, under the guidance of the target distribution, the adjacency matrix, the attribute matrix and the attribute matrix weighted by local information are reconstructed, and the fusion part is optimized simultaneously to train an end-to-end deep neural framework. Finally, the K-means clustering algorithm is used to perform clustering in the weighted fused embedding space, achieving unsupervised clustering of deep graph information.
Embodiment 2
The deep clustering system based on cross-modal fusion provided in this embodiment differs from Embodiment 1 as follows:
This embodiment is compared with existing methods on multiple datasets to verify the effectiveness of the present application.
Datasets:
Six datasets are used in this embodiment, including three graph datasets and three non-graph datasets; their statistics are shown in Table 1.
Dataset | Type   | Samples | Classes | Dimension
USPS    | Image  | 9298    | 10      | 256
HHAR    | Record | 10299   | 6       | 561
REUT    | Text   | 10000   | 4       | 2000
ACM     | Graph  | 3025    | 3       | 1870
DBLP    | Graph  | 4058    | 4       | 334
CITE    | Graph  | 3327    | 6       | 3703
Table 1
USPS: [LeCun, Y.; Matan, O.; Boser, B.E.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.E.; Jacket, L.D.; and Baird, H.S. 1990. Handwritten Zip Code Recognition with Multilayer Networks. In ICPR, 36–40];
REUT: [Lewis, D.D.; Yang, Y.; Rose, T.G.; and Li, F. 2004. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research 5(2): 361–397];
HHAR: [Stisen, A.; Blunck, H.; Bhattacharya, S.; Prentow, T.S.; Kjærgaard, M.B.; Dey, A.; Sonne, T.; and Jensen, M.M. 2015. Smart Devices Are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition. In SENSYS, 127–140];
ACM: [http://dl.acm.org];
DBLP: [https://dblp.uni-trier.de];
CITE: [http://citeseerx.ist.psu.edu/index]
USPS: This dataset is of the image type and contains 9298 single-channel images of size 16×16, evenly distributed across 10 categories.
HHAR: This dataset is of the record type and contains 10299 sensor records, each with 561-dimensional features, evenly distributed across 6 categories.
REUT: This dataset is of the text type and contains 10000 text samples, each with 2000-dimensional features, evenly distributed across 4 categories.
ACM: This dataset is of the graph type and contains 3025 graph nodes, each with 1870-dimensional features, evenly distributed across 3 categories.
DBLP: This dataset is of the graph type and contains 4058 graph nodes, each with 334-dimensional features, evenly distributed across 4 categories.
CITE: This dataset is of the graph type and contains 3327 graph nodes, each with 3703-dimensional features, evenly distributed across 6 categories.
Training procedure:
The implementation environment of this embodiment is the PyTorch platform, and the training method includes the following four steps.
1) First, the autoencoder and the graph autoencoder are each trained independently for 30 epochs by minimizing their reconstruction loss functions;
2) then, the two subnetworks are integrated into a unified framework, which is trained for 100 epochs;
3) subsequently, based on the pre-computed initialized cluster centers and the triplet self-supervision strategy, the whole deep clustering framework is trained for 200 epochs until the model converges;
4) finally, the K-means algorithm is used to partition the samples in the consensus clustering embedding space, obtaining the cluster ID of each sample. Following the training strategy of existing work, to avoid randomness in the clustering results caused by network parameter initialization, the present application repeats each experiment 10 times and reports the mean and standard deviation of the 10 results.
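A schematic driver for these four stages is sketched below. It reuses the hypothetical modules and loss helpers from the earlier sketches, collapses the separate pretraining of the two subnetworks into a single reconstruction loop for brevity, and reuses the fused assignment in place of the subnetwork assignments Q' and Q'', so it is an outline rather than the reported training protocol.

```python
import torch
from sklearn.cluster import KMeans

def train(model, x, a_norm, n_clusters, epochs=(130, 200)):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Stages 1-2: (pre)train with reconstruction losses only (30 + 100 epochs in the text).
    for _ in range(epochs[0]):
        opt.zero_grad()
        z, x_hat, zx_hat, a_hat = model(x, a_norm)
        loss = torch.nn.functional.mse_loss(x_hat, x) + igae_loss(x, a_norm, zx_hat, a_hat)
        loss.backward(); opt.step()
    # Stage 3: initialize centers by K-means on the fused embedding, then fine-tune
    # the whole framework with the triplet self-supervised objective (200 epochs).
    with torch.no_grad():
        z, *_ = model(x, a_norm)
    model.centers.data = torch.tensor(
        KMeans(n_clusters).fit(z.numpy()).cluster_centers_, dtype=torch.float32)
    for _ in range(epochs[1]):
        opt.zero_grad()
        z, x_hat, zx_hat, a_hat = model(x, a_norm)
        q = soft_assignment(z, model.centers)
        p = target_distribution(q.detach())
        loss = total_loss(x, x_hat, igae_loss(x, a_norm, zx_hat, a_hat), p, q, q, q)
        loss.backward(); opt.step()
    # Stage 4: final partition by K-means in the consensus embedding space.
    with torch.no_grad():
        z, *_ = model(x, a_norm)
    return KMeans(n_clusters).fit_predict(z.numpy())
```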
Parameter settings:
This embodiment uniformly uses the Adam optimizer to optimize the model. The learning rate is set to 0.001 on the USPS and HHAR datasets, 0.0001 on the REUT, DBLP and CITE datasets, and 0.00005 on the ACM dataset. The training batch size is set to 256, and an early-stopping strategy is adopted to avoid model overfitting. According to hyperparameter sensitivity experiments, the two balance factors γ and λ are set to 0.1 and 10, respectively. For non-graph datasets, the number of neighbors of each sample is set to 5 when constructing the adjacency matrix.
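For the non-graph datasets, the adjacency construction with 5 neighbors per sample might be sketched as follows; using a plain connectivity KNN graph (rather than, for example, a similarity-weighted one) is an assumption.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_adjacency(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Build a symmetric k-nearest-neighbor adjacency for non-graph data (k = 5 here)."""
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    return np.maximum(A, A.T)  # symmetrize: an edge in either direction yields a_ij = 1
```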
Evaluation metrics:
This embodiment adopts four evaluation metrics widely recognized in the field of deep clustering algorithms: clustering accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI) and F1 score. The matching between the cluster ID and the class ID of each sample is established with the Hungarian (Kuhn-Munkres) algorithm [Lovász, L.; and Plummer, M. 1986. Matching Theory].
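As an illustration, clustering accuracy under Kuhn-Munkres matching can be computed as below, using SciPy's linear_sum_assignment as the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """ACC: best one-to-one matching between cluster IDs and class IDs."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                      # co-occurrence of cluster p and class t
    row, col = linear_sum_assignment(-count)  # maximize the total matched count
    return count[row, col].sum() / y_true.size
```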
Comparison methods:
This embodiment conducts comparison experiments on 6 multi-type datasets against 10 benchmark algorithms. The comparison methods include the K-means algorithm, the autoencoder, deep embedded clustering, improved deep embedded clustering, the graph autoencoder, the graph variational autoencoder, the adversarially regularized graph autoencoder, deep attentional graph embedding clustering, and the structured deep clustering network.
As shown in Table 2, comparing the method of this embodiment with existing methods leads to the following conclusions. 1) On multiple datasets, the clustering performance of this embodiment is superior to the comparison methods. Specifically, K-means clustering performs clustering directly on the raw data, while the autoencoder, deep embedded clustering and improved deep embedded clustering only mine the attribute information of the data to learn latent embeddings for clustering. These methods do not take the structural information of the data into account, so the clustering results they obtain are suboptimal. In contrast, this embodiment fully integrates the attribute information and structural information of the raw data, and learns a complementary consensus embedding representation of the two kinds of modal information, thereby improving the quality of the latent embedding and the clustering effect. 2) Existing graph-convolution-based clustering methods, such as the graph autoencoder, the graph variational autoencoder, the adversarially regularized graph autoencoder and deep attentional graph embedding clustering, do not fully mine the attribute information of the data itself, and suffer from the over-smoothing caused by continual information aggregation. This embodiment integrates the attribute-based representations of the autoencoder into a unified clustering framework, and performs interactive consensus-embedding learning on the graph structure and node attributes through the fusion module, thereby improving clustering performance. 3) Compared with the two most advanced clustering methods, the structured deep clustering network and its variant, the present application achieves an overall performance improvement on the six datasets. Taking the DBLP dataset as an example, the performance of the present application is significantly better than SDCN and SDCN-Q; the four metrics of accuracy, normalized mutual information, adjusted Rand index (ARI) and F1 score are improved by 7.9%, 4.2%, 7.8% and 8.0%, respectively.
[Table 2: clustering performance comparison on the six datasets; rendered as images in the original document.]
Table 2
From Table 2, the experimental results on the six public datasets demonstrate that the performance of the present application is superior to that of existing methods.
Note that the above are merely preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, the present application is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.

Claims (10)

  1. A deep clustering system based on cross-modal fusion, characterized by including an autoencoder, a graph autoencoder, a cross-modal information fusion module and a joint optimization objective module; the graph autoencoder is connected with the autoencoder, and the cross-modal information fusion module is connected with the autoencoder and the graph autoencoder, respectively; the joint optimization objective module is connected with the autoencoder, the graph autoencoder and the cross-modal information fusion module, respectively;
    the autoencoder is used to extract features from the attribute information of graph data and reconstruct the original attribute matrix;
    the graph autoencoder is used to extract features from the structural information of graph data and reconstruct the original adjacency matrix and the weighted attribute matrix;
    the cross-modal information fusion module is used to integrate the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
    the joint optimization objective module is used to synchronously guide the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
  2. The deep clustering system based on cross-modal fusion according to claim 1, characterized in that the feature extraction on the structural information of the graph data and the reconstruction of the original adjacency matrix and the weighted attribute matrix in the graph autoencoder are specifically:
    the encoder and the decoder of the graph autoencoder take the form:
    Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)})  (1)
    \hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)})  (2)
    where Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively; \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer; Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer;
    the hybrid loss function L_{IGAE} minimized by the graph autoencoder is expressed as:
    L_{IGAE} = L_w + \gamma L_a  (3)
    where \gamma denotes a predefined hyperparameter that balances the weights of the two reconstruction loss functions; L_w and L_a are expressed as:
    L_w = \frac{1}{2N} \|\widetilde{A}X - \hat{Z}\|_F^2  (4)
    L_a = \frac{1}{2N} \|\widetilde{A} - \hat{A}\|_F^2  (5)
    where \hat{Z} denotes the reconstructed weighted attribute matrix; \hat{A} denotes the reconstructed original adjacency matrix generated by the inner-product operation; N denotes the number of samples; d denotes the attribute dimension; L_w denotes the reconstruction loss of the weighted attribute matrix; L_a denotes the reconstruction loss of the adjacency matrix.
  3. The deep clustering system based on cross-modal fusion according to claim 1, characterized in that the cross-modal information fusion module includes a cross-modal dynamic fusion mechanism and a triplet self-supervision strategy;
    the cross-modal dynamic fusion mechanism is used to perform deep interaction of the latent embedding information between the modal information of the autoencoder and that of the graph autoencoder, generating a consensus latent embedding;
    the triplet self-supervision strategy is used to initialize cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution as well as a target distribution.
  4. The deep clustering system based on cross-modal fusion according to claim 3, characterized in that the cross-modal dynamic fusion mechanism specifically includes:
    a combination module, used to linearly combine the latent embeddings of the autoencoder and the graph autoencoder to obtain the initial fusion embedding, expressed as:
    Z_I = \alpha Z_{AE} + (1-\alpha) Z_{IGAE}  (6)
    where \alpha denotes the learnable coefficient matrix; Z_{AE} denotes the latent embedding of the autoencoder; Z_{IGAE} denotes the latent embedding of the graph autoencoder; Z_I \in R^{N \times d'} denotes the initial fusion embedding; d' denotes the dimension of the latent embedding;
    a processing module, used to enhance the initial fusion embedding through an operation based on graph convolution, expressed as:
    Z_L = \widetilde{A} Z_I  (7)
    where Z_L \in R^{N \times d'} denotes the latent embedding after local structure enhancement;
    a recombination module, used to recombine the initial fusion embedding based on the self-correlation learning mechanism, expressed as:
    Z_G = S Z_L  (8)
    where Z_G denotes the information obtained by recombining Z_L; S denotes the self-correlation matrix;
    a propagation module, used to propagate information through the fusion mechanism via a skip connection, expressed as:
    \widetilde{Z} = \beta Z_G + Z_L  (9)
    where \beta denotes the scale parameter; \widetilde{Z} denotes the fused clustering embedding.
  5. The deep clustering system based on cross-modal fusion according to claim 4, characterized in that the soft assignment distribution and the target distribution generated in the triplet self-supervision strategy are expressed as:
    q_{ij} = \frac{(1 + \|\tilde{z}_i - u_j\|^2 / v)^{-\frac{v+1}{2}}}{\sum_{j'} (1 + \|\tilde{z}_i - u_{j'}\|^2 / v)^{-\frac{v+1}{2}}}  (10)
    p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} (q_{ij'}^2 / \sum_i q_{ij'})}  (11)
    where \tilde{z}_i denotes the i-th sample of the fused clustering embedding; u_j denotes the j-th pre-computed cluster center; v denotes the degrees of freedom of the Student's t-distribution; q_{ij} denotes the probability of assigning the i-th sample to the j-th center, i.e., the soft assignment distribution; p_{ij} denotes the probability that the i-th sample belongs to the j-th cluster center, i.e., the target distribution; j' indexes the j'-th cluster center.
  6. The deep clustering system based on cross-modal fusion according to claim 5, characterized in that, after the target distribution is generated in the triplet self-supervision strategy, the system further:
    improves the representational power of each part through a triplet clustering loss, where the triplet clustering loss is expressed as:
    L_{KL} = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{(q_{ij} + q'_{ij} + q''_{ij})/3}  (12)
    where L_{KL} denotes the triplet clustering loss.
  7. The deep clustering system based on cross-modal fusion according to claim 1, characterized in that synchronously guiding the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module in the joint optimization objective module specifically includes:
    computing the reconstruction losses of the autoencoder and the graph autoencoder, and computing the clustering loss relating the autoencoder and the graph autoencoder to the target distribution, expressed as:
    L = L_{AE} + L_{IGAE} + \lambda L_{KL}  (13)
    where L_{AE} denotes the mean-square-error reconstruction loss of the autoencoder; \lambda denotes a predefined hyperparameter.
  8. A deep clustering method based on cross-modal fusion, characterized by including:
    S1. extracting, by the autoencoder, features from the attribute information of graph data and reconstructing the original attribute matrix;
    S2. extracting, by the graph autoencoder, features from the structural information of graph data and reconstructing the original adjacency matrix and the weighted attribute matrix;
    S3. integrating, by the cross-modal information fusion module, the modal information of the autoencoder and of the graph autoencoder to generate a consensus latent embedding, and initializing cluster centers from the consensus latent embedding and pre-computation, generating a soft assignment distribution and a target distribution;
    S4. synchronously guiding, by the joint optimization objective module, the parameter update processes of the autoencoder, the graph autoencoder and the cross-modal information fusion module.
  9. The deep clustering method based on cross-modal fusion according to claim 8, characterized in that, in step S2, the feature extraction on the structural information of the graph data and the reconstruction of the original adjacency matrix and the weighted attribute matrix are specifically:
    the encoder and the decoder of the graph autoencoder take the form:
    Z^{(l)} = \sigma(\widetilde{A} Z^{(l-1)} W^{(l)})  (1)
    \hat{Z}^{(h)} = \sigma(\widetilde{A} \hat{Z}^{(h-1)} \widehat{W}^{(h)})  (2)
    where Z^{(l)} denotes the output embedding of the l-th encoding layer; \hat{Z}^{(h)} denotes the output embedding of the h-th decoding layer; W^{(l)} and \widehat{W}^{(h)} denote the learnable parameter matrices of the l-th encoder layer and the h-th decoder layer, respectively; \sigma denotes the nonlinear activation function; \widetilde{A} denotes the normalized original adjacency matrix; \hat{Z}^{(h-1)} denotes the output embedding of the (h-1)-th decoding layer; Z^{(l-1)} denotes the output embedding of the (l-1)-th encoding layer;
    the hybrid loss function L_{IGAE} minimized by the graph autoencoder is expressed as:
    L_{IGAE} = L_w + \gamma L_a  (3)
    where \gamma denotes a predefined hyperparameter that balances the weights of the two reconstruction loss functions; L_w and L_a are expressed as:
    L_w = \frac{1}{2N} \|\widetilde{A}X - \hat{Z}\|_F^2  (4)
    L_a = \frac{1}{2N} \|\widetilde{A} - \hat{A}\|_F^2  (5)
    where \hat{Z} denotes the reconstructed weighted attribute matrix; \hat{A} denotes the reconstructed original adjacency matrix generated by the inner-product operation; N denotes the number of samples; d denotes the attribute dimension; L_w denotes the reconstruction loss of the weighted attribute matrix; L_a denotes the reconstruction loss of the adjacency matrix.
  10. The deep clustering method based on cross-modal fusion according to claim 9, characterized in that step S3 specifically includes:
    S31. performing deep interaction of the latent embedding information between the modal information of the autoencoder and that of the graph autoencoder, generating a consensus latent embedding;
    S32. initializing cluster centers from the consensus latent embedding and pre-computation, and generating the soft assignment distribution as well as the target distribution.
PCT/CN2021/135894 2021-02-04 2021-12-07 Deep clustering method and system based on cross-modal fusion WO2022166361A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
ZA2023/08290A ZA202308290B (en) 2021-02-04 2023-08-28 Cross-modal fusion-based deep clustering method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110154434.7A CN112906770A (zh) 2021-02-04 2021-02-04 Deep clustering method and system based on cross-modal fusion
CN202110154434.7 2021-02-04

Publications (1)

Publication Number Publication Date
WO2022166361A1 true WO2022166361A1 (zh) 2022-08-11

Family

ID=76122295

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135894 WO2022166361A1 (zh) 2021-02-04 2021-12-07 一种基于跨模态融合的深度聚类方法及系统

Country Status (3)

Country Link
CN (1) CN112906770A (zh)
WO (1) WO2022166361A1 (zh)
ZA (1) ZA202308290B (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115985402A (zh) * 2023-03-20 2023-04-18 北京航空航天大学 Cross-modal data migration method based on normalizing flow theory
CN116206133A (zh) * 2023-04-25 2023-06-02 山东科技大学 RGB-D salient object detection method
CN116720523A (zh) * 2023-04-19 2023-09-08 贵州轻工职业技术学院 Multi-kernel-based deep text clustering method, apparatus and storage medium
CN117113240A (zh) * 2023-10-23 2023-11-24 华南理工大学 Dynamic network community discovery method, apparatus, device and storage medium
CN117407697A (zh) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on autoencoders and attention mechanisms
CN117688257A (zh) * 2024-01-29 2024-03-12 东北大学 Long-term trajectory prediction method for heterogeneous user behavior patterns
CN117727307A (zh) * 2024-02-18 2024-03-19 百鸟数据科技（北京）有限责任公司 Intelligent bird sound recognition method based on feature fusion
CN117893575B (zh) * 2024-03-15 2024-05-31 青岛哈尔滨工程大学创新发展中心 Ship motion forecasting method and system integrating a self-attention mechanism into graph neural networks

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906770A (zh) 2021-02-04 2021-06-04 浙江师范大学 Deep clustering method and system based on cross-modal fusion
CN113792784B (zh) 2021-09-14 2022-06-21 上海任意门科技有限公司 Method, electronic device and storage medium for user clustering
CN113762648B (zh) 2021-10-26 2023-12-19 平安科技(深圳)有限公司 Public-health black-swan event prediction method, apparatus, device and medium
CN117909910A (zh) 2024-03-19 2024-04-19 成都工业学院 Automatic system anomaly log detection method based on graph attention networks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958216A (zh) * 2017-11-27 2018-04-24 沈阳航空航天大学 Semi-supervised multimodal deep learning classification method
CN109376857A (zh) * 2018-09-03 2019-02-22 上海交通大学 Multimodal deep network embedding method fusing structural and attribute information
WO2019137912A1 (en) * 2018-01-12 2019-07-18 Connaught Electronics Ltd. Computer vision pre-fusion and spatio-temporal tracking
CN112906770A (zh) * 2021-02-04 2021-06-04 浙江师范大学 Deep clustering method and system based on cross-modal fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958216A (zh) * 2017-11-27 2018-04-24 沈阳航空航天大学 Semi-supervised multimodal deep learning classification method
WO2019137912A1 (en) * 2018-01-12 2019-07-18 Connaught Electronics Ltd. Computer vision pre-fusion and spatio-temporal tracking
CN109376857A (zh) * 2018-09-03 2019-02-22 上海交通大学 Multimodal deep network embedding method fusing structural and attribute information
CN112906770A (zh) * 2021-02-04 2021-06-04 浙江师范大学 Deep clustering method and system based on cross-modal fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WENXUAN TU; SIHANG ZHOU; XINWANG LIU; XIFENG GUO; ZHIPING CAI; EN ZHU; JIEREN CHENG: "Deep Fusion Clustering Network", ARXIV.ORG, 15 December 2020 (2020-12-15), pages 1 - 10, XP081840451 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115985402B (zh) * 2023-03-20 2023-09-19 北京航空航天大学 Cross-modal data migration method based on normalizing flow theory
CN115985402A (zh) * 2023-03-20 2023-04-18 北京航空航天大学 Cross-modal data migration method based on normalizing flow theory
CN116720523A (zh) * 2023-04-19 2023-09-08 贵州轻工职业技术学院 Multi-kernel-based deep text clustering method, apparatus and storage medium
CN116720523B (zh) * 2023-04-19 2024-02-06 贵州轻工职业技术学院 Multi-kernel-based deep text clustering method, apparatus and storage medium
CN116206133A (zh) * 2023-04-25 2023-06-02 山东科技大学 RGB-D salient object detection method
CN116206133B (zh) * 2023-04-25 2023-09-05 山东科技大学 RGB-D salient object detection method
CN117113240B (zh) * 2023-10-23 2024-03-26 华南理工大学 Dynamic network community discovery method, apparatus, device and storage medium
CN117113240A (zh) * 2023-10-23 2023-11-24 华南理工大学 Dynamic network community discovery method, apparatus, device and storage medium
CN117407697A (zh) * 2023-12-14 2024-01-16 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on autoencoders and attention mechanisms
CN117407697B (zh) * 2023-12-14 2024-04-02 南昌科晨电力试验研究有限公司 Graph anomaly detection method and system based on autoencoders and attention mechanisms
CN117688257A (zh) * 2024-01-29 2024-03-12 东北大学 Long-term trajectory prediction method for heterogeneous user behavior patterns
CN117727307A (zh) * 2024-02-18 2024-03-19 百鸟数据科技（北京）有限责任公司 Intelligent bird sound recognition method based on feature fusion
CN117727307B (zh) * 2024-02-18 2024-04-16 百鸟数据科技（北京）有限责任公司 Intelligent bird sound recognition method based on feature fusion
CN117893575B (zh) * 2024-03-15 2024-05-31 青岛哈尔滨工程大学创新发展中心 Ship motion forecasting method and system integrating a self-attention mechanism into graph neural networks

Also Published As

Publication number Publication date
CN112906770A (zh) 2021-06-04
ZA202308290B (en) 2023-09-27

Similar Documents

Publication Publication Date Title
WO2022166361A1 (zh) Deep clustering method and system based on cross-modal fusion
Li et al. Deep convolutional computation model for feature learning on big data in internet of things
Dong et al. Automatic age estimation based on deep learning algorithm
Shao et al. Multiple incomplete views clustering via weighted nonnegative matrix factorization with regularization
CN107016438B (zh) System based on an artificial neural network algorithm model for traditional Chinese medicine syndrome differentiation
CN111753024B (zh) Multi-source heterogeneous data entity alignment method for the public security domain
Guo et al. Multiple kernel learning based multi-view spectral clustering
CN110110318B (zh) Text steganography detection method and system based on recurrent neural networks
CN112765370B (zh) Entity alignment method and apparatus for knowledge graphs, computer device and storage medium
Wang et al. Multi-modal knowledge graphs representation learning via multi-headed self-attention
CN113157957A (zh) Attributed-graph document clustering method based on graph convolutional neural networks
CN112529063B (zh) Deep domain-adaptive classification method for Parkinson speech datasets
CN111985623A (zh) Attributed-graph community discovery method based on mutual information maximization and graph neural networks
Wang et al. Multi-view subspace clustering via structured multi-pathway network
CN112651940A (zh) Co-saliency detection method based on a dual-encoder generative adversarial network
Mettes et al. Hyperbolic deep learning in computer vision: A survey
CN113378938B (zh) Few-shot image classification method and system based on an edge-Transformer graph neural network
CN112668633B (zh) Graph transfer learning method based on fine-grained domain adaptation
Chander et al. A parallel fractional lion algorithm for data clustering based on MapReduce cluster framework
Lu et al. Soft-orthogonal constrained dual-stream encoder with self-supervised clustering network for brain functional connectivity data
CN117036760A (zh) Multi-view clustering model implementation method based on graph contrastive learning
CN117093849A (zh) Digital matrix feature analysis method based on automatic generative models
CN116797817A (zh) Autism disease prediction technique based on a self-supervised graph convolution model
Bi et al. A Fast Nonnegative Autoencoder-based Approach to Latent Feature Analysis on High-Dimensional and Incomplete Data
CN115910232A (zh) Multi-view drug-pair reaction prediction method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21924354

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21924354

Country of ref document: EP

Kind code of ref document: A1