WO2022156333A1 - Autoencoder-based multimodal adaptive fusion deep clustering model and method - Google Patents

Autoencoder-based multimodal adaptive fusion deep clustering model and method

Info

Publication number
WO2022156333A1
WO2022156333A1 PCT/CN2021/131248 CN2021131248W WO2022156333A1 WO 2022156333 A1 WO2022156333 A1 WO 2022156333A1 CN 2021131248 W CN2021131248 W CN 2021131248W WO 2022156333 A1 WO2022156333 A1 WO 2022156333A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoder
fusion
clustering
feature
autoencoder
Prior art date
Application number
PCT/CN2021/131248
Other languages
English (en)
French (fr)
Inventor
朱信忠
徐慧英
董仕豪
郭西风
王霞
靳林通
赵建民
Original Assignee
浙江师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江师范大学 filed Critical 浙江师范大学
Priority to US18/273,783 priority Critical patent/US20240095501A1/en
Priority to ZA2022/07739A priority patent/ZA202207739B/en
Publication of WO2022156333A1 publication Critical patent/WO2022156333A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Definitions

  • the present application relates to the technical field of clustering analysis, and in particular, to an autoencoder-based multimodal adaptive fusion deep clustering model and method.
  • Cluster analysis is a fundamental problem in many fields, such as machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Clustering divides similar objects into groups or subsets so that members of the same subset share similar properties; data clustering is generally regarded as a form of unsupervised learning. There are some common clustering methods in the prior art, but their performance on high-dimensional data is usually poor because the similarity measures used by traditional clustering methods are inefficient. Furthermore, these methods usually have high computational complexity on large-scale datasets. Therefore, dimensionality reduction and feature transformation methods have been extensively studied to map raw data into a new feature space in which the transformed data are more easily separated by existing classifiers. In general, existing data transformation methods include linear transformations (such as principal component analysis) and nonlinear transformations (such as kernel methods and spectral methods). Nonetheless, the highly complex underlying structure of the data still challenges the effectiveness of existing clustering methods.
  • Owing to the highly nonlinear transformations inherent in deep neural networks, such networks can be used to transform data into representations that are easier to cluster.
  • In recent years, clustering methods have also involved deep embedded clustering as well as other novel methods, making deep clustering a popular research area; examples include stacked autoencoders, variational autoencoders, and convolutional autoencoders, which were proposed for unsupervised learning.
  • Neural-network-based clustering methods outperform traditional methods to a certain extent, because they efficiently learn complex nonlinear transformations that yield powerful features.
  • However, the single-modal approach of acquiring features through a neural network, that is, first extracting modal features and then applying traditional clustering such as K-means or spectral clustering, does not fully extract all features of the data and does not make good use of the relationship between multimodal feature learning and clustering.
  • Therefore, this separate learning strategy may produce unsatisfactory clustering results and may even lead to large variation in the results due to the drawbacks of unsupervised learning.
  • To solve this problem, the present application proposes a multimodal adaptive feature fusion deep clustering model and clustering method based on an autoencoder.
  • The purpose of this application is to address the defects of the prior art by providing an autoencoder-based multimodal adaptive fusion deep clustering model and method.
  • a multimodal adaptive fusion deep clustering model based on an autoencoder includes an encoder, a multimodal adaptive fusion layer, a decoder, and a deep embedded clustering layer;
  • the encoder includes an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder;
  • the encoder is used to pass the dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders;
  • the multimodal adaptive fusion layer is connected to the encoder and is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z;
  • the decoder is connected to the multimodal adaptive fusion layer and is used to decode the fused feature Z using a structure symmetric to the encoder, obtaining the decoded reconstructed dataset X̂;
  • the deep embedded clustering layer is connected to the multimodal adaptive fusion layer and is used to cluster the fusion feature Z, and the final accuracy ACC is obtained by comparing the clustering result with the real labels.
  • ⁇ m represents the encoder model parameters
  • m represents the encoder index and takes values in {1, 2, 3}
  • the fusion feature Z is obtained in the multi-modal adaptive fusion layer and is expressed as: Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
  • ⁇ m represents the importance weight of the feature of the mth modality
  • the adaptive feature fusion parameter is obtained by the adaptive learning of the network
  • ⁇ m is defined by the softmax function using ⁇ m as the control parameter respectively; the weight scalar ⁇ m is calculated by 1 ⁇ 1 convolution on different modal features respectively, and is learned by standard back-propagation.
  • clustering of the fusion feature Z in the deep embedded clustering layer specifically comprises: dividing the n points {Z_i ∈ Z} into k clusters, each represented by a center μ_j, j = 1, ..., k; initializing the cluster centers; computing the soft assignments q_ij between the feature points and the cluster centers and the auxiliary distribution p_i; defining the clustering loss as the KL divergence between q_ij and p_i; and updating the cluster centers μ_j, the encoder and decoder parameters θ, and the adaptive feature fusion parameters β.
  • the encoder also uses a reconstruction loss to update the network parameters of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; specifically, the squared difference between the original data x_i input to the encoder and the reconstructed data x̂_i output by the decoder is used as the reconstruction loss to pretrain the encoders and obtain an initialized model, expressed as: L_R = Σ_i ||x_i − x̂_i||²
  • L_R represents the reconstruction loss function
  • the deep embedded clustering layer also uses the clustering loss (a KL divergence) to update the clustering results, the encoder parameters, and the fusion parameters; specifically, the similarity between a feature point Z_i and a cluster center μ_j is computed with the Student's t-distribution as the kernel: q_ij = (1 + ||Z_i − μ_j||²/α)^(−(α+1)/2) / Σ_j′ (1 + ||Z_i − μ_j′||²/α)^(−(α+1)/2)
  • Z_i = f(h(x_i)) ∈ Z; α represents the degrees of freedom of the Student's t-distribution; q_ij represents the probability of assigning sample i to cluster center μ_j; μ_j′ ranges over all cluster centers;
  • the clusters are iteratively optimized by learning from the high-confidence assignments of the clusters with the help of an auxiliary target distribution, i.e., the model is trained by matching the soft assignments to the target distribution;
  • the target loss function is defined as the KL divergence between the soft assignment probabilities q_i and the auxiliary distribution p_i, expressed as: L_C = KL(P‖Q) = Σ_i Σ_j p_ij log(p_ij / q_ij), where p_ij = (q_ij²/f_j) / Σ_j′ (q_ij′²/f_j′) and f_j = Σ_i q_ij;
  • L_C represents the clustering loss function and f_j represents the soft cluster frequency
  • the deep embedded clustering layer further includes:
  • the gradients of L with respect to the feature-space embedding of each data point Z_i and each cluster center μ_j are calculated as follows: ∂L/∂Z_i = ((α+1)/α) Σ_j (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j) and ∂L/∂μ_j = −((α+1)/α) Σ_i (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j);
  • the gradients ∂L/∂Z_i are then used to compute the network parameter gradients ∂L/∂θ by back-propagation; clustering is stopped when the number of points whose cluster assignment changes between two consecutive iterations is less than a preset proportion of the total number of points.
  • an autoencoder-based multimodal adaptive fusion deep clustering method includes: S1, passing the dataset X through the nonlinear mappings of the three encoders to obtain the latent features Z_m; S2, fusing the latent features Z_m into a common subspace to obtain the fusion feature Z; S3, decoding the fused feature Z with a structure symmetric to the encoder; and S4, clustering the fusion feature Z and comparing the clustering result with the real labels to obtain the final accuracy ACC.
  • the fusion feature Z obtained in step S2 is expressed as: Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
  • ω_m represents the importance weight of the feature of the m-th modality; the adaptive feature fusion parameters are learned adaptively by the network, subject to ω_1 + ω_2 + ω_3 = 1
  • each ω_m is defined by a softmax function with β_m as the control parameter, i.e. ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3); the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
  • this application proposes a novel multimodal adaptive feature fusion deep clustering framework, which includes a multimodal encoder, an adaptive fusion network, and a deep clustering layer.
  • through the multimodal encoder and the multimodal adaptive feature fusion layer, the model extracts features of the original data through nonlinear mappings, reduces the dimensionality of high-dimensional data, and optimizes the common subspace of the data features; finally, KL divergence is used to constrain the subspace clustering.
  • Experimental results on three public datasets demonstrate that the proposed model outperforms several state-of-the-art models.
  • FIG. 1 is a structural diagram of the autoencoder-based multimodal adaptive fusion deep clustering model provided by Embodiment 1;
  • FIG. 2 is a schematic structural diagram of the autoencoder-based multimodal deep clustering (MDEC) framework provided by Embodiment 1;
  • FIG. 3 is a schematic diagram of the specific information and sample information of the datasets provided by Embodiment 2;
  • FIG. 4 is a schematic diagram of the autoencoder-based multimodal adaptive fusion deep clustering method provided by Embodiment 3.
  • the purpose of this application is to address the defects of the prior art by providing an autoencoder-based multimodal adaptive fusion deep clustering model and method.
  • This embodiment provides an autoencoder-based multimodal adaptive fusion deep clustering model, as shown in FIG. 1, including an encoder 11, a multimodal adaptive fusion layer 12, a decoder 13, and a deep embedded clustering layer 14; the encoder 11 includes an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder;
  • the encoder 11 is used to pass the dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders;
  • the multimodal adaptive fusion layer 12 is connected to the encoder 11 and is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z;
  • the decoder 13 is connected to the multimodal adaptive fusion layer 12 and is used to decode the fusion feature Z using a structure symmetric to the encoder, obtaining the decoded reconstructed dataset X̂;
  • the deep embedded clustering layer 14 is connected to the multimodal adaptive fusion layer 12 and is used to cluster the fusion feature Z, obtaining the clustered fusion feature Z;
  • FIG. 2 shows a schematic structural diagram of the autoencoder-based multimodal adaptive feature fusion deep clustering (MDEC) framework, which consists of four parts: the encoder 11 composed of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; the multimodal adaptive fusion layer 12; the deep embedded clustering layer 14; and the decoder 13.
  • in the encoder 11, the dataset X is passed through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders.
  • specifically, in the model, X denotes the dataset, and the latent features Z_m are obtained through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively.
  • the encoders convert the high-dimensional data into low-dimensional features, expressed as: Z_m = h(X; θ_m)
  • ⁇ m represents the encoder model parameters
  • m represents the encoder index and takes values in {1, 2, 3}
  • the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder are fused into a common subspace through adaptive spatial feature fusion to obtain the fusion feature Z, expressed as: Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
  • ω_m represents the importance weight of the feature of the m-th modality; the adaptive feature fusion parameters are learned adaptively by the network, subject to ω_1 + ω_2 + ω_3 = 1
  • each ω_m is defined by a softmax function with β_m as the control parameter, i.e. ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3); the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
  • the clustered fused feature Z is decoded using a structure symmetric to the encoder to obtain a decoded data set.
  • the fusion feature Z is clustered, and the final accuracy rate ACC is obtained by comparing the clustering result with the real label.
  • to cluster the fusion feature Z, the cluster centers μ_j are first initialized; then the soft assignments between the feature points and the cluster centers are computed, and the KL divergence between the soft assignments and the auxiliary distribution is used to update the cluster centers μ_j and the parameters θ and β.
  • a loss function is also included.
  • the loss function consists of two parts: (1) the reconstruction loss L_R is used to update the network parameters of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; (2) the clustering loss L_C is used to update the clustering results, the autoencoder parameters, and the adaptive fusion parameters.
  • the model uses the squared difference between the encoder input and the decoder output as the reconstruction loss, pretrains the autoencoders, and obtains a good initialized model: L_R = Σ_i ||x_i − x̂_i||²
  • LR represents the reconstruction loss function
  • L C represents the clustering loss function
  • q ij represents the probability that sample i belongs to class j
  • p ij represents the target probability that sample i belongs to class j
  • p_i is calculated by first raising q_i to the second power and then normalizing by the frequency of each cluster, expressed as: p_ij = (q_ij²/f_j) / Σ_j′ (q_ij′²/f_j′), with f_j = Σ_i q_ij
  • the training is divided into two stages, namely a pre-training initialization stage and a clustering optimization stage.
  • in the pre-training initialization stage, the model is trained using the loss function L_1 = L_R;
  • in the clustering optimization stage, the loss function L_2 = L_R + L_C is used.
  • clustering also includes optimizing the objective, specifically:
  • the gradients of L with respect to the feature-space embedding of each data point Z_i and each cluster centroid μ_j are calculated as follows: ∂L/∂Z_i = ((α+1)/α) Σ_j (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j) and ∂L/∂μ_j = −((α+1)/α) Σ_i (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j).
  • different latent features are extracted by different encoders, and the features are fused into a common subspace.
  • the model proposed in this example is validated on multiple datasets and compared with multiple excellent methods.
  • MNIST The MNIST dataset consists of 70,000 handwritten digits with a size of 28x28 pixels. The numbers have been centered and dimension-normalized as described in "LeCun, Yann, Bottou, Le'on, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11) : 2278–2324, 1998”.
  • FASHION-MNIST contains 70,000 fashion product images from 10 categories, with the same image size as MNIST; see the literature "Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747".
  • COIL-20 is a collection of 1,440 grayscale object images (128×128) of 20 categories viewed from different angles; see the literature "Li, F.; Qiao, H.; and Zhang, B. 2018. Discriminatively boosted image clustering with fully convolutional auto-encoders. PR 83:161–173".
  • Autoencoders, convolutional autoencoders and convolutional variational autoencoders are adopted as three single-modal deep network branches for the original image, and the specific network configurations are shown in Table 2.
  • DGCCA: Deep Generalized Canonical Correlation Analysis
  • This embodiment proposes a novel multimodal adaptive feature fusion deep clustering framework, which includes a multimodal encoder, an adaptive feature fusion network, and a deep clustering layer.
  • through the multimodal encoder and the adaptive feature fusion layer, the model extracts features of the original data through nonlinear mappings, reduces the dimensionality of high-dimensional data, and optimizes the common subspace of the data features; finally, KL divergence is used to constrain the subspace clustering.
  • Experimental results on three public datasets demonstrate that the model of this embodiment outperforms several state-of-the-art models.
  • This embodiment provides an autoencoder-based multimodal adaptive fusion deep clustering method, as shown in FIG. 4 , including:
  • this embodiment proposes a novel multimodal adaptive fusion deep clustering framework, which includes a multimodal encoder, a multimodal adaptive feature fusion network and a deep clustering layer.
  • the model extracts the original data features through nonlinear mapping, reduces the dimension of high-dimensional data, optimizes the common subspace of data features, and finally uses KL divergence to constrain subspace clustering.
  • Experimental results on three public datasets demonstrate that the model of this embodiment outperforms several state-of-the-art models.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses an autoencoder-based multimodal adaptive fusion deep clustering model, comprising an encoder structure, a multimodal adaptive fusion layer, a decoder structure, and a deep embedded clustering layer. The encoder is used to pass a dataset X through the three nonlinear mappings h(X; θ_m) of an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder. The multimodal adaptive feature fusion layer is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining a fusion feature Z. The decoder is used to decode the fusion feature Z using a structure symmetric to the encoder, obtaining a decoded reconstructed dataset X̂. The deep embedded clustering layer is used to cluster the fusion feature Z, and the final accuracy ACC is obtained by comparing the clustering result with the real labels.

Description

Autoencoder-based multimodal adaptive fusion deep clustering model and method
Technical Field
The present application relates to the technical field of cluster analysis, and in particular to an autoencoder-based multimodal adaptive fusion deep clustering model and method.
Background
Cluster analysis is a fundamental problem in many fields, such as machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Clustering divides similar objects into groups or subsets so that members of the same subset share similar properties; data clustering is generally regarded as a form of unsupervised learning. There are some common clustering methods in the prior art, but their performance on high-dimensional data is usually poor because the similarity measures used by traditional clustering methods are inefficient. In addition, these methods usually have high computational complexity on large-scale datasets. Therefore, dimensionality reduction and feature transformation methods have been extensively studied to map raw data into a new feature space in which the transformed data are more easily separated by existing classifiers. In general, existing data transformation methods include linear transformations (such as principal component analysis) and nonlinear transformations (such as kernel methods and spectral methods). Nonetheless, the highly complex underlying structure of the data still challenges the effectiveness of existing clustering methods.
Owing to the development of deep learning and the highly nonlinear transformations inherent in deep neural networks, such networks can be used to transform data into representations that are easier to cluster. In recent years, clustering methods have also involved deep embedded clustering as well as other novel methods, making deep clustering a popular research area; examples include stacked autoencoders, variational autoencoders, and convolutional autoencoders, which were proposed for unsupervised learning. Neural-network-based clustering methods outperform traditional methods to a certain extent, because they efficiently learn complex nonlinear transformations that yield powerful features. However, the single-modal approach of acquiring features through a neural network, that is, first extracting modal features and then applying traditional clustering such as K-means or spectral clustering, does not fully extract all features of the data and does not make good use of the relationship between multimodal feature learning and clustering; such a separate learning strategy may therefore produce unsatisfactory clustering results and may even lead to large variation in the results due to the drawbacks of unsupervised learning. To solve this problem, the present application proposes an autoencoder-based multimodal adaptive feature fusion deep clustering model and clustering method.
Summary of the Invention
The purpose of the present application is to address the defects of the prior art by providing an autoencoder-based multimodal adaptive fusion deep clustering model and method. Multiple different deep autoencoders are used to learn latent representations of the original data and are constrained to learn different features; experimental evaluation on several natural image datasets shows that the method is significantly improved compared with existing methods.
To achieve the above purpose, the present application adopts the following technical solution:
An autoencoder-based multimodal adaptive fusion deep clustering model includes an encoder, a multimodal adaptive fusion layer, a decoder, and a deep embedded clustering layer; the encoder includes an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder;
the encoder is used to pass a dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder;
the multimodal adaptive fusion layer is connected to the encoder and is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining a fusion feature Z;
the decoder is connected to the multimodal adaptive fusion layer and is used to decode the fused feature Z using a structure symmetric to the encoder, obtaining a decoded reconstructed dataset X̂;
the deep embedded clustering layer is connected to the multimodal adaptive fusion layer and is used to cluster the fusion feature Z, and the final accuracy ACC is obtained by comparing the clustering result with the real labels.
Further, the latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder are obtained in the encoder and are expressed as:
Z_m = h(X; θ_m)
where θ_m represents the encoder model parameters and m represents the encoder index.
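As an illustration of the three encoder branches h(X; θ_m), the following PyTorch sketch defines a fully connected encoder, a convolutional encoder, and a convolutional variational encoder. The layer sizes, activations, and the 10-dimensional latent size are illustrative assumptions; the configuration actually used in the experiments is the one listed in Table 2 of Embodiment 2.

```python
import torch
import torch.nn as nn

LATENT_DIM = 10  # assumed latent size; the patent's actual configuration is given in Table 2 (image)

class AEEncoder(nn.Module):
    """Fully connected encoder branch h(X; theta_1) -> Z_1."""
    def __init__(self, in_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 500), nn.ReLU(),
            nn.Linear(500, 500), nn.ReLU(),
            nn.Linear(500, LATENT_DIM),
        )
    def forward(self, x):
        return self.net(x)

class ConvEncoder(nn.Module):
    """Convolutional encoder branch h(X; theta_2) -> Z_2."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 7 * 7, LATENT_DIM)
    def forward(self, x):
        return self.fc(self.conv(x))

class ConvVAEEncoder(nn.Module):
    """Convolutional variational encoder branch h(X; theta_3) -> Z_3 (reparameterized)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 7 * 7, LATENT_DIM)
        self.logvar = nn.Linear(64 * 7 * 7, LATENT_DIM)
    def forward(self, x):
        h = self.conv(x)
        mu, logvar = self.mu(h), self.logvar(h)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

# Z_m = h(X; theta_m) for a batch of 28x28 single-channel images
x = torch.randn(16, 1, 28, 28)
z1, z2, z3 = AEEncoder()(x), ConvEncoder()(x), ConvVAEEncoder()(x)
```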
Further, the fusion feature Z obtained in the multimodal adaptive fusion layer is expressed as:
Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
where ω_m represents the importance weight of the feature of the m-th modality, learned adaptively by the network to give the adaptive feature fusion parameters;
subject to the constraint
ω_1 + ω_2 + ω_3 = 1
and defined as:
ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3)
where each ω_m is defined by a softmax function with β_m as the control parameter; the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
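A minimal sketch of the adaptive spatial feature fusion described above is given below (PyTorch). A 1×1 convolution produces one weight scalar β_m per modality, a softmax over the three β_m yields ω_m with ω_1 + ω_2 + ω_3 = 1, and the fused feature is the weighted sum. Treating each latent vector as a 1×1 spatial map, and computing β_m per sample, are assumptions made for illustration; the text does not fix these details.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse Z_1, Z_2, Z_3 into Z = w1*Z1 + w2*Z2 + w3*Z3 with softmax weights."""
    def __init__(self, latent_dim):
        super().__init__()
        # one 1x1 convolution per modality produces the weight scalar beta_m
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(latent_dim, 1, kernel_size=1) for _ in range(3)]
        )

    def forward(self, z1, z2, z3):
        zs = [z1, z2, z3]                                      # each: (batch, latent_dim)
        betas = []
        for z, conv in zip(zs, self.weight_convs):
            # view the latent vector as a (latent_dim, 1, 1) map so a 1x1 conv applies
            beta = conv(z.view(z.size(0), -1, 1, 1)).view(z.size(0), 1)
            betas.append(beta)
        omega = torch.softmax(torch.cat(betas, dim=1), dim=1)  # (batch, 3), rows sum to 1
        fused = sum(omega[:, m:m + 1] * zs[m] for m in range(3))
        return fused, omega

fusion = AdaptiveFusion(latent_dim=10)
z = torch.randn(16, 10)
Z, omega = fusion(z, z, z)   # fused feature and per-sample modality weights
```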
Further, the decoded reconstructed dataset X̂ obtained in the decoder is expressed as:
X̂ = g(Z; θ′)
where g denotes the decoding mapping that is symmetric to the encoder, X̂ represents the reconstruction of the dataset X, and θ′ represents the decoder model parameters.
Further, clustering the fusion feature Z in the deep embedded clustering layer specifically comprises:
dividing the n points {Z_i ∈ Z}, i = 1, ..., n, into k clusters, each represented by a center μ_j, j = 1, ..., k; initializing the cluster centers {μ_j}; computing the soft assignments q_ij between the feature points and the cluster centers and the auxiliary distribution p_i; finally defining the clustering loss function as the KL divergence between the soft assignments q_ij and the auxiliary distribution p_i, and updating the cluster centers μ_j, the encoder and decoder parameters θ, and the adaptive feature fusion parameters β.
Further, the encoder also uses a reconstruction loss to update the network parameters of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; specifically, the squared difference between the original data x_i input to the encoder and the reconstructed data x̂_i output by the decoder is used as the reconstruction loss to pretrain the encoders and obtain an initialized model, expressed as:
L_R = Σ_i ||x_i − x̂_i||²
where L_R represents the reconstruction loss function.
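A pretraining update with the reconstruction loss L_R might be sketched as follows; `encoders`, `fusion`, and `decoder` are assumed to be modules like the sketches above, and averaging the per-sample squared error over the batch is an implementation choice rather than something fixed by the text.

```python
import torch

def pretrain_step(x, encoders, fusion, decoder, optimizer):
    """One pretraining update with L_R = sum_i ||x_i - x_hat_i||^2 (batch-averaged)."""
    optimizer.zero_grad()
    zs = [enc(x) for enc in encoders]      # latent features Z_1, Z_2, Z_3
    z, _ = fusion(*zs)                     # fused feature Z
    x_hat = decoder(z)                     # reconstruction X_hat (same shape as x assumed)
    loss_r = ((x - x_hat) ** 2).sum(dim=tuple(range(1, x.dim()))).mean()
    loss_r.backward()
    optimizer.step()
    return loss_r.item()
```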
Further, the deep embedded clustering layer also uses the clustering loss (a KL divergence) to update the clustering results, the encoder parameters, and the fusion parameters; specifically:
the similarity between a feature point Z_i and a cluster center μ_j is computed using the Student's t-distribution as the kernel, expressed as:
q_ij = (1 + ||Z_i − μ_j||²/α)^(−(α+1)/2) / Σ_j′ (1 + ||Z_i − μ_j′||²/α)^(−(α+1)/2)
where Z_i = f(h(x_i)) ∈ Z; α represents the degrees of freedom of the Student's t-distribution; q_ij represents the probability of assigning sample i to cluster center μ_j; and μ_j′ ranges over all cluster centers;
the clusters are iteratively optimized by learning from their high-confidence assignments with the help of an auxiliary target distribution, i.e., the model is trained by matching the soft assignments to the target distribution; the target loss function is defined as the KL divergence between the soft assignment probabilities q_i and the auxiliary distribution p_i, expressed as:
L_C = KL(P‖Q) = Σ_i Σ_j p_ij log(p_ij / q_ij)
p_ij = (q_ij² / f_j) / Σ_j′ (q_ij′² / f_j′)
f_j = Σ_i q_ij
where L_C represents the clustering loss function and f_j = Σ_i q_ij represents the soft cluster frequency.
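The soft assignment q_ij, the auxiliary target distribution p_ij, and the KL clustering loss L_C follow directly from the formulas above. The PyTorch sketch below assumes α = 1, the value commonly used for the Student's t kernel in deep embedded clustering.

```python
import torch

def soft_assign(z, mu, alpha=1.0):
    """q_ij = (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha+1)/2), normalized over j."""
    dist2 = torch.cdist(z, mu) ** 2                       # (n, k) squared distances
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def clustering_loss(q, p):
    """L_C = KL(P || Q) = sum_i sum_j p_ij log(p_ij / q_ij)."""
    return (p * (p.log() - q.log())).sum()
```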
Further, the deep embedded clustering layer also includes:
jointly optimizing the cluster centers μ_j, the network parameters θ, and the adaptive feature fusion parameters β with stochastic gradient descent with momentum; the gradients of L with respect to the feature-space embedding of each data point Z_i and each cluster center μ_j are computed as follows:
∂L/∂Z_i = ((α+1)/α) Σ_j (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
∂L/∂μ_j = −((α+1)/α) Σ_i (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
The gradients ∂L/∂Z_i are then back-propagated to compute the network parameter gradients ∂L/∂θ. Clustering is stopped when the number of points whose cluster assignment changes between two consecutive iterations is less than a preset proportion of the total number of points.
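For reference, the closed-form gradients above can be evaluated directly; the following NumPy sketch computes ∂L/∂Z_i and ∂L/∂μ_j from given Z, μ, q, and p (α = 1 assumed). In practice the same gradients are obtained automatically by back-propagating the clustering loss.

```python
import numpy as np

def dec_gradients(z, mu, q, p, alpha=1.0):
    """Closed-form gradients of the clustering loss:
       dL/dz_i  =  (alpha+1)/alpha * sum_j (1 + ||z_i - mu_j||^2/alpha)^-1 (p_ij - q_ij)(z_i - mu_j)
       dL/dmu_j = -(alpha+1)/alpha * sum_i (1 + ||z_i - mu_j||^2/alpha)^-1 (p_ij - q_ij)(z_i - mu_j)
    """
    diff = z[:, None, :] - mu[None, :, :]                     # (n, k, d)
    dist2 = (diff ** 2).sum(-1)                               # (n, k)
    coef = (alpha + 1.0) / alpha * (p - q) / (1.0 + dist2 / alpha)
    grad_z = (coef[..., None] * diff).sum(axis=1)             # (n, d)
    grad_mu = -(coef[..., None] * diff).sum(axis=0)           # (k, d)
    return grad_z, grad_mu
```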
Correspondingly, an autoencoder-based multimodal adaptive fusion deep clustering method is also provided, comprising the following steps (a compact sketch of the whole pipeline is given after step S4):
S1. Pass the dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders;
S2. Fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z;
S3. Decode the fused feature Z using a structure symmetric to the encoder, obtaining the decoded dataset X̂;
S4. Cluster the adaptively fused feature Z, and obtain the final accuracy ACC by comparing the clustering result with the real labels.
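The sketch below strings steps S1–S4 together as a single forward pass; it reuses the illustrative `soft_assign` helper and the encoder/fusion/decoder sketches from the model description above, all of which are assumptions rather than the exact implementation.

```python
def mdec_forward(x, encoders, fusion, decoder, mu, alpha=1.0):
    """One pass of steps S1-S4: encode with each branch, fuse, decode, and soft-cluster."""
    zs = [enc(x) for enc in encoders]        # S1: latent features Z_1, Z_2, Z_3
    z, omega = fusion(*zs)                   # S2: adaptive fusion -> Z
    x_hat = decoder(z)                       # S3: symmetric decoding -> X_hat
    q = soft_assign(z, mu, alpha)            # S4: soft cluster assignments
    return z, x_hat, q, omega
```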
Further, the fusion feature Z obtained in step S2 is expressed as:
Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
where ω_m represents the importance weight of the feature of the m-th modality, learned adaptively by the network to give the adaptive feature fusion parameters;
subject to the constraint
ω_1 + ω_2 + ω_3 = 1
and defined as:
ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3)
where each ω_m is defined by a softmax function with β_m as the control parameter; the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
Compared with the prior art, the present application proposes a novel multimodal adaptive feature fusion deep clustering framework that includes a multimodal encoder, an adaptive fusion network, and a deep clustering layer. Through the multimodal encoder and the multimodal adaptive feature fusion layer, the model extracts features of the original data through nonlinear mappings, reduces the dimensionality of high-dimensional data, and optimizes the common subspace of the data features; finally, KL divergence is used to constrain the subspace clustering. Experimental results on three public datasets demonstrate that the proposed model outperforms several state-of-the-art models.
Brief Description of the Drawings
FIG. 1 is a structural diagram of the autoencoder-based multimodal adaptive fusion deep clustering model provided by Embodiment 1;
FIG. 2 is a schematic structural diagram of the autoencoder-based multimodal deep clustering (MDEC) framework provided by Embodiment 1;
FIG. 3 is a schematic diagram of the specific information and sample information of the datasets provided by Embodiment 2;
FIG. 4 is a schematic diagram of the autoencoder-based multimodal adaptive fusion deep clustering method provided by Embodiment 3.
Detailed Description of the Embodiments
The embodiments of the present application are described below by way of specific examples, and those skilled in the art can easily understand other advantages and effects of the present application from the content disclosed in this specification. The present application can also be implemented or applied through other different specific embodiments, and the details in this specification can be modified or changed in various ways from different viewpoints and for different applications without departing from the spirit of the present application. It should be noted that, in the absence of conflict, the following embodiments and the features in the embodiments may be combined with each other.
The purpose of the present application is to address the defects of the prior art by providing an autoencoder-based multimodal adaptive fusion deep clustering model and method.
Embodiment 1
This embodiment provides an autoencoder-based multimodal adaptive fusion deep clustering model, as shown in FIG. 1, including an encoder 11, a multimodal adaptive fusion layer 12, a decoder 13, and a deep embedded clustering layer 14; the encoder 11 includes an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder;
the encoder 11 is used to pass the dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders;
the multimodal adaptive fusion layer 12 is connected to the encoder 11 and is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z;
the decoder 13 is connected to the multimodal adaptive fusion layer 12 and is used to decode the clustered fusion feature Z using a structure symmetric to the encoder, obtaining the decoded reconstructed dataset X̂;
the deep embedded clustering layer 14 is connected to the multimodal adaptive fusion layer 12 and is used to cluster the fusion feature Z, obtaining the clustered fusion feature Z.
FIG. 2 is a schematic structural diagram of the autoencoder-based multimodal adaptive feature fusion deep clustering (MDEC) framework, which consists of four parts: the encoder 11 composed of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; the multimodal adaptive fusion layer 12; the deep embedded clustering layer 14; and the decoder 13.
In the encoder 11, the dataset X is passed through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders.
Specifically, in the model, X denotes the dataset, and the latent features Z_m are obtained through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively. The encoders convert the high-dimensional data into low-dimensional features, expressed as follows:
Z_m = h(X; θ_m)
where θ_m represents the encoder model parameters and m represents the encoder index.
In the multimodal adaptive fusion layer 12, the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder are fused into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z.
Specifically, after the mapping of the encoder layer, three latent feature spaces Z_m are obtained. To capture more comprehensive information about the original data, the different features Z_m obtained by the different autoencoders are fused into the common subspace Z according to the following formula:
Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
where ω_m represents the importance weight of the feature of the m-th modality, learned adaptively by the network to give the adaptive feature fusion parameters;
subject to the constraint
ω_1 + ω_2 + ω_3 = 1
and defined as:
ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3)
where each ω_m is defined by a softmax function with β_m as the control parameter; the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
In the decoder 13, the clustered fusion feature Z is decoded using a structure symmetric to the encoder to obtain the decoded dataset.
Specifically, in order to better learn the feature Z of the original data X, a structure symmetric to the encoder is used for decoding:
X̂ = g(Z; θ′)
where X̂ represents the reconstruction of the dataset X, g denotes the decoding mapping, and θ′ represents the decoder model parameters.
In the deep embedded clustering layer 14, the fusion feature Z is clustered, and the final accuracy ACC is obtained by comparing the clustering result with the real labels.
Specifically, the clustering layer borrows the idea of DEC ("J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," in Proc. Int. Conf. Mach. Learn., 2016, pp. 478–487"): the n points {Z_i ∈ Z}, i = 1, ..., n, are divided into k clusters, each represented by a center μ_j, j = 1, ..., k. To cluster the fusion feature Z, the cluster centers {μ_j} are first initialized; then the soft assignments between the feature points and the cluster centers are computed, and the KL divergence between the soft assignments and the auxiliary distribution is used to update the cluster centers μ_j and the parameters θ and β.
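Initializing the cluster centers on the fused subspace with ordinary K-means can be done, for example, with scikit-learn; the snippet below assumes the fused features Z have already been computed by the pretrained model.

```python
import torch
from sklearn.cluster import KMeans

def init_cluster_centers(fused_z, n_clusters):
    """Run K-means on the fused features Z and return the centers as a learnable tensor."""
    km = KMeans(n_clusters=n_clusters, n_init=20)
    km.fit(fused_z.detach().cpu().numpy())
    mu = torch.nn.Parameter(torch.tensor(km.cluster_centers_, dtype=torch.float32))
    return mu, km.labels_
```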
This embodiment also includes a loss function.
The loss function consists of two parts: (1) the reconstruction loss L_R is used to update the network parameters of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; (2) the clustering loss L_C is used to update the clustering results, the autoencoder parameters, and the adaptive fusion parameters.
Reconstruction loss
The model uses the squared difference between the encoder input and the decoder output as the reconstruction loss, pretrains the autoencoders, and obtains a good initialized model:
L_R = Σ_i ||x_i − x̂_i||²
where L_R represents the reconstruction loss function.
Clustering loss
Following the literature "van der Maaten, Laurens and Hinton, Geoffrey. Visualizing data using t-SNE. JMLR, 2008", the Student's t-distribution is used as the kernel to compute the similarity between a feature point Z_i and a cluster center μ_j:
q_ij = (1 + ||Z_i − μ_j||²/α)^(−(α+1)/2) / Σ_j′ (1 + ||Z_i − μ_j′||²/α)^(−(α+1)/2)
where Z_i = f(h(x_i)); α represents the degrees of freedom of the Student's t-distribution; q_ij can be interpreted as the probability of assigning sample i to cluster center j; and μ_j′ ranges over all cluster centers. The clusters are iteratively optimized by learning from their high-confidence assignments with the help of an auxiliary target distribution, i.e., the model is trained by matching the soft assignments to the target distribution. The target loss function is defined as the KL divergence between the soft assignment probabilities q_ij and the auxiliary distribution p_ij, expressed as:
L_C = KL(P‖Q) = Σ_i Σ_j p_ij log(p_ij / q_ij)
where L_C represents the clustering loss function; q_ij represents the probability that sample i belongs to class j; and p_ij represents the target probability that sample i belongs to class j.
p_i is computed by first raising q_i to the second power and then normalizing by the frequency of each cluster, expressed as:
p_ij = (q_ij² / f_j) / Σ_j′ (q_ij′² / f_j′)
f_j = Σ_i q_ij
where f_j represents the soft cluster frequency.
The training is divided into two stages, namely a pre-training initialization stage and a clustering optimization stage. In the pre-training initialization stage, the model is trained using the following loss function:
L_1 = L_R
In the clustering optimization stage, the loss function is:
L_2 = L_R + L_C
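The two training stages can be sketched as follows: stage one minimizes L_1 = L_R alone, and stage two minimizes L_2 = L_R + L_C. The helpers `pretrain_step`, `mdec_forward`, `target_distribution`, and `clustering_loss` are the illustrative sketches given earlier, not the patent's own code, and refreshing the target distribution at every step (rather than every few iterations) is a simplification.

```python
def train_mdec(loader, encoders, fusion, decoder, mu, optimizer,
               pretrain_epochs, cluster_epochs):
    # Stage 1: pretraining initialization with L_1 = L_R
    for _ in range(pretrain_epochs):
        for x, _ in loader:                       # labels are not used during training
            pretrain_step(x, encoders, fusion, decoder, optimizer)

    # Stage 2: clustering optimization with L_2 = L_R + L_C
    for _ in range(cluster_epochs):
        for x, _ in loader:
            optimizer.zero_grad()
            z, x_hat, q, _ = mdec_forward(x, encoders, fusion, decoder, mu)
            p = target_distribution(q).detach()   # auxiliary distribution, held fixed per step
            loss_r = ((x - x_hat) ** 2).sum(dim=tuple(range(1, x.dim()))).mean()
            loss = loss_r + clustering_loss(q, p)
            loss.backward()
            optimizer.step()
```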
Clustering also includes optimizing the objective, specifically:
The cluster centers {μ_j} and the network parameters θ are jointly optimized by stochastic gradient descent with momentum. The gradients of L with respect to the feature-space embedding of each data point Z_i and each cluster centroid μ_j are computed as follows:
∂L/∂Z_i = ((α+1)/α) Σ_j (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
∂L/∂μ_j = −((α+1)/α) Σ_i (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
The gradients ∂L/∂Z_i are then back-propagated to compute the network parameter gradients ∂L/∂θ. To discover the cluster assignments, clustering is stopped when the number of points whose cluster assignment changes between two consecutive iterations is less than a certain proportion of the total number of points.
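The stopping rule, halting when fewer than a preset fraction of points change their cluster assignment between consecutive iterations, can be written as follows; the tolerance value here is illustrative.

```python
import numpy as np

def should_stop(prev_labels, new_labels, tol=0.001):
    """Stop when the fraction of points whose assignment changed is below tol."""
    prev_labels = np.asarray(prev_labels)
    new_labels = np.asarray(new_labels)
    changed = (prev_labels != new_labels).sum()
    return changed / len(new_labels) < tol
```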
In this embodiment, different latent features are extracted by different encoders and fused into a common subspace. After pretraining, the initialized adaptive feature fusion parameters β and model parameters θ_m are obtained, and K-means clustering is then performed on the fused common subspace Z to initialize the cluster centers μ_j.
Embodiment 2
The autoencoder-based multimodal adaptive fusion deep clustering model provided by this embodiment differs from Embodiment 1 in the following respects:
The model proposed in this embodiment is validated on multiple datasets and compared with several strong methods.
Datasets:
MNIST: the MNIST dataset consists of 70,000 handwritten digits of size 28×28 pixels. The digits have been centered and size-normalized, as described in "LeCun, Yann, Bottou, Léon, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278–2324, 1998".
FASHION-MNIST: contains 70,000 fashion product images from 10 categories, with the same image size as MNIST; see "Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747".
COIL-20: a collection of 1,440 grayscale object images (128×128) of 20 categories viewed from different angles; see "Li, F.; Qiao, H.; and Zhang, B. 2018. Discriminatively boosted image clustering with fully convolutional auto-encoders. PR 83:161–173".
The specific information and samples of the datasets are given in Table 1 and FIG. 3.
Dataset          Samples   Classes   Image size
MNIST            70000     10        (28, 28, 1)
FASHION-MNIST    70000     10        (28, 28, 1)
USPS             9298      10        (16, 16, 1)
COIL20           1440      20        (128, 128, 1)
Table 1. Dataset information
Evaluation metrics
The algorithms are evaluated and compared using standard unsupervised evaluation metrics and protocols. For all algorithms, the number of clusters is set to the number of ground-truth categories, and performance is evaluated with the unsupervised clustering accuracy (ACC):
ACC = max_m Σ_{i=1..n} 1{l_i = m(C_i)} / n
where l_i is the ground-truth label, C_i is the cluster assignment produced by the algorithm, and m ranges over all possible one-to-one mappings between clusters and labels.
This metric intuitively takes a cluster assignment from an unsupervised algorithm and a ground-truth assignment and then finds the best matching between them. The best mapping can be computed efficiently by the Hungarian method ("Kuhn, Harold W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2): 83–97, 1955").
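The unsupervised clustering accuracy with the optimal one-to-one mapping found by the Hungarian method can be computed as follows, using SciPy's linear_sum_assignment; this is a common implementation of the metric rather than the patent's own code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC = max_m sum_i 1{l_i == m(c_i)} / n over one-to-one mappings m."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    d = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((d, d), dtype=np.int64)
    for i in range(len(y_true)):
        cost[y_pred[i], y_true[i]] += 1
    row, col = linear_sum_assignment(cost.max() - cost)   # maximize matched counts
    return cost[row, col].sum() / len(y_true)
```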
Network configuration
An autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder are adopted as the three single-modal deep network branches for the original images; the specific network configurations are shown in Table 2.
[Table 2: network branch configurations, reproduced as an image in the original publication]
Table 2. Network branch structure
Algorithm comparison (vertical)
[Table 3: clustering performance of different algorithms on the three datasets, reproduced as an image in the original publication]
Table 3. Vertical comparison of the clustering performance of different algorithms on the three datasets
Algorithm comparison (horizontal)
[Table 4: clustering performance of different algorithms on the three datasets, reproduced as an image in the original publication]
Table 4. Horizontal comparison of the clustering performance of different algorithms on the three datasets
Two single-modal clustering methods are selected: K-means, as in "J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A k-means clustering algorithm," J. Roy. Stat. Soc. C, Appl. Stat., vol. 28, no. 1, pp. 100–108, 1979", and Deep Embedded Clustering (DEC), as in "J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," in Proc. Int. Conf. Mach. Learn., 2016, pp. 478–487"; one traditional large-scale multimodal clustering method: robust multimodal K-means clustering (RMKMC), as in "Cai, X.; Nie, F.; and Huang, H. 2013. Multi-view k-means clustering on big data. In IJCAI"; two deep two-view clustering methods: deep canonical correlation analysis (DCCA), as in "Andrew, G.; Arora, R.; Bilmes, J.; and Livescu, K. 2013. Deep canonical correlation analysis. In ICML, 1247–1255", and the deep canonically correlated autoencoder (DCCAE), as in "Wang, W.; Arora, R.; Livescu, K.; and Bilmes, J. 2016. On deep multi-view representation learning: objectives and optimization. arXiv preprint arXiv:1602.01024"; and two deep multimodal clustering methods: deep generalized canonical correlation analysis (DGCCA), as in "Benton, A.; Khayrallah, H.; Gujral, B.; Reisinger, D. A.; Zhang, S.; and Arora, R. 2017. Deep generalized canonical correlation analysis. arXiv preprint arXiv:1702.02519", and the joint framework of deep multimodal clustering (DMJC); see also "Deep multimodal subspace clustering networks. IEEE Journal of Selected Topics in Signal Processing 12(6):1601–1614". Table 3 compares these methods with the algorithm proposed in this embodiment. The proposed method is also compared with the method of the paper Multi-View Deep Clustering based on AutoEncoder (MDEC), which fuses the three views with a linear multi-view fusion; linear fusion is simple and effective, but it cannot effectively constrain the weights of the three different view features. In contrast, the multimodal adaptive fusion proposed in this embodiment obtains the fusion parameters through convolution and a softmax function, and can adjust the weight of each modal feature through back-propagation, which effectively improves the classification accuracy.
This embodiment proposes a novel multimodal adaptive feature fusion deep clustering framework that includes a multimodal encoder, an adaptive feature fusion network, and a deep clustering layer. Through the multimodal encoder and the adaptive feature fusion layer, the model extracts features of the original data through nonlinear mappings, reduces the dimensionality of high-dimensional data, and optimizes the common subspace of the data features; finally, KL divergence is used to constrain the subspace clustering. Experimental results on three public datasets demonstrate that the model of this embodiment outperforms several state-of-the-art models.
Embodiment 3
This embodiment provides an autoencoder-based multimodal adaptive fusion deep clustering method, as shown in FIG. 4, including:
S11. Pass the dataset X through the nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain the latent features Z_m of the three encoders;
S12. Fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining the fusion feature Z;
S13. Decode the clustered fusion feature Z using a structure symmetric to the encoder, obtaining the decoded reconstructed dataset X̂;
S14. Cluster the fusion feature Z, and obtain the final accuracy ACC by comparing the clustering result with the labels.
It should be noted that the autoencoder-based multimodal adaptive feature fusion deep clustering method provided by this embodiment is similar to Embodiment 1 and is not described in detail again here.
Compared with the prior art, this embodiment proposes a novel multimodal adaptive fusion deep clustering framework that includes a multimodal encoder, a multimodal adaptive feature fusion network, and a deep clustering layer. Through the multimodal encoder and the fusion layer, the model extracts features of the original data through nonlinear mappings, reduces the dimensionality of high-dimensional data, and optimizes the common subspace of the data features; finally, KL divergence is used to constrain the subspace clustering. Experimental results on three public datasets demonstrate that the model of this embodiment outperforms several state-of-the-art models.
Note that the above are only preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to the above embodiments and may include other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the appended claims.

Claims (10)

  1. An autoencoder-based multimodal adaptive fusion deep clustering model, characterized by comprising an encoder, a multimodal adaptive fusion layer, a decoder, and a deep embedded clustering layer; the encoder comprises an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder;
    the encoder is used to pass a dataset X through the three nonlinear mappings h(X; θ_m) of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder, respectively, to obtain latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder;
    the multimodal adaptive fusion layer is connected to the encoder and is used to fuse the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining a fusion feature Z;
    the decoder is connected to the multimodal adaptive fusion layer and is used to decode the fused feature Z using a structure symmetric to the encoder, obtaining a decoded reconstructed dataset X̂;
    the deep embedded clustering layer is connected to the multimodal adaptive fusion layer and is used to cluster the fusion feature Z, and a final accuracy ACC is obtained by comparing the clustering result with the real labels.
  2. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 1, characterized in that the latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder are obtained in the encoder and are expressed as:
    Z_m = h(X; θ_m)
    where θ_m represents the encoder model parameters and m represents the encoder index, taking values in {1, 2, 3}.
  3. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 2, characterized in that the fusion feature Z obtained in the multimodal adaptive fusion layer is expressed as:
    Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
    where ω_m represents the importance weight of the feature of the m-th modality, learned adaptively by the network to give the adaptive feature fusion parameters;
    subject to the constraint
    ω_1 + ω_2 + ω_3 = 1
    and defined as:
    ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3)
    where each ω_m is defined by a softmax function with β_m as the control parameter; the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
  4. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 3, characterized in that the decoded reconstructed dataset X̂ obtained in the decoder is expressed as:
    X̂ = g(Z; θ′)
    where g denotes the decoding mapping symmetric to the encoder and θ′ represents the decoder model parameters.
  5. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 4, characterized in that clustering the fusion feature Z in the deep embedded clustering layer specifically comprises:
    dividing the n points {Z_i ∈ Z}, i = 1, ..., n, into k clusters, each represented by a center μ_j, j = 1, ..., k; initializing the cluster centers {μ_j}; computing the soft assignments q_ij between the feature points and the cluster centers and the auxiliary distribution p_i; finally defining the clustering loss function as the KL divergence between the soft assignments q_ij and the auxiliary distribution p_i, and updating the cluster centers μ_j, the encoder and decoder parameters θ, and the adaptive feature fusion parameters β.
  6. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 5, characterized in that the encoder further uses a reconstruction loss to update the network parameters of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder; specifically, the squared difference between the original data x_i input to the encoder and the reconstructed data x̂_i output by the decoder is used as the reconstruction loss to pretrain the encoders and obtain an initialized model, expressed as:
    L_R = Σ_i ||x_i − x̂_i||²
    where L_R represents the reconstruction loss function.
  7. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 6, characterized in that the deep embedded clustering layer further uses the clustering loss (a KL divergence) to update the clustering results, the encoder parameters, and the fusion parameters; specifically:
    the similarity between a feature point Z_i and a cluster center μ_j is computed using the Student's t-distribution as the kernel, expressed as:
    q_ij = (1 + ||Z_i − μ_j||²/α)^(−(α+1)/2) / Σ_j′ (1 + ||Z_i − μ_j′||²/α)^(−(α+1)/2)
    where Z_i = f(h(x_i)) ∈ Z; α represents the degrees of freedom of the Student's t-distribution; q_ij represents the probability of assigning sample i to cluster center μ_j; and μ_j′ ranges over all cluster centers;
    the clusters are iteratively optimized by learning from their high-confidence assignments with the help of an auxiliary target distribution, i.e., the model is trained by matching the soft assignments to the target distribution; the target loss function is defined as the KL divergence between the soft assignment probabilities q_i and the auxiliary distribution p_i, expressed as:
    L_C = KL(P‖Q) = Σ_i Σ_j p_ij log(p_ij / q_ij)
    p_ij = (q_ij² / f_j) / Σ_j′ (q_ij′² / f_j′)
    f_j = Σ_i q_ij
    where L_C represents the clustering loss function and f_j = Σ_i q_ij represents the soft cluster frequency.
  8. The autoencoder-based multimodal adaptive fusion deep clustering model according to claim 7, characterized in that the deep embedded clustering layer further comprises:
    jointly optimizing the cluster centers μ_j, the network parameters θ, and the adaptive feature fusion parameters β with stochastic gradient descent with momentum, wherein the gradients of L with respect to the feature-space embedding of each data point Z_i and each cluster center μ_j are computed as follows:
    ∂L/∂Z_i = ((α+1)/α) Σ_j (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
    ∂L/∂μ_j = −((α+1)/α) Σ_i (1 + ||Z_i − μ_j||²/α)⁻¹ (p_ij − q_ij)(Z_i − μ_j)
    where the gradients ∂L/∂Z_i are back-propagated to compute the network parameter gradients ∂L/∂θ; clustering is stopped when the number of points whose cluster assignment changes between two consecutive iterations is less than a preset proportion of the total number of points.
  9. An autoencoder-based multimodal adaptive fusion deep clustering method, characterized by comprising:
    S1. passing a dataset X through the nonlinear mappings h(X; θ_m) of an autoencoder, a convolutional autoencoder, and a convolutional variational autoencoder, respectively, to obtain latent features Z_m of the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder;
    S2. fusing the latent features Z_m obtained by the autoencoder, the convolutional autoencoder, and the convolutional variational autoencoder into a common subspace through adaptive spatial feature fusion, obtaining a fusion feature Z;
    S3. decoding the clustered fusion feature Z using a structure symmetric to the encoder, obtaining a decoded dataset X̂;
    S4. clustering the adaptively fused feature Z, and obtaining a final accuracy ACC by comparing the clustering result with the real labels.
  10. The autoencoder-based multimodal adaptive fusion deep clustering method according to claim 9, characterized in that the fusion feature Z obtained in step S2 is expressed as:
    Z = ω_1·Z_1 + ω_2·Z_2 + ω_3·Z_3
    where ω_m represents the importance weight of the feature of the m-th modality, learned adaptively by the network to give the adaptive feature fusion parameters;
    subject to the constraint
    ω_1 + ω_2 + ω_3 = 1
    and defined as:
    ω_m = e^β_m / (e^β_1 + e^β_2 + e^β_3)
    where each ω_m is defined by a softmax function with β_m as the control parameter; the weight scalar β_m is computed by a 1×1 convolution on the features of each modality and is learned by standard back-propagation.
PCT/CN2021/131248 2021-01-25 2021-11-17 Autoencoder-based multimodal adaptive fusion deep clustering model and method WO2022156333A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/273,783 US20240095501A1 (en) 2021-01-25 2021-11-17 Multi-modal adaptive fusion deep clustering model and method based on auto-encoder
ZA2022/07739A ZA202207739B (en) 2021-01-25 2022-07-12 Autoencoder based multimodal adaptive fusion in-depth clustering model and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110096080.5 2021-01-25
CN202110096080.5A CN112884010A (zh) 2021-01-25 2021-01-25 Autoencoder-based multimodal adaptive fusion deep clustering model and method

Publications (1)

Publication Number Publication Date
WO2022156333A1 true WO2022156333A1 (zh) 2022-07-28

Family

ID=76050922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131248 WO2022156333A1 (zh) 2021-01-25 2021-11-17 Autoencoder-based multimodal adaptive fusion deep clustering model and method

Country Status (5)

Country Link
US (1) US20240095501A1 (zh)
CN (1) CN112884010A (zh)
LU (1) LU502834B1 (zh)
WO (1) WO2022156333A1 (zh)
ZA (1) ZA202207739B (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186358A (zh) * 2023-02-07 2023-05-30 和智信(山东)大数据科技有限公司 Deep trajectory clustering method, system and storage medium
CN116456183A (zh) * 2023-04-20 2023-07-18 北京大学 High-dynamic-range video generation method and system guided by an event camera
CN117170246A (zh) * 2023-10-20 2023-12-05 达州市经济发展研究院(达州市万达开统筹发展研究院) Adaptive control method and system for the fluid volume of a water turbine
CN117292442A (zh) * 2023-10-13 2023-12-26 中国科学技术大学先进技术研究院 Cross-modal and cross-domain general face forgery localization method

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884010A (zh) * 2021-01-25 2021-06-01 浙江师范大学 Autoencoder-based multimodal adaptive fusion deep clustering model and method
CN113780395B (zh) * 2021-08-31 2023-02-03 西南电子技术研究所(中国电子科技集团公司第十研究所) Clustering method for massive high-dimensional AIS trajectory data
CN113627151B (zh) * 2021-10-14 2022-02-22 北京中科闻歌科技股份有限公司 Cross-modal data matching method, apparatus, device and medium
CN114187969A (zh) * 2021-11-19 2022-03-15 厦门大学 Deep learning method and system for processing single-cell multimodal omics data
CN114548367B (zh) * 2022-01-17 2024-02-20 中国人民解放军国防科技大学 Method and apparatus for reconstructing multimodal data based on adversarial networks
CN114999637B (zh) * 2022-07-18 2022-10-25 华东交通大学 Pathological image diagnosis method and system with multi-angle encoding and embedded mutual learning
CN116206624B (zh) * 2023-05-04 2023-08-29 科大讯飞(苏州)科技有限公司 Vehicle sound wave synthesis method, apparatus, storage medium and device
CN116738297B (zh) * 2023-08-15 2023-11-21 北京快舒尔医疗技术有限公司 Diabetes classification method and system based on deep auto-encoding

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629374A (zh) * 2018-05-08 2018-10-09 深圳市唯特视科技有限公司 Unsupervised multimodal subspace clustering method based on a convolutional neural network
CN109389166A (zh) * 2018-09-29 2019-02-26 聚时科技(上海)有限公司 Deep transfer embedded clustering machine learning method based on local structure preservation
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
CN112884010A (zh) * 2021-01-25 2021-06-01 浙江师范大学 Autoencoder-based multimodal adaptive fusion deep clustering model and method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
CN108629374A (zh) * 2018-05-08 2018-10-09 深圳市唯特视科技有限公司 Unsupervised multimodal subspace clustering method based on a convolutional neural network
CN109389166A (zh) * 2018-09-29 2019-02-26 聚时科技(上海)有限公司 Deep transfer embedded clustering machine learning method based on local structure preservation
CN112884010A (zh) * 2021-01-25 2021-06-01 浙江师范大学 Autoencoder-based multimodal adaptive fusion deep clustering model and method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186358A (zh) * 2023-02-07 2023-05-30 和智信(山东)大数据科技有限公司 Deep trajectory clustering method, system and storage medium
CN116186358B (zh) * 2023-02-07 2023-08-15 和智信(山东)大数据科技有限公司 Deep trajectory clustering method, system and storage medium
CN116456183A (zh) * 2023-04-20 2023-07-18 北京大学 High-dynamic-range video generation method and system guided by an event camera
CN116456183B (zh) * 2023-04-20 2023-09-26 北京大学 High-dynamic-range video generation method and system guided by an event camera
CN117292442A (zh) * 2023-10-13 2023-12-26 中国科学技术大学先进技术研究院 Cross-modal and cross-domain general face forgery localization method
CN117292442B (zh) * 2023-10-13 2024-03-26 中国科学技术大学先进技术研究院 Cross-modal and cross-domain general face forgery localization method
CN117170246A (zh) * 2023-10-20 2023-12-05 达州市经济发展研究院(达州市万达开统筹发展研究院) Adaptive control method and system for the fluid volume of a water turbine

Also Published As

Publication number Publication date
US20240095501A1 (en) 2024-03-21
ZA202207739B (en) 2022-07-27
LU502834B1 (en) 2023-01-26
CN112884010A (zh) 2021-06-01

Similar Documents

Publication Publication Date Title
WO2022156333A1 (zh) 基于自编码器的多模态自适应融合深度聚类模型及方法
Li et al. Deep adversarial multi-view clustering network.
Yang et al. Towards k-means-friendly spaces: Simultaneous deep learning and clustering
Stewart et al. End-to-end people detection in crowded scenes
Singh et al. Svm-bdt pnn and fourier moment technique for classification of leaf shape
Xie et al. Unsupervised deep embedding for clustering analysis
Tao et al. Unsupervised spectral–spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification
JP5254893B2 (ja) 画像変換方法及び装置並びにパターン識別方法及び装置
Sankaran et al. Group sparse autoencoder
CN112446423B (zh) 一种基于迁移学习的快速混合高阶注意力域对抗网络的方法
Bi et al. A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends
Chu et al. Stacked Similarity-Aware Autoencoders.
CN110188827A (zh) 一种基于卷积神经网络和递归自动编码器模型的场景识别方法
Dahal Learning embedding space for clustering from deep representations
Wang et al. Generative partial multi-view clustering
Khodadoust et al. Partial fingerprint identification for large databases
Nanda et al. A person re-identification framework by inlier-set group modeling for video surveillance
Ngadi et al. Uniformed two local binary pattern combined with neighboring support vector classifier for classification
Raikar et al. Efficiency comparison of supervised and unsupervised classifier on content based classification using shape, color, texture
Clément et al. Bags of spatial relations and shapes features for structural object description
Chen et al. Learning discriminative feature via a generic auxiliary distribution for unsupervised domain adaptation
Zhong et al. Heterogeneous visual features integration for image recognition optimization in internet of things
Sener et al. Unsupervised transductive domain adaptation
Du et al. Deep neural networks with parallel autoencoders for learning pairwise relations: Handwritten digits subtraction
Chen et al. D-trace: deep triply-aligned clustering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920709

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18273783

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920709

Country of ref document: EP

Kind code of ref document: A1