CN116204804A - Multi-view clustering method and device, electronic equipment and storage medium


Info

Publication number
CN116204804A
CN116204804A
Authority
CN
China
Prior art keywords
view
sample data
cluster
clustering
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310138483.0A
Other languages
Chinese (zh)
Inventor
钱胜胜
徐常胜
薛迪展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202310138483.0A priority Critical patent/CN116204804A/en
Publication of CN116204804A publication Critical patent/CN116204804A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention provides a multi-view clustering method, a device, electronic equipment and a storage medium, and relates to the technical field of multimedia. The method comprises the following steps: acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data; inputting the original features into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data; constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model; and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view. The invention can perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.

Description

Multi-view clustering method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a multi-view clustering method, a device, an electronic device, and a storage medium.
Background
With the rapid development of the internet and multimedia, multi-view data has grown explosively. Multi-view clustering, a fundamental task of multi-view learning, aims to mine the complementary information among different views so as to improve clustering performance.
Existing deep multi-view clustering schemes rely on the number of clusters as prior information to improve the clustering effect. In more realistic scenarios, however, the number of clusters in the multi-view data to be clustered is often unknown, so performing multi-view clustering when the number of clusters is unknown, and automatically discovering the number of clusters during clustering, has become a major challenge.
Disclosure of Invention
The invention provides a multi-view clustering method, a device, electronic equipment and a storage medium, which overcome the defect in the prior art that multi-view clustering cannot be performed when the number of clusters is unknown, and which enable multi-view clustering with an unknown number of clusters while automatically discovering the number of clusters during clustering.
The invention provides a multi-view clustering method, which comprises the following steps:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
According to the multi-view clustering method provided by the invention, constructing the deep variational inference network for the Dirichlet process Gaussian mixture model comprises:
determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
According to the multi-view clustering method provided by the invention, inputting the encoded features of the multi-view sample data into the deep variational inference network and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view comprises:
taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
According to the multi-view clustering method provided by the invention, the method further comprises:
determining a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
According to the multi-view clustering method provided by the invention, determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
According to the multi-view clustering method provided by the invention, determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
According to the multi-view clustering method provided by the invention, the method further comprises:
inputting the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data;
constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and adjusting the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
The invention also provides a multi-view clustering device, which comprises:
an extraction module, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
an encoding module, configured to input the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
a construction module, configured to construct a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and a clustering module, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the multi-view clustering method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the multi-view clustering method as described in any one of the above.
According to the multi-view clustering method, the device, the electronic equipment and the storage medium provided by the invention, multi-view sample data are first acquired and input into a feature extraction network to obtain the original features of the multi-view sample data. The original features of the multi-view sample data are then input into the encoders corresponding to the respective views to obtain the encoded features of the multi-view sample data, thereby mapping the original features of the multi-view sample data into a common semantic representation space. Next, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model. Finally, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters is performed by minimizing this KL divergence, yielding the number of clusters and the predicted cluster label of each sample in each view. The invention can therefore perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a multi-view clustering method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model framework of a multi-view clustering method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of cross-view dual anchor contrast learning provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-view clustering device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The multi-view clustering method of the present invention is described below with reference to fig. 1 to 3.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a multi-view clustering method according to an embodiment of the invention. As shown in FIG. 1, the method may include the following steps:
step 101, acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
step 102, inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
step 103, constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and step 104, inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
In step 101, as shown in FIG. 2, the multiple views may include text, images, audio, and the like. The multi-view sample data are the sample data of a plurality of samples under different views; for example, if the samples are birds, the sample data of a bird under different views include text describing the bird, photographs of the bird, recorded birdsong audio, and so on.
The feature extraction network may be a pre-trained deep learning network for feature extraction. Illustratively, the sample data of the multiple views are mapped to the feature spaces of the different views by the feature extraction functions of the pre-trained deep learning network, yielding the original features of the multi-view sample data {X^m}, m = 1, …, M, where M denotes the number of views. The original features of the sample data of the m-th view are X^m = {x_1^m, …, x_N^m}, where N denotes the number of samples.
In step 102, since the original features of the multi-view sample data contain redundancy and random noise, an encoder may be employed to project the original features of the multi-view sample data into a common space.
Illustratively, for the m-th view, an encoder E^m(·; θ^m) is constructed; the encoder comprises a multi-layer perceptron. The original features X^m of the sample data of the m-th view are input into the encoder E^m, which projects X^m into a D-dimensional feature space to obtain the encoded features of the sample data of the m-th view. Optionally, the encoder is an autoencoder.

Specifically, expression (1) projects the original feature x_n^m of the n-th sample at the m-th view into the D-dimensional feature space to obtain the encoded feature of the n-th sample at the m-th view:

h_n^m = E^m(x_n^m; θ^m)   (1)

where h_n^m ∈ R^D denotes the encoded feature of the n-th sample at the m-th view and θ^m denotes the trainable parameters of the encoder of the m-th view.
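As an illustration of the encoder of expression (1), the following is a minimal PyTorch sketch; the hidden width, depth and ReLU activations are assumptions made for illustration, since the embodiment only specifies that each encoder is a multi-layer perceptron projecting into a D-dimensional space.

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """MLP encoder E^m that projects view-m features into a shared D-dimensional space."""
    def __init__(self, in_dim: int, d: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),  # expression (1): h_n^m = E^m(x_n^m; θ^m)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One encoder per view; the input dimension may differ from view to view.
# encoders = nn.ModuleList([ViewEncoder(dm, D) for dm in view_dims])
```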
In step 103, in order to perform unsupervised clustering on the encoded features across views, and to overcome the difficulty that the prior distribution of the Dirichlet process Gaussian mixture model cannot be calculated directly, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed. The principle of the deep variational inference network is as follows: unsupervised clustering is performed by minimizing the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model.
Optionally, step 103 comprises the following sub-steps:
step 1031, determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under the first distribution;
step 1032, determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under the second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and step 1033, constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
In step 1031, q_γ(w) denotes the distribution probability of w under the distribution q, i.e., the reparameterized variational probability distribution, where γ denotes the variational parameters, the distribution q denotes the first distribution, and w denotes the latent parameters, w = {v, η*, z}: v denotes the class prior parameters, η* denotes the Gaussian mixture parameters, and z denotes the cluster labels.
In step 1032, p(w, h|θ) denotes the joint distribution probability of w and h under the distribution p given θ, i.e., the prior probability distribution of the Dirichlet process Gaussian mixture model, where h = {h_1, …, h_N} denotes the features to be classified of the N samples and θ denotes the prior hyperparameters of the Dirichlet process Gaussian mixture model.
In step 1033, the KL divergence between q_γ(w) and p(w|h, θ) is calculated by expression (2):

KL(q_γ(w) || p(w|h, θ)) = E_q[log q_γ(w)] − E_q[log p(w, h|θ)] + log p(h|θ)   (2)

where KL(·||·) denotes the KL divergence, E_q denotes the expectation under the q distribution (the subscript q_γ is abbreviated to q), p(h|θ) denotes the distribution probability of h under the distribution p given θ, and the base of the log function is e or 2.

Based on a first minimization function of KL(q_γ(w) || p(w|h, θ)), the deep variational inference network of the Dirichlet process Gaussian mixture model can be constructed.
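Since log p(h|θ) does not depend on the variational parameters γ, it can be dropped during optimization. A minimal sketch of a Monte Carlo estimate of the remaining two expectations of expression (2) follows; it assumes q_gamma is a distribution object exposing a reparameterized rsample() and a log_prob(), and that log_joint(w) evaluates log p(w, h|θ), both of which are hypothetical stand-ins for the concrete distributions of the embodiment.

```python
import torch

def kl_surrogate(q_gamma, log_joint, n_samples: int = 8) -> torch.Tensor:
    """Estimate E_q[log q_γ(w)] - E_q[log p(w, h|θ)], i.e., expression (2)
    up to the constant log p(h|θ), using reparameterized samples."""
    total = torch.zeros(())
    for _ in range(n_samples):
        w = q_gamma.rsample()  # reparameterization keeps gradients w.r.t. γ
        total = total + q_gamma.log_prob(w) - log_joint(w)
    return total / n_samples
```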
In step 104, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters can be performed by minimizing the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model, yielding the number of clusters and the predicted cluster label of each sample in each view.
Optionally, step 104 comprises the following sub-steps:
step 1041, taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
step 1042, converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and step 1043, solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
In step 1041, the encoded features of the multi-view sample data are substituted into the deep variational inference network as the features to be classified.
In step 1042, since h is a fixed feature, log p(h|θ) is a constant, so minimizing the KL divergence is equivalent to minimizing the variational inference loss. On this basis, the first minimization function is converted into a second minimization function of the variational inference loss between the expectations corresponding to q_γ(w) and p(w, h|θ).
The variational inference loss function is determined by expression (3):

L_var = E_q[log q_γ(w)] − E_q[log p(w, h|θ)]   (3)

where L_var denotes the variational inference loss and θ = {α, μ_0, c, a, b}; the joint prior p(w, h|θ) factorizes into p(v|α), p(η*|μ_0, c, a, b), p(z_n|v) and p(h_n|z_n, η*). Here p(v|α) denotes the Beta distribution Beta(1, α), with α the Beta distribution parameter; p(η*|μ_0, c, a, b) denotes the Normal-Gamma distribution NormGamma(μ_0, c, a, b), with μ_0 the Gaussian prior center parameter, c the Gaussian variance scaling, and a and b the Gamma distribution parameters; the Gaussian mixture parameters are η* = {μ_1, …, μ_T, Σ_1, …, Σ_T}, where T denotes the number of clusters, μ_t the Gaussian center parameters and Σ_t the Gaussian covariance parameters; p(z_n|v) denotes the multinomial distribution Mult(π(v)) with stick-breaking weights π_i(v) = v_i ∏_{j<i}(1 − v_j), where π_i denotes the i-th parameter of the multinomial distribution and v_i denotes the class prior parameter of the i-th cluster; and p(h_n|z_n, η*) denotes the Gaussian distribution N(h_n; μ_{z_n}, Σ_{z_n}), where μ_{z_n} and Σ_{z_n} denote the Gaussian center and covariance parameters of the z_n-th cluster and z_n denotes the cluster label of the n-th sample.

The minimization function of this variational inference loss function is the second minimization function.
In step 1043, the Dirichlet process and the T multivariate Gaussian distributions are parameterized using a truncation technique, where the prior distribution parameters are v = {v_1, …, v_{T−1}} and the Gaussian mixture parameters are μ = {μ_1, …, μ_T} and Σ = {Σ_1, …, Σ_T}. The learnable parameters of the deep reparameterized variational inference network of the Dirichlet process Gaussian mixture model are therefore γ = {v_1, …, v_{T−1}, μ_1, …, μ_T, Σ_1, …, Σ_T}.

Solving the second minimization function yields the number T of clusters (i.e., of multivariate Gaussian components) and the predicted cluster label ẑ_n^m of each sample in each view.
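A sketch of the truncated stick-breaking construction π_i(v) = v_i ∏_{j<i}(1 − v_j) and of the resulting soft cluster assignments is given below; the sigmoid parameterization of v and the diagonal-covariance simplification of Σ are assumptions made for illustration.

```python
import torch

def stick_breaking(v_logits: torch.Tensor) -> torch.Tensor:
    """Truncated stick-breaking weights: v has T-1 entries, π has T entries."""
    v = torch.sigmoid(v_logits)                  # keep each v_i in (0, 1)
    rest = torch.cumprod(1.0 - v, dim=0)         # cumulative products of (1 - v_j)
    head = v * torch.cat([torch.ones_like(v[:1]), rest[:-1]])
    return torch.cat([head, rest[-1:]])          # last stick takes the remainder

def predict_labels(h, mu, log_var, v_logits):
    """Soft labels q(z_n = i | h_n) under a diagonal-covariance Gaussian mixture."""
    pi = stick_breaking(v_logits)                              # (T,)
    diff = h.unsqueeze(1) - mu.unsqueeze(0)                    # (N, T, D)
    log_gauss = -0.5 * ((diff ** 2) / log_var.exp() + log_var).sum(-1)
    log_post = log_gauss + torch.log(pi + 1e-10)
    return torch.softmax(log_post, dim=1)                      # (N, T)
```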
According to the multi-view clustering method provided by this embodiment, multi-view sample data are first acquired and input into a feature extraction network to obtain the original features of the multi-view sample data. The original features of the multi-view sample data are then input into the encoders corresponding to the respective views to obtain the encoded features of the multi-view sample data, thereby mapping the original features into a common semantic representation space. Next, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model. Finally, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters is performed by minimizing this KL divergence, yielding the number of clusters and the predicted cluster label of each sample in each view. This embodiment can therefore perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.
In an embodiment, the method further comprises:
step 105, determining a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
step 106, performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and step 107, for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
In step 105, as shown in FIG. 3, the predicted cluster labels of H^m are defined as Ẑ^m = {ẑ_1^m, …, ẑ_N^m}, where H^m denotes the encoded features of the sample data of the m-th view, ẑ_n^m denotes the predicted cluster label of the n-th sample at the m-th view, and ẑ_{n,i}^m denotes the predicted probability of the n-th sample at the m-th view for the i-th cluster.
1) Regarding the first accumulation function (sample anchors)
With the samples of the m-th view as anchors, a cross-view sample-anchor contrastive loss function is determined between the predicted cluster label sequence of each sample in the m-th view and the predicted cluster label sequence of each sample in the l-th view; the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views is then determined.
Illustratively, since the predicted cluster labels of the same sample data should be consistent across different views, expression (4) determines the cross-view sample-anchor contrastive loss function between the m-th view Ẑ^m and the l-th view Ẑ^l, with the samples of the m-th view as anchors:

ℓ_s^{(m,l)} = −(1/N) Σ_{n=1}^{N} log( exp(cos(ẑ_n^m, ẑ_n^l)/τ_s) / Σ_{j=1}^{N} exp(cos(ẑ_n^m, ẑ_j^l)/τ_s) )   (4)

where ℓ_s^{(m,l)} denotes the cross-view sample-anchor contrastive loss function between Ẑ^m and Ẑ^l, ẑ_n^l denotes the predicted cluster label of the n-th sample at the l-th view, cos(·,·) denotes cosine similarity, and τ_s denotes the temperature coefficient.

To align the labels between all views, the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views can be determined by expression (5), which computes the contrastive loss over all views:

L_s = Σ_{m=1}^{M} Σ_{l=1, l≠m}^{M} ℓ_s^{(m,l)}   (5)

where L_s denotes the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
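A sketch of expressions (4) and (5) follows, assuming z_m and z_l are the (N, T) predicted-label matrices Ẑ^m and Ẑ^l; normalizing the rows turns the dot product into the cosine similarity of expression (4), and the default value of τ_s is an assumption.

```python
import torch
import torch.nn.functional as F

def sample_anchor_loss(z_m: torch.Tensor, z_l: torch.Tensor, tau_s: float = 0.5):
    """Expression (4): for each sample anchor in view m, the positive is the same
    sample in view l; all other samples in view l serve as negatives."""
    z_m = F.normalize(z_m, dim=1)
    z_l = F.normalize(z_l, dim=1)
    logits = z_m @ z_l.t() / tau_s                        # pairwise cosine / τ_s, (N, N)
    targets = torch.arange(z_m.size(0), device=z_m.device)
    return F.cross_entropy(logits, targets)               # -1/N Σ_n log softmax at (n, n)

# Expression (5): accumulate over all ordered view pairs m ≠ l.
# L_s = sum(sample_anchor_loss(Z[m], Z[l]) for m in range(M) for l in range(M) if l != m)
```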
2) Regarding the second accumulation function (cluster anchors)
With the clusters of the m-th view as anchors, a cross-view cluster-anchor contrastive loss function is determined between the predicted cluster label sequence of each sample in the m-th view and the predicted cluster label sequence of each sample in the l-th view; the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views is then determined.
Denote the i-th column of Ẑ^m by ĉ_i^m, i.e., Ĉ^m = {ĉ_1^m, …, ĉ_T^m}, where ĉ_i^m denotes the predicted cluster labels of the i-th cluster over all N samples. Since the predicted cluster labels of the same cluster should also be consistent across different views, expression (6) determines the cross-view cluster-anchor contrastive loss function between the m-th view Ĉ^m and the l-th view Ĉ^l, with the clusters of the m-th view as anchors:

ℓ_c^{(m,l)} = −(1/T) Σ_{i=1}^{T} log( exp(cos(ĉ_i^m, ĉ_i^l)/τ_c) / Σ_{j=1}^{T} exp(cos(ĉ_i^m, ĉ_j^l)/τ_c) )   (6)

where ℓ_c^{(m,l)} denotes the cross-view cluster-anchor contrastive loss function between Ĉ^m and Ĉ^l and τ_c denotes the temperature coefficient.

To align the labels between all views, the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views can be determined by expression (7), which computes the contrastive loss over all views:

L_c = Σ_{m=1}^{M} Σ_{l=1, l≠m}^{M} ℓ_c^{(m,l)}   (7)

where L_c denotes the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
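Under the same assumptions as the previous sketch, expressions (6) and (7) can reuse the sample-anchor function on the transposed label matrices, whose rows then correspond to the cluster vectors ĉ_i ∈ R^N:

```python
def cluster_anchor_loss(z_m, z_l, tau_c: float = 1.0):
    # Columns of Ẑ become the compared vectors: one (N,)-vector per cluster.
    return sample_anchor_loss(z_m.t(), z_l.t(), tau_c)
```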
In step 106, each view should contain common semantics, although the sample data of different views may contain view-specific noise. Cross-view dual-anchor contrastive learning is therefore performed on L_s and L_c, aligning the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster.

By optimizing L_s and L_c, the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster can be aligned across views, constructing a common semantic structure and eliminating view-specific noise in the label space.
After the predicted cluster labels in the different views have been aligned, in step 107 the predicted cluster labels may be average-pooled by expression (8) to obtain the robust label of the n-th sample:

y_{n,i} = (1/M) Σ_{m=1}^{M} ẑ_{n,i}^m   (8)

where y_n = (y_{n,1}, …, y_{n,T}) denotes the robust label of the n-th sample and ẑ_{n,i}^m denotes the predicted probability of the n-th sample for the i-th cluster at the m-th view.
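As a sketch of expression (8), assuming z_views is a list of M aligned (N, T) prediction matrices:

```python
import torch

def robust_labels(z_views):
    """Expression (8): average the aligned per-view prediction probabilities."""
    return torch.stack(z_views, dim=0).mean(dim=0)  # (N, T) robust soft labels
```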
In this embodiment, an end-to-end Bayesian deep network combined with cross-view contrastive learning (MC-DPGMM) is constructed, which can efficiently cluster multi-view data when the number of clusters is unknown. In MC-DPGMM, an encoder-based feature extraction network learns the common features of the multiple views and is combined with cross-view dual-anchor contrastive learning, so that the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster can be aligned across views, overcoming the heterogeneity of multi-view data.
In an embodiment, the method further comprises: inputting the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data; constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data; and adjusting the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
Specifically, for the m-th view, a decoder D^m(·; φ^m) is constructed; the decoder comprises a multi-layer perceptron. Expression (9) reconstructs the encoded feature h_n^m:

x̂_n^m = D^m(h_n^m; φ^m)   (9)

where φ^m denotes the trainable parameters of the m-th decoder and x̂_n^m denotes the reconstructed feature of the n-th sample at the m-th view.

To preserve the semantics of the input and avoid model collapse, the parameters of the encoders corresponding to the respective views can be adjusted with the unsupervised reconstruction loss function of expression (10):

L_rec = Σ_{m=1}^{M} Σ_{n=1}^{N} ||x_n^m − x̂_n^m||²   (10)

where L_rec denotes the unsupervised reconstruction loss function, M denotes the number of views, N denotes the number of samples, x_n^m denotes the original feature of the n-th sample at the m-th view, and x̂_n^m denotes the reconstructed feature of the n-th sample at the m-th view.
Based on the learned features H^m, m = 1, …, M, multi-view clustering is performed with an unknown number of clusters, and common semantics are mined in all views to improve the clustering quality.
In this embodiment, the unsupervised reconstruction loss is utilized to avoid model collapse.
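A minimal sketch of a per-view decoder and of the unsupervised reconstruction loss of expressions (9) and (10) follows; the mirror-image MLP and the squared-error form are assumptions consistent with the symmetric autoencoder described above.

```python
import torch
import torch.nn as nn

class ViewDecoder(nn.Module):
    """MLP decoder D^m that maps D-dimensional codes back to the view-m feature space."""
    def __init__(self, d: int, out_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),  # expression (9): x̂_n^m = D^m(h_n^m; φ^m)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h)

def reconstruction_loss(xs, x_hats):
    """Expression (10): squared reconstruction error summed over views and samples."""
    return sum(((x - xh) ** 2).sum() for x, xh in zip(xs, x_hats))
```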
The following experiments demonstrate the effect of the multi-view clustering method:
To evaluate the multi-view clustering method provided by this embodiment, the Caltech-2V and Caltech-3V multi-view datasets are adopted: Caltech-2V contains two views, each with 1400 samples in 7 categories; Caltech-3V contains three views, each with 1400 samples in 7 categories.
Table 1 compares the multi-view clustering method proposed in this embodiment (MC-DPGMM) with other prior-art methods on the Caltech-2V and Caltech-3V multi-view datasets, using three evaluation metrics: ACC, NMI, and ARI.
Table 1 comparison results of different multi-view clustering methods
[Table 1 is presented as an image in the original publication; it reports the ACC, NMI and ARI of each compared method on Caltech-2V and Caltech-3V.]
Among the compared methods: RMSL is an interactive multi-layer subspace learning algorithm for multi-view clustering, which constructs a multi-layer interactive subspace representation linked to a latent representation layer H to hierarchically recover the latent cluster structure of the data. MVC-LFA is a multi-view clustering algorithm based on late-fusion alignment maximization. COMIC improves clustering performance by making maximal use of a set of pre-computed complementary views. IMVTSC-MVI combines feature-space-based missing-view inference and manifold-space-based similarity-graph learning in a unified framework, introducing a low-rank tensor constraint to capture the high-order correlations of multiple views. CDIMC-net captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into one framework. SiMVC extends the baseline model with a contrastive learning component and a selective alignment process, retaining the model's ability to prioritize views. CoMVC utilizes the clustering results to improve the quality of feature learning. DBSCAN is a representative density-based clustering algorithm; unlike partitioning and hierarchical clustering methods, it defines clusters as maximal sets of density-connected points, can partition sufficiently dense regions into clusters, and can find clusters of arbitrary shape in noisy spatial databases. DeepDPM uses a split/merge network to accommodate changes in the number of clusters in a dynamic structure and proposes a corresponding loss function.
As can be seen from Table 1, the multi-view clustering method (MC-DPGMM) proposed by this embodiment significantly outperforms the other methods on both datasets.
The multi-view clustering device provided by the invention is described below, and the multi-view clustering device described below and the multi-view clustering method described above can be correspondingly referred to each other.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a multi-view clustering device according to an embodiment of the invention. As shown in FIG. 4, the device may include:
the extraction module 10, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
the encoding module 20, configured to input the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
the construction module 30, configured to construct a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and the clustering module 40, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
Optionally, the construction module 30 includes:
a first determining unit, configured to determine the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
a second determining unit, configured to determine the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and a first construction unit, configured to construct the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
Optionally, the clustering module 40 includes:
a first input unit, configured to take the encoded features of the multi-view sample data as the features to be classified and input them into the deep variational inference network;
a conversion unit, configured to convert the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and a clustering unit, configured to solve the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
Optionally, the apparatus further comprises an optimization module, which includes:
a third determining unit, configured to determine a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
an alignment unit, configured to perform cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and an optimization unit, configured to average-pool, for the target sample data in each cluster, the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
Optionally, the third determining unit is specifically configured to:
determine, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determine the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
Optionally, the third determining unit is further configured to:
determine, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determine the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
Optionally, the apparatus further comprises an adjustment module, which includes:
a second input unit, configured to input the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain the reconstructed features of the multi-view sample data;
a second construction unit, configured to construct an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and an adjusting unit, configured to adjust the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
FIG. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 5, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the multi-view clustering method, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the multi-view clustering method provided above, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-view clustering method provided above, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-view clustering method, comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for a Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between a reparameterized variational probability distribution and a prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and a predicted cluster label of each sample in each view.
2. The multi-view clustering method according to claim 1, wherein constructing the deep variational inference network for the Dirichlet process Gaussian mixture model comprises:
determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given prior hyperparameters of the Dirichlet process Gaussian mixture model;
and constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
3. The multi-view clustering method according to claim 2, wherein inputting the encoded features of the multi-view sample data into the deep variational inference network and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view comprises:
taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
4. The multi-view clustering method according to any one of claims 1 to 3, further comprising:
determining a first accumulation function of cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain a robust label of the target sample data.
5. The multi-view clustering method according to claim 4, wherein determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
6. The multi-view clustering method according to claim 4, wherein determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
7. The multi-view clustering method according to claim 4, further comprising:
inputting the robust labels of the multi-view sample data into decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data;
constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and adjusting parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
8. A multi-view clustering device, comprising:
an extraction module, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
an encoding module, configured to input the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
a construction module, configured to construct a deep variational inference network for a Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between a reparameterized variational probability distribution and a prior probability distribution of the Dirichlet process Gaussian mixture model;
and a clustering module, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and a predicted cluster label of each sample in each view.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the multi-view clustering method of any one of claims 1 to 7 when the program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the multi-view clustering method according to any one of claims 1 to 7.
CN202310138483.0A 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium Pending CN116204804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310138483.0A CN116204804A (en) 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116204804A true CN116204804A (en) 2023-06-02

Family

ID=86516875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310138483.0A Pending CN116204804A (en) 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116204804A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542057A (en) * 2024-01-09 2024-02-09 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views
CN117542057B (en) * 2024-01-09 2024-04-05 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination