CN116204804A - Multi-view clustering method and device, electronic equipment and storage medium


Info

Publication number
CN116204804A
CN116204804A
Authority
CN
China
Prior art keywords
view
sample data
cluster
clustering
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310138483.0A
Other languages
Chinese (zh)
Inventor
钱胜胜
徐常胜
薛迪展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202310138483.0A priority Critical patent/CN116204804A/en
Publication of CN116204804A publication Critical patent/CN116204804A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention provides a multi-view clustering method, a device, electronic equipment and a storage medium, and relates to the technical field of multimedia. The method comprises the following steps: acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data; inputting the original features into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data; constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model; and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view. The invention can perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.

Description

Multi-view clustering method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a multi-view clustering method, a device, an electronic device, and a storage medium.
Background
With the rapid development of the internet and multimedia, multi-view data has grown explosively. Multi-view clustering, a fundamental task of multi-view learning, aims to mine the complementary information among different views so as to improve clustering performance.
Existing deep multi-view clustering schemes rely on the number of clusters as prior information to improve the clustering effect. In more realistic scenarios, however, the number of clusters in the multi-view data to be clustered is often unknown, so performing multi-view clustering when the number of clusters is unknown, and automatically discovering the number of clusters during clustering, has become a major challenge.
Disclosure of Invention
The invention provides a multi-view clustering method, a device, electronic equipment and a storage medium, which overcome the defect in the prior art that multi-view clustering cannot be performed when the number of clusters is unknown, and which enable multi-view clustering with an unknown number of clusters while automatically discovering the number of clusters during clustering.
The invention provides a multi-view clustering method, which comprises the following steps:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
According to the multi-view clustering method provided by the invention, constructing the deep variational inference network for the Dirichlet process Gaussian mixture model comprises:
determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
According to the multi-view clustering method provided by the invention, inputting the encoded features of the multi-view sample data into the deep variational inference network and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view comprises:
taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
According to the multi-view clustering method provided by the invention, the method further comprises:
determining a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
According to the multi-view clustering method provided by the invention, determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
According to the multi-view clustering method provided by the invention, determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
According to the multi-view clustering method provided by the invention, the method further comprises:
inputting the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data;
constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and adjusting the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
The invention also provides a multi-view clustering device, which comprises:
an extraction module, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
an encoding module, configured to input the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
a construction module, configured to construct a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and a clustering module, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the multi-view clustering method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the multi-view clustering method as described in any one of the above.
According to the multi-view clustering method, the device, the electronic equipment and the storage medium provided by the invention, multi-view sample data are first acquired and input into a feature extraction network to obtain the original features of the multi-view sample data. The original features of the multi-view sample data are then input into the encoders corresponding to the respective views to obtain the encoded features of the multi-view sample data, thereby mapping the original features of the multi-view sample data into a common semantic representation space. Next, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model. Finally, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters is performed by minimizing this KL divergence, yielding the number of clusters and the predicted cluster label of each sample in each view. The invention can therefore perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a multi-view clustering method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model framework of a multi-view clustering method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of cross-view dual anchor contrast learning provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-view clustering device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The multi-view clustering method of the present invention is described below with reference to fig. 1 to 3.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a multi-view clustering method according to an embodiment of the invention. As shown in FIG. 1, the method may include the following steps:
step 101, acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
step 102, inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
step 103, constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and step 104, inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
In step 101, as shown in FIG. 2, the multiple views may include text, images, audio, and the like. The multi-view sample data are the sample data of a plurality of samples under different views; for example, if the samples are birds, the sample data of a bird under different views include text describing the bird, photographs of the bird, recorded birdsong audio, and so on.
The feature extraction network may be a pre-trained deep learning network for feature extraction. Illustratively, the sample data of the multiple views are mapped to the feature spaces of the different views by the feature extraction functions of the pre-trained deep learning network, yielding the original features of the multi-view sample data {X^m}, m = 1, …, M, where M denotes the number of views. The original features of the sample data of the m-th view are X^m = {x_1^m, …, x_N^m}, where N denotes the number of samples.
In step 102, since the original features of the multi-view sample data contain redundancy and random noise, an encoder may be employed to project the original features of the multi-view sample data into a common space.
Illustratively, for the m-th view, an encoder E^m(·; θ^m) is constructed; the encoder comprises a multi-layer perceptron. The original features X^m of the sample data of the m-th view are input into the encoder E^m, which projects X^m into a D-dimensional feature space to obtain the encoded features of the sample data of the m-th view. Optionally, the encoder is an autoencoder.

Specifically, expression (1) projects the original feature x_n^m of the n-th sample at the m-th view into the D-dimensional feature space to obtain the encoded feature of the n-th sample at the m-th view:

h_n^m = E^m(x_n^m; θ^m)   (1)

where h_n^m ∈ R^D denotes the encoded feature of the n-th sample at the m-th view and θ^m denotes the trainable parameters of the encoder of the m-th view.
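As an illustration of the encoder of expression (1), the following is a minimal PyTorch sketch; the hidden width, depth and ReLU activations are assumptions made for illustration, since the embodiment only specifies that each encoder is a multi-layer perceptron projecting into a D-dimensional space.

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """MLP encoder E^m that projects view-m features into a shared D-dimensional space."""
    def __init__(self, in_dim: int, d: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),  # expression (1): h_n^m = E^m(x_n^m; θ^m)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One encoder per view; the input dimension may differ from view to view.
# encoders = nn.ModuleList([ViewEncoder(dm, D) for dm in view_dims])
```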
In step 103, in order to perform unsupervised clustering on the encoded features across views, and to overcome the difficulty that the prior distribution of the Dirichlet process Gaussian mixture model cannot be calculated directly, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed. The principle of the deep variational inference network is as follows: unsupervised clustering is performed by minimizing the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model.
Optionally, step 103 comprises the following sub-steps:
step 1031, determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under the first distribution;
step 1032, determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under the second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and step 1033, constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
In step 1031, q_γ(w) denotes the distribution probability of w under the distribution q, i.e., the reparameterized variational probability distribution, where γ denotes the variational parameters, the distribution q denotes the first distribution, and w denotes the latent parameters, w = {v, η*, z}: v denotes the class prior parameters, η* denotes the Gaussian mixture parameters, and z denotes the cluster labels.
In step 1032, p(w, h|θ) denotes the joint distribution probability of w and h under the distribution p given θ, i.e., the prior probability distribution of the Dirichlet process Gaussian mixture model, where h = {h_1, …, h_N} denotes the features to be classified of the N samples and θ denotes the prior hyperparameters of the Dirichlet process Gaussian mixture model.
In step 1033, the KL divergence between q_γ(w) and p(w|h, θ) is calculated by expression (2):

KL(q_γ(w) || p(w|h, θ)) = E_q[log q_γ(w)] − E_q[log p(w, h|θ)] + log p(h|θ)   (2)

where KL(·||·) denotes the KL divergence, E_q denotes the expectation under the q distribution (the subscript q_γ is abbreviated to q), p(h|θ) denotes the distribution probability of h under the distribution p given θ, and the base of the log function is e or 2.

Based on a first minimization function of KL(q_γ(w) || p(w|h, θ)), the deep variational inference network of the Dirichlet process Gaussian mixture model can be constructed.
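Since log p(h|θ) does not depend on the variational parameters γ, it can be dropped during optimization. A minimal sketch of a Monte Carlo estimate of the remaining two expectations of expression (2) follows; it assumes q_gamma is a distribution object exposing a reparameterized rsample() and a log_prob(), and that log_joint(w) evaluates log p(w, h|θ), both of which are hypothetical stand-ins for the concrete distributions of the embodiment.

```python
import torch

def kl_surrogate(q_gamma, log_joint, n_samples: int = 8) -> torch.Tensor:
    """Estimate E_q[log q_γ(w)] - E_q[log p(w, h|θ)], i.e., expression (2)
    up to the constant log p(h|θ), using reparameterized samples."""
    total = torch.zeros(())
    for _ in range(n_samples):
        w = q_gamma.rsample()  # reparameterization keeps gradients w.r.t. γ
        total = total + q_gamma.log_prob(w) - log_joint(w)
    return total / n_samples
```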
In step 104, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters can be performed by minimizing the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model, yielding the number of clusters and the predicted cluster label of each sample in each view.
Optionally, step 104 comprises the following sub-steps:
step 1041, taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
step 1042, converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and step 1043, solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
In step 1041, the encoded features of the multi-view sample data are substituted into the deep variational inference network as the features to be classified.
In step 1042, since h is a fixed feature, log p(h|θ) is a constant, so minimizing the KL divergence is equivalent to minimizing the variational inference loss. On this basis, the first minimization function is converted into a second minimization function of the variational inference loss between the expectations corresponding to q_γ(w) and p(w, h|θ).
The variational inference loss function is determined by expression (3):

L_var = E_q[log q_γ(w)] − E_q[log p(w, h|θ)]   (3)

where L_var denotes the variational inference loss and θ = {α, μ_0, c, a, b}; the joint prior p(w, h|θ) factorizes into p(v|α), p(η*|μ_0, c, a, b), p(z_n|v) and p(h_n|z_n, η*). Here p(v|α) denotes the Beta distribution Beta(1, α), with α the Beta distribution parameter; p(η*|μ_0, c, a, b) denotes the Normal-Gamma distribution NormGamma(μ_0, c, a, b), with μ_0 the Gaussian prior center parameter, c the Gaussian variance scaling, and a and b the Gamma distribution parameters; the Gaussian mixture parameters are η* = {μ_1, …, μ_T, Σ_1, …, Σ_T}, where T denotes the number of clusters, μ_t the Gaussian center parameters and Σ_t the Gaussian covariance parameters; p(z_n|v) denotes the multinomial distribution Mult(π(v)) with stick-breaking weights π_i(v) = v_i ∏_{j<i}(1 − v_j), where π_i denotes the i-th parameter of the multinomial distribution and v_i denotes the class prior parameter of the i-th cluster; and p(h_n|z_n, η*) denotes the Gaussian distribution N(h_n; μ_{z_n}, Σ_{z_n}), where μ_{z_n} and Σ_{z_n} denote the Gaussian center and covariance parameters of the z_n-th cluster and z_n denotes the cluster label of the n-th sample.

The minimization function of this variational inference loss function is the second minimization function.
In step 1043, the Dirichlet process and the T multivariate Gaussian distributions are parameterized using a truncation technique, where the prior distribution parameters are v = {v_1, …, v_{T−1}} and the Gaussian mixture parameters are μ = {μ_1, …, μ_T} and Σ = {Σ_1, …, Σ_T}. The learnable parameters of the deep reparameterized variational inference network of the Dirichlet process Gaussian mixture model are therefore γ = {v_1, …, v_{T−1}, μ_1, …, μ_T, Σ_1, …, Σ_T}.

Solving the second minimization function yields the number T of clusters (i.e., of multivariate Gaussian components) and the predicted cluster label ẑ_n^m of each sample in each view.
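A sketch of the truncated stick-breaking construction π_i(v) = v_i ∏_{j<i}(1 − v_j) and of the resulting soft cluster assignments is given below; the sigmoid parameterization of v and the diagonal-covariance simplification of Σ are assumptions made for illustration.

```python
import torch

def stick_breaking(v_logits: torch.Tensor) -> torch.Tensor:
    """Truncated stick-breaking weights: v has T-1 entries, π has T entries."""
    v = torch.sigmoid(v_logits)                  # keep each v_i in (0, 1)
    rest = torch.cumprod(1.0 - v, dim=0)         # cumulative products of (1 - v_j)
    head = v * torch.cat([torch.ones_like(v[:1]), rest[:-1]])
    return torch.cat([head, rest[-1:]])          # last stick takes the remainder

def predict_labels(h, mu, log_var, v_logits):
    """Soft labels q(z_n = i | h_n) under a diagonal-covariance Gaussian mixture."""
    pi = stick_breaking(v_logits)                              # (T,)
    diff = h.unsqueeze(1) - mu.unsqueeze(0)                    # (N, T, D)
    log_gauss = -0.5 * ((diff ** 2) / log_var.exp() + log_var).sum(-1)
    log_post = log_gauss + torch.log(pi + 1e-10)
    return torch.softmax(log_post, dim=1)                      # (N, T)
```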
According to the multi-view clustering method provided by this embodiment, multi-view sample data are first acquired and input into a feature extraction network to obtain the original features of the multi-view sample data. The original features of the multi-view sample data are then input into the encoders corresponding to the respective views to obtain the encoded features of the multi-view sample data, thereby mapping the original features into a common semantic representation space. Next, a deep variational inference network for the Dirichlet process Gaussian mixture model is constructed, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model. Finally, the encoded features of the multi-view sample data are input into the deep variational inference network, and unsupervised clustering with an unknown number of clusters is performed by minimizing this KL divergence, yielding the number of clusters and the predicted cluster label of each sample in each view. This embodiment can therefore perform multi-view clustering when the number of clusters is unknown, and automatically discovers the number of clusters during clustering.
In an embodiment, the method further comprises:
step 105, determining a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
step 106, performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and step 107, for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
In step 105, as shown in FIG. 3, the predicted cluster labels of H^m are defined as Ẑ^m = {ẑ_1^m, …, ẑ_N^m}, where H^m denotes the encoded features of the sample data of the m-th view, ẑ_n^m denotes the predicted cluster label of the n-th sample at the m-th view, and ẑ_{n,i}^m denotes the predicted probability of the n-th sample at the m-th view for the i-th cluster.
1) Regarding the first accumulation function (sample anchors)
With the samples of the m-th view as anchors, a cross-view sample-anchor contrastive loss function is determined between the predicted cluster label sequence of each sample in the m-th view and the predicted cluster label sequence of each sample in the l-th view; the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views is then determined.
Illustratively, since the predicted cluster labels of the same sample data should be consistent across different views, expression (4) determines the cross-view sample-anchor contrastive loss function between the m-th view Ẑ^m and the l-th view Ẑ^l, with the samples of the m-th view as anchors:

ℓ_s^{(m,l)} = −(1/N) Σ_{n=1}^{N} log( exp(cos(ẑ_n^m, ẑ_n^l)/τ_s) / Σ_{j=1}^{N} exp(cos(ẑ_n^m, ẑ_j^l)/τ_s) )   (4)

where ℓ_s^{(m,l)} denotes the cross-view sample-anchor contrastive loss function between Ẑ^m and Ẑ^l, ẑ_n^l denotes the predicted cluster label of the n-th sample at the l-th view, cos(·,·) denotes cosine similarity, and τ_s denotes the temperature coefficient.

To align the labels between all views, the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views can be determined by expression (5), which computes the contrastive loss over all views:

L_s = Σ_{m=1}^{M} Σ_{l=1, l≠m}^{M} ℓ_s^{(m,l)}   (5)

where L_s denotes the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
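A sketch of expressions (4) and (5) follows, assuming z_m and z_l are the (N, T) predicted-label matrices Ẑ^m and Ẑ^l; normalizing the rows turns the dot product into the cosine similarity of expression (4), and the default value of τ_s is an assumption.

```python
import torch
import torch.nn.functional as F

def sample_anchor_loss(z_m: torch.Tensor, z_l: torch.Tensor, tau_s: float = 0.5):
    """Expression (4): for each sample anchor in view m, the positive is the same
    sample in view l; all other samples in view l serve as negatives."""
    z_m = F.normalize(z_m, dim=1)
    z_l = F.normalize(z_l, dim=1)
    logits = z_m @ z_l.t() / tau_s                        # pairwise cosine / τ_s, (N, N)
    targets = torch.arange(z_m.size(0), device=z_m.device)
    return F.cross_entropy(logits, targets)               # -1/N Σ_n log softmax at (n, n)

# Expression (5): accumulate over all ordered view pairs m ≠ l.
# L_s = sum(sample_anchor_loss(Z[m], Z[l]) for m in range(M) for l in range(M) if l != m)
```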
2) Regarding the second accumulation function (cluster anchors)
With the clusters of the m-th view as anchors, a cross-view cluster-anchor contrastive loss function is determined between the predicted cluster label sequence of each sample in the m-th view and the predicted cluster label sequence of each sample in the l-th view; the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views is then determined.
Denote the i-th column of Ẑ^m by ĉ_i^m, i.e., Ĉ^m = {ĉ_1^m, …, ĉ_T^m}, where ĉ_i^m denotes the predicted cluster labels of the i-th cluster over all N samples. Since the predicted cluster labels of the same cluster should also be consistent across different views, expression (6) determines the cross-view cluster-anchor contrastive loss function between the m-th view Ĉ^m and the l-th view Ĉ^l, with the clusters of the m-th view as anchors:

ℓ_c^{(m,l)} = −(1/T) Σ_{i=1}^{T} log( exp(cos(ĉ_i^m, ĉ_i^l)/τ_c) / Σ_{j=1}^{T} exp(cos(ĉ_i^m, ĉ_j^l)/τ_c) )   (6)

where ℓ_c^{(m,l)} denotes the cross-view cluster-anchor contrastive loss function between Ĉ^m and Ĉ^l and τ_c denotes the temperature coefficient.

To align the labels between all views, the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views can be determined by expression (7), which computes the contrastive loss over all views:

L_c = Σ_{m=1}^{M} Σ_{l=1, l≠m}^{M} ℓ_c^{(m,l)}   (7)

where L_c denotes the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
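Under the same assumptions as the previous sketch, expressions (6) and (7) can reuse the sample-anchor function on the transposed label matrices, whose rows then correspond to the cluster vectors ĉ_i ∈ R^N:

```python
def cluster_anchor_loss(z_m, z_l, tau_c: float = 1.0):
    # Columns of Ẑ become the compared vectors: one (N,)-vector per cluster.
    return sample_anchor_loss(z_m.t(), z_l.t(), tau_c)
```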
In step 106, each view should contain common semantics, although the sample data of different views may contain view-specific noise. Cross-view dual-anchor contrastive learning is therefore performed on L_s and L_c, aligning the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster.

By optimizing L_s and L_c, the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster can be aligned across views, constructing a common semantic structure and eliminating view-specific noise in the label space.
After the predicted cluster labels in the different views have been aligned, in step 107 the predicted cluster labels may be average-pooled by expression (8) to obtain the robust label of the n-th sample:

y_{n,i} = (1/M) Σ_{m=1}^{M} ẑ_{n,i}^m   (8)

where y_n = (y_{n,1}, …, y_{n,T}) denotes the robust label of the n-th sample and ẑ_{n,i}^m denotes the predicted probability of the n-th sample for the i-th cluster at the m-th view.
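As a sketch of expression (8), assuming z_views is a list of M aligned (N, T) prediction matrices:

```python
import torch

def robust_labels(z_views):
    """Expression (8): average the aligned per-view prediction probabilities."""
    return torch.stack(z_views, dim=0).mean(dim=0)  # (N, T) robust soft labels
```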
In this embodiment, an end-to-end Bayesian deep network combined with cross-view contrastive learning (MC-DPGMM) is constructed, which can efficiently cluster multi-view data when the number of clusters is unknown. In MC-DPGMM, an encoder-based feature extraction network learns the common features of the multiple views and is combined with cross-view dual-anchor contrastive learning, so that the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster can be aligned across views, overcoming the heterogeneity of multi-view data.
In an embodiment, the method further comprises: inputting the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data; constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data; and adjusting the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
Specifically, for the m-th view, a decoder D^m(·; φ^m) is constructed; the decoder comprises a multi-layer perceptron. Expression (9) reconstructs the encoded feature h_n^m:

x̂_n^m = D^m(h_n^m; φ^m)   (9)

where φ^m denotes the trainable parameters of the m-th decoder and x̂_n^m denotes the reconstructed feature of the n-th sample at the m-th view.

To preserve the semantics of the input and avoid model collapse, the parameters of the encoders corresponding to the respective views can be adjusted with the unsupervised reconstruction loss function of expression (10):

L_rec = Σ_{m=1}^{M} Σ_{n=1}^{N} ||x_n^m − x̂_n^m||²   (10)

where L_rec denotes the unsupervised reconstruction loss function, M denotes the number of views, N denotes the number of samples, x_n^m denotes the original feature of the n-th sample at the m-th view, and x̂_n^m denotes the reconstructed feature of the n-th sample at the m-th view.
Based on the learned features H^m, m = 1, …, M, multi-view clustering is performed with an unknown number of clusters, and common semantics are mined in all views to improve the clustering quality.
In this embodiment, the unsupervised reconstruction loss is utilized to avoid model collapse.
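A minimal sketch of a per-view decoder and of the unsupervised reconstruction loss of expressions (9) and (10) follows; the mirror-image MLP and the squared-error form are assumptions consistent with the symmetric autoencoder described above.

```python
import torch
import torch.nn as nn

class ViewDecoder(nn.Module):
    """MLP decoder D^m that maps D-dimensional codes back to the view-m feature space."""
    def __init__(self, d: int, out_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),  # expression (9): x̂_n^m = D^m(h_n^m; φ^m)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h)

def reconstruction_loss(xs, x_hats):
    """Expression (10): squared reconstruction error summed over views and samples."""
    return sum(((x - xh) ** 2).sum() for x, xh in zip(xs, x_hats))
```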
The following experiments demonstrate the effect of the multi-view clustering method:
To evaluate the multi-view clustering method provided by this embodiment, the Caltech-2V and Caltech-3V multi-view datasets are adopted: Caltech-2V contains two views, each with 1400 samples in 7 categories; Caltech-3V contains three views, each with 1400 samples in 7 categories.
Table 1 compares the multi-view clustering method proposed in this embodiment (MC-DPGMM) with other prior-art methods on the Caltech-2V and Caltech-3V multi-view datasets, using three evaluation metrics: ACC, NMI, and ARI.
Table 1 comparison results of different multi-view clustering methods
[Table 1 is presented as an image in the original publication; it reports the ACC, NMI and ARI of each compared method on Caltech-2V and Caltech-3V.]
Among the compared methods: RMSL is an interactive multi-layer subspace learning algorithm for multi-view clustering, which constructs a multi-layer interactive subspace representation linked to a latent representation layer H to hierarchically recover the latent cluster structure of the data. MVC-LFA is a multi-view clustering algorithm based on late-fusion alignment maximization. COMIC improves clustering performance by making maximal use of a set of pre-computed complementary views. IMVTSC-MVI combines feature-space-based missing-view inference and manifold-space-based similarity-graph learning in a unified framework, introducing a low-rank tensor constraint to capture the high-order correlations of multiple views. CDIMC-net captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into one framework. SiMVC extends the baseline model with a contrastive learning component and a selective alignment process, retaining the model's ability to prioritize views. CoMVC utilizes the clustering results to improve the quality of feature learning. DBSCAN is a representative density-based clustering algorithm; unlike partitioning and hierarchical clustering methods, it defines clusters as maximal sets of density-connected points, can partition sufficiently dense regions into clusters, and can find clusters of arbitrary shape in noisy spatial databases. DeepDPM uses a split/merge network to accommodate changes in the number of clusters in a dynamic structure and proposes a corresponding loss function.
As can be seen from Table 1, the multi-view clustering method (MC-DPGMM) proposed by this embodiment significantly outperforms the other methods on both datasets.
The multi-view clustering device provided by the invention is described below, and the multi-view clustering device described below and the multi-view clustering method described above can be correspondingly referred to each other.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a multi-view clustering device according to an embodiment of the invention. As shown in FIG. 4, the device may include:
the extraction module 10, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
the encoding module 20, configured to input the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
the construction module 30, configured to construct a deep variational inference network for the Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and the clustering module 40, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
Optionally, the construction module 30 includes:
a first determining unit, configured to determine the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
a second determining unit, configured to determine the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given the prior hyperparameters of the Dirichlet process Gaussian mixture model;
and a first construction unit, configured to construct the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
Optionally, the clustering module 40 includes:
a first input unit, configured to take the encoded features of the multi-view sample data as the features to be classified and input them into the deep variational inference network;
a conversion unit, configured to convert the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and a clustering unit, configured to solve the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
Optionally, the apparatus further comprises an optimization module, which includes:
a third determining unit, configured to determine a first accumulation function of the cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
an alignment unit, configured to perform cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and an optimization unit, configured to average-pool, for the target sample data in each cluster, the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain the robust label of the target sample data.
Optionally, the third determining unit is specifically configured to:
determine, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determine the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
Optionally, the third determining unit is further configured to:
determine, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determine the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
Optionally, the apparatus further comprises an adjustment module, which includes:
a second input unit, configured to input the robust labels of the multi-view sample data into the decoders corresponding to the respective views to obtain the reconstructed features of the multi-view sample data;
a second construction unit, configured to construct an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and an adjusting unit, configured to adjust the parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
FIG. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 5, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the multi-view clustering method, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the multi-view clustering method provided above, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-view clustering method provided above, the method comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into the encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for the Dirichlet process Gaussian mixture model, the network being used to minimize the KL divergence between the reparameterized variational probability distribution and the prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-view clustering method, comprising:
acquiring multi-view sample data, and inputting the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
inputting the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
constructing a deep variational inference network for a Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between a reparameterized variational probability distribution and a prior probability distribution of the Dirichlet process Gaussian mixture model;
and inputting the encoded features of the multi-view sample data into the deep variational inference network, and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and a predicted cluster label of each sample in each view.
2. The multi-view clustering method according to claim 1, wherein constructing the deep variational inference network for the Dirichlet process Gaussian mixture model comprises:
determining the reparameterized variational probability distribution based on the variational probability distribution of the reparameterized parameters under a first distribution;
determining the prior probability distribution of the Dirichlet process Gaussian mixture model based on the joint distribution probability of the variational parameters and the features to be classified under a second distribution, given prior hyperparameters of the Dirichlet process Gaussian mixture model;
and constructing the deep variational inference network for the Dirichlet process Gaussian mixture model based on a first minimization function of the KL divergence between the reparameterized variational probability distribution and the prior probability distribution.
3. The multi-view clustering method according to claim 2, wherein inputting the encoded features of the multi-view sample data into the deep variational inference network and performing unsupervised clustering with an unknown number of clusters to obtain the number of clusters and the predicted cluster label of each sample in each view comprises:
taking the encoded features of the multi-view sample data as the features to be classified, and inputting them into the deep variational inference network;
converting the first minimization function into a second minimization function of the variational inference loss between the expectation corresponding to the reparameterized variational probability distribution and the expectation corresponding to the prior probability distribution;
and solving the second minimization function to obtain the number of clusters and the predicted cluster label of each sample in each view.
4. The multi-view clustering method according to any one of claims 1 to 3, further comprising:
determining a first accumulation function of cross-view sample-anchor contrastive loss functions of all views and a second accumulation function of cross-view cluster-anchor contrastive loss functions of all views, based on the predicted cluster labels of the samples in each view;
performing cross-view dual-anchor contrastive learning based on the first accumulation function and the second accumulation function, so as to align the predicted cluster labels of the same sample data and the predicted cluster labels of the same cluster across views;
and for the target sample data in each cluster, average-pooling the prediction probabilities of the aligned predicted cluster labels of the target sample data over all views to obtain a robust label of the target sample data.
5. The multi-view clustering method according to claim 4, wherein determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the samples of a first view as anchors, a cross-view sample-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the first accumulation function of the cross-view sample-anchor contrastive loss functions of all views.
6. The multi-view clustering method according to claim 4, wherein determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views based on the predicted cluster labels of the samples in each view comprises:
determining, with the clusters of a first view as anchors, a cross-view cluster-anchor contrastive loss function between the predicted cluster label sequence of each sample in the first view and the predicted cluster label sequence of each sample in a second view;
and determining the second accumulation function of the cross-view cluster-anchor contrastive loss functions of all views.
7. The multi-view clustering method according to claim 4, further comprising:
inputting the robust labels of the multi-view sample data into decoders corresponding to the respective views to obtain reconstructed features of the multi-view sample data;
constructing an unsupervised reconstruction loss function based on the original features and the reconstructed features of the multi-view sample data;
and adjusting parameters of the encoders corresponding to the respective views based on the unsupervised reconstruction loss function.
8. A multi-view clustering device, comprising:
an extraction module, configured to acquire multi-view sample data and input the multi-view sample data into a feature extraction network to obtain original features of the multi-view sample data;
an encoding module, configured to input the original features of the multi-view sample data into encoders corresponding to the respective views to obtain encoded features of the multi-view sample data;
a construction module, configured to construct a deep variational inference network for a Dirichlet process Gaussian mixture model, wherein the deep variational inference network is used to minimize the KL divergence between a reparameterized variational probability distribution and a prior probability distribution of the Dirichlet process Gaussian mixture model;
and a clustering module, configured to input the encoded features of the multi-view sample data into the deep variational inference network and perform unsupervised clustering with an unknown number of clusters to obtain the number of clusters and a predicted cluster label of each sample in each view.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the multi-view clustering method of any one of claims 1 to 7 when the program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the multi-view clustering method according to any one of claims 1 to 7.
CN202310138483.0A 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium Pending CN116204804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310138483.0A CN116204804A (en) 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116204804A true CN116204804A (en) 2023-06-02

Family

ID=86516875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310138483.0A Pending CN116204804A (en) 2023-02-14 2023-02-14 Multi-view clustering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116204804A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542057A (en) * 2024-01-09 2024-02-09 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views
CN117542057B (en) * 2024-01-09 2024-04-05 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination