CN117009838B - Multi-scale fusion contrast learning multi-view clustering method and system - Google Patents


Info

Publication number
CN117009838B
CN117009838B (application CN202311253624.XA)
Authority
CN
China
Prior art keywords
representation
view
cluster
views
contrast learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311253624.XA
Other languages
Chinese (zh)
Other versions
CN117009838A (en)
Inventor
易玉根
黄龙军
雷刚
王建中
张宁毅
张泽辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202311253624.XA
Publication of CN117009838A
Application granted
Publication of CN117009838B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-scale fusion contrast learning multi-view clustering method and system. The method comprises the following steps: according to a preset multi-view dataset, the specific representations of all views are acquired, and the feature representations of all views are obtained through a preset shared multi-layer perceptron network; a common feature representation of these feature representations is acquired through a preset fusion network, and an anchor view is generated from it to perform feature-representation contrast learning; and cluster representations are acquired through a preset shared projection network to perform cluster-representation contrast learning and obtain the clustering result. By integrating the shared multi-layer perceptron network with the fusion network, the method captures and combines shared representation information, alleviating the influence of view-private information on clustering; the multi-scale contrast learning strategy reduces the number of comparisons, solving the problem of long model training time while ensuring the consistency of visual information across different scales, and greatly improves the clustering performance and efficiency of the model.

Description

Multi-scale fusion contrast learning multi-view clustering method and system
Technical Field
The invention relates to the field of machine learning, in particular to a multi-scale fusion contrast learning multi-view clustering method and system.
Background
With the rapid development of machine learning, multi-view clustering methods have been widely applied in social network analysis, image processing, natural language processing, and other fields. Unlike conventional single-view clustering methods, multi-view clustering exploits information from multiple views to understand and describe the data more comprehensively, thereby achieving more accurate and robust clustering performance.
In the prior art, classical multi-view clustering methods fall into three categories: subspace-based clustering, matrix-factorization-based clustering, and graph-based clustering. Although these methods have advanced multi-view clustering, they still leave problems unsolved, such as the difficulty of extracting high-level semantics from multiple views and the neglect of the adverse effects of view-private information. Multi-view clustering methods based on contrast learning have therefore attracted wide attention from researchers.
However, existing contrast-learning-based multi-view clustering methods still have limitations. They apply contrast learning at only a single level to ensure clustering consistency, without considering the influence of feature-representation consistency on clustering performance; most of them also show weak robustness to noisy data, which degrades clustering quality. Moreover, their contrast strategies are built on pairwise comparisons between views, which produces a large number of inter-view comparisons and greatly increases model training time.
Disclosure of Invention
Based on the above, the invention aims to provide a multi-scale fusion contrast learning multi-view clustering method and system that effectively capture and combine shared representation information by integrating a shared multi-layer perceptron network and a fusion network, thereby alleviating the influence of view-private information on clustering; a multi-scale contrast learning strategy reduces the number of comparisons, solving the problem of long model training time while ensuring the consistency of visual information across different scales, and greatly improving the clustering performance and efficiency of the model.
The multi-scale fusion contrast learning multi-view clustering method provided by the invention comprises the following steps:
according to a preset multi-view dataset $X=\{X^1,X^2,\dots,X^V\}$, where $V$ denotes the number of views of the multi-view data, i.e. the data contains $V$ views, acquiring the specific representations $\{Z^m\}_{m=1}^{V}$ of all views in the preset multi-view dataset, so as to acquire the feature representations $\{H^m\}_{m=1}^{V}$ of all views through a preset shared multi-layer perceptron network $\phi$;

acquiring, through a preset fusion network $F$, the common feature representation of the feature representations $\{H^m\}_{m=1}^{V}$, so as to generate an anchor view $H^A$ from said common feature representation and perform feature-representation contrast learning;

acquiring, through a preset shared projection network $P$, the cluster representations $\{Q^m\}_{m=1}^{V}$ of all views and the cluster representation $Q^A$ of the anchor view, so as to perform cluster-representation contrast learning and acquire the clustering result $c$ according to the cluster representations of all views and the cluster representation of the anchor view.
In summary, the multi-scale fusion contrast learning multi-view clustering method and system effectively capture and combine shared representation information by integrating the shared multi-layer perceptron network and the fusion network, thereby alleviating the influence of view-private information on clustering; the multi-scale contrast learning strategy reduces the number of comparisons, solving the problem of long model training time while ensuring the consistency of visual information across different scales, and greatly improving the clustering performance and efficiency of the model. Specifically, the specific representations $\{Z^m\}$ of all views are acquired from the preset multi-view dataset $X$, and the feature representations $\{H^m\}$ of all views are acquired through the preset shared multi-layer perceptron network $\phi$, which helps eliminate the private information in each view and provides higher-quality feature representations for the subsequent multi-view fusion network. The common feature representation is acquired through the preset fusion network $F$ and the anchor view $H^A$ is generated from it to perform feature-representation contrast learning, further avoiding the influence of private information on the multi-view clustering effect. The cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view are acquired through the preset shared projection network $P$ to perform cluster-representation contrast learning, and the clustering result $c$ is acquired from them. The multi-scale contrast learning strategy reduces the number of comparisons, solves the problem of long model training time, and greatly improves the efficiency of contrast learning; the semantic label of each view is obtained from its feature representation, filtering out view-specific private information, which solves the problem of view-private information affecting clustering and greatly improves the clustering performance and efficiency of the model.
Further, the step of acquiring, according to the preset multi-view dataset $X=\{X^1,\dots,X^V\}$, where $V$ denotes the number of views, the specific representations $\{Z^m\}_{m=1}^{V}$ of all views so as to acquire the feature representations $\{H^m\}_{m=1}^{V}$ of all views through the preset shared multi-layer perceptron network $\phi$ specifically comprises:

inputting the preset multi-view dataset $X$; the preset multi-view dataset contains $V$ views, and each view comprises $c$ clusters;

an encoder $E^m$ corresponds one-to-one with the $V$ views, so that the view-specific encoder $E^m$, according to the formula

$$Z^m = E^m\!\left(X^m\right),$$

extracts the specific representation $Z^m$ of the $m$-th view, where $X^m$ denotes the $m$-th view;

after the specific representations $\{Z^m\}_{m=1}^{V}$ of all views are acquired, the preset shared multi-layer perceptron network $\phi$, according to the formula

$$H^m = \phi\!\left(Z^m\right),$$

extracts the feature representations of all views.
Further, the step of acquiring the specific representations of all views so as to acquire the feature representations of all views through the preset shared multi-layer perceptron network further comprises:

a decoder $D^m$ corresponds one-to-one with the encoder $E^m$, so that the view-specific decoder $D^m$, according to the formula

$$L_{rec}=\sum_{m=1}^{V}\sum_{i=1}^{N}\left\|x_i^m-D^m\!\left(E^m\!\left(x_i^m\right)\right)\right\|_2^2,$$

computes the reconstruction loss $L_{rec}$, where $i$ denotes the $i$-th sample, $\left\|x_i^m-D^m(E^m(x_i^m))\right\|_2^2$ denotes the reconstruction error of the $m$-th view's data, $E^m$ and $D^m$ respectively denote the encoder and decoder of the $m$-th view, $x_i^m$ denotes the $i$-th sample of the $m$-th view's data, $N$ denotes the number of samples contained in each view's data, and $V$ denotes the number of views of the multi-view data, i.e. the data contains $V$ views and each sample appears in at least one view.
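A minimal PyTorch sketch of this reconstruction objective follows; the tensor shapes, function name, and per-view loop are illustrative assumptions, not code from the patent:

```python
import torch

def reconstruction_loss(xs, encoders, decoders):
    # xs: list of V tensors, each of shape (N, d_m) for the m-th view.
    # encoders / decoders: matching lists of view-specific modules E^m and D^m.
    loss = 0.0
    for x, enc, dec in zip(xs, encoders, decoders):
        # || x_i^m - D^m(E^m(x_i^m)) ||_2^2, summed over samples and views
        loss = loss + ((x - dec(enc(x))) ** 2).sum()
    return loss
```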
Further, the step of acquiring, through the preset fusion network $F$, the common feature representation of the feature representations so as to generate the anchor view $H^A$ from said common feature representation and perform feature-representation contrast learning comprises:

after the feature representations $\{H^m\}_{m=1}^{V}$ of all views are acquired, the preset fusion network $F$, according to the formula

$$H^A=F\!\left(H^1,H^2,\dots,H^V\right),$$

acquires the common feature representation and generates the anchor view $H^A$ from the common feature representation.
Further, the step of acquiring, through the preset fusion network $F$, the common feature representation so as to generate the anchor view $H^A$ and perform feature-representation contrast learning further comprises:

constructing positive samples and negative samples within and between the views of all the feature representations $\{H^m\}$ and the anchor view $H^A$, taking samples with the same class label among the positive and negative samples as positive pairs and samples without the same class label as negative pairs;

performing feature-representation contrast learning according to the positive pairs and negative pairs, and, according to the formula

$$L_{fea}=\sum_{m=1}^{V}L^{(m,A)},$$

computing the contrast learning loss $L_{fea}$ of the feature representations, where $L^{(m,A)}$ denotes the overall loss of feature-representation contrast learning between the $m$-th view and the anchor view.
Further, the step of acquiring, through the preset shared projection network $P$, the cluster representations $\{Q^m\}_{m=1}^{V}$ of all views and the cluster representation $Q^A$ of the anchor view so as to perform cluster-representation contrast learning and acquire the clustering result $c$ comprises:

after the feature representations $\{H^m\}_{m=1}^{V}$ of all views are acquired, the preset shared projection network $P$, according to the formula

$$Q^m=\mathrm{Softmax}\!\left(P\!\left(H^m\right)\right),$$

acquires the cluster representations of all views;

the preset shared projection network $P$ also, according to the formula

$$Q^A=\mathrm{Softmax}\!\left(P\!\left(H^A\right)\right),$$

acquires the cluster representation $Q^A$ of the anchor view;

cluster-representation contrast learning is performed according to the cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view, so as to acquire the clustering result $c$ including the cluster representation $Q^A$ of the anchor view.
Further, the step of performing cluster-representation contrast learning according to the cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view so as to acquire the clustering result $c$ comprises:

acquiring the cluster labels $q_j^m$ of the cluster representations $\{Q^m\}$ of all views and the cluster labels $q_j^A$ of the cluster representation $Q^A$ of the anchor view;

constructing $(V-1)$ positive pairs and $V(c-1)$ negative pairs between the cluster labels $q_j^m$ and the cluster labels $q_j^A$ to perform cluster-representation contrast learning;

according to the formula

$$L_{clu}=\frac{1}{2c}\sum_{m=1}^{V}\sum_{j=1}^{c}\left[\ell\!\left(q_j^m,q_j^A\right)+\ell\!\left(q_j^A,q_j^m\right)\right]+\sum_{m=1}^{V}\sum_{j=1}^{c}P\!\left(q_j^m\right)\log P\!\left(q_j^m\right),$$

computing the cluster-representation contrast learning loss $L_{clu}$, where $\ell(q_j^m,q_j^A)$ denotes the contrast learning loss between the cluster representations $q_j^m$ and $q_j^A$, and the second term as a whole acts as a regularization term that prevents all samples from being assigned to a single cluster.
A multi-scale fusion contrast learning multi-view clustering system according to an embodiment of the invention comprises:

a shared multi-layer perceptron network module, configured to acquire, according to the preset multi-view dataset $X$, the specific representations $\{Z^m\}$ of all views in the preset multi-view dataset, so as to acquire the feature representations $\{H^m\}$ of all views through the preset shared multi-layer perceptron network $\phi$;

a fusion network module, configured to acquire, through the preset fusion network $F$, the common feature representation of the feature representations, so as to generate the anchor view $H^A$ from said common feature representation and perform feature-representation contrast learning;

a shared projection network module, configured to acquire, through the preset shared projection network $P$, the cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view, so as to perform cluster-representation contrast learning and acquire the clustering result $c$ according to the cluster representations of all views and the cluster representation of the anchor view.
In another aspect, the invention provides a storage medium storing one or more programs which, when executed, implement the multi-scale fusion contrast learning multi-view clustering method described above.
Another aspect of the invention also provides a computer device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is used for realizing the multi-scale fusion contrast learning multi-view clustering method when executing the computer program stored on the memory.
Drawings
FIG. 1 is a flowchart of a multi-scale fusion contrast learning multi-view clustering method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a multi-scale fusion contrast learning multi-view clustering method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a multi-scale fusion contrast learning multi-view clustering system according to a third embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a flowchart of a multi-scale fusion contrast learning multi-view clustering method according to a first embodiment of the present invention is shown, and the multi-scale fusion contrast learning multi-view clustering method includes steps S01 to S03, wherein:
step S01: acquiring specific representations of all views in the preset multi-view data set according to the preset multi-view data set so as to acquire characteristic representations of all views through a preset shared multi-layer sensing network;
needs to be as followsIllustratively, the preset multiview dataset of the present method comprisesVView samples of each view angle, the view samples of each view angle comprisingcThe method allows using different network architectures as encoders to project original features into a specific feature space, including multi-layer perceptron (MLP), graph Convolution (GCN) and Convolutional Neural Network (CNN), in order to reduce the complexity of the model and improve training efficiency, in this embodiment, constructing a preset shared multi-layer perceptron network uses a four-layer multi-layer perceptron (MLP) as an encoder, and definesTo represent the firstmReconstruction feature matrix of individual views, wherein +.>Is the input dimension, sharing the MLP network helps to eliminate private information in each view, provides a higher quality representation of features for subsequent multi-view fusion networks, for the thmThe shared MLP network is represented as:
wherein,and->Is the dimension of the feature representation.
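A minimal PyTorch sketch of step S01 is given below; the layer widths, class names, and example input dimensions are illustrative assumptions, while the structure (one encoder per view feeding a single shared four-layer MLP) follows the description above:

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    # View-specific encoder E^m projecting raw features into a specific space.
    def __init__(self, d_in, d_z=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 1024), nn.ReLU(),
                                 nn.Linear(1024, d_z))

    def forward(self, x):            # x: (N, d_in)
        return self.net(x)           # Z^m: (N, d_z)

class SharedMLP(nn.Module):
    # Shared four-layer MLP phi applied to every view's specific representation.
    def __init__(self, d_z=512, d_h=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_z, 512), nn.ReLU(),
                                 nn.Linear(512, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, d_h))

    def forward(self, z):            # Z^m: (N, d_z)
        return self.net(z)           # H^m: (N, d_h)

# One encoder per view, a single shared MLP for all views (V = 3 here).
dims = [1984, 512, 928]              # hypothetical per-view input dimensions
encoders = nn.ModuleList([ViewEncoder(d) for d in dims])
shared = SharedMLP()
# hs = [shared(enc(x)) for enc, x in zip(encoders, xs)]
```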
Step S02: acquiring the common feature representation of the feature representations through a preset fusion network, and generating an anchor view from the common feature representation, so as to perform feature-representation contrast learning;

it should be noted that, to further avoid the influence of private information on the multi-view clustering effect, this embodiment sets up a fusion network based on information sharing to acquire the common feature representation of all views. The preset fusion network is built on the information-sharing approach and is composed of fully connected MLPs with ReLU activation functions, and the fused features are regularized to accelerate the convergence of the network.
Step S03: acquiring the cluster representations of all views and the cluster representation of the anchor view through a preset shared projection network so as to perform cluster-representation contrast learning, and acquiring the clustering result according to the cluster representations of all views and the cluster representation of the anchor view;

it should be noted that the goal of the projection is to obtain semantic labels from the feature representation of each view. In this embodiment, a shared single-layer linear MLP is used to build the preset shared projection network and compute the cluster representation of each view, which helps filter out view-specific private information. For the feature representations $\{H^m\}$, the shared projection $P$ can be defined as:

$$Q^m=\mathrm{Softmax}\!\left(P\!\left(H^m\right)\right)\in\mathbb{R}^{N\times c},$$

where $c$ denotes the number of common categories in the multiple views. The classification probability of each sample is then derived using the Softmax function; for example, $q_{ij}^m$ refers to the probability that the $i$-th sample of the $m$-th view belongs to cluster $j$. The semantic label is thus identified by the element with the highest probability in the cluster representation.
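A sketch of this shared projection under the notation above; the module name and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SharedProjection(nn.Module):
    # Shared single-layer linear MLP mapping features to c cluster probabilities.
    def __init__(self, d_h=128, c=10):
        super().__init__()
        self.proj = nn.Linear(d_h, c)

    def forward(self, h):                          # h: (N, d_h)
        return torch.softmax(self.proj(h), dim=1)  # Q: (N, c), rows sum to 1

# Semantic label of sample i = highest-probability cluster in row i, e.g.:
# labels = shared_projection(h_anchor).argmax(dim=1)
```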
In summary, the multi-scale fusion contrast learning multi-view clustering method and system effectively capture and combine shared representation information by integrating the shared multi-layer perceptron network and the fusion network, thereby alleviating the influence of view-private information on clustering; the multi-scale contrast learning strategy reduces the number of comparisons, solving the problem of long model training time while ensuring the consistency of visual information across different scales, and greatly improving the clustering performance and efficiency of the model. Specifically, the specific representations $\{Z^m\}$ of all views are acquired from the preset multi-view dataset $X$, and the feature representations $\{H^m\}$ of all views are acquired through the preset shared multi-layer perceptron network $\phi$, which helps eliminate the private information in each view and provides higher-quality feature representations for the subsequent multi-view fusion network. The common feature representation is acquired through the preset fusion network $F$ and the anchor view $H^A$ is generated from it to perform feature-representation contrast learning, further avoiding the influence of private information on the multi-view clustering effect. The cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view are acquired through the preset shared projection network $P$ to perform cluster-representation contrast learning, and the clustering result $c$ is acquired from them. The multi-scale contrast learning strategy reduces the number of comparisons, solves the problem of long model training time, and greatly improves the efficiency of contrast learning; the semantic label of each view is obtained from its feature representation, filtering out view-specific private information, which solves the problem of view-private information affecting clustering and greatly improves the clustering performance and efficiency of the model.
Referring to fig. 2, a flowchart of a multi-scale fusion contrast learning multi-view clustering method according to a second embodiment of the present invention is shown, and the multi-scale fusion contrast learning multi-view clustering method includes steps S11 to S20, wherein:
step S11: inputting a preset multi-view dataset to cause the encoder to extract a specific representation;
step S12: extracting the characteristic representations of all views by a preset shared multi-layer perception network according to a formula;
step S13: the decoder calculates and acquires reconstruction loss according to a formula;
step S14: acquiring public feature representation by a preset fusion network according to a formula, and generating the anchor view according to the public feature representation;
it should be noted that the EMFCL of the present invention generates anchor views using a converged network, thereby significantly reducing the number of comparisons between views, e.g., given inclusionVIndividual view datasetIn the case of existing methods such as MFLVC and CONAN, which build contrast learning strategies on pairwise comparisons between views, such strategies require going between viewsV(VThe 1/2 comparison significantly increases the complexity of the model, however by comparing each view with the anchor view, the EMFCL method of the present invention only needs to be performed between viewsVIn secondary comparison, it is well known that contrast learning typically uses cosine similarity or euclidean distance to measure similarity between view feature representations, so that the temporal complexity of contrast learning between views is O (Nd) dim ) Where N is the number of samples, d dim Is the dimension of the feature representation, and the time complexity of the contrast learning strategy in the EMFCL and the aforementioned contrast learning method is O (VNd dim ) And O (MNd) dim ) WhereinM=V(V-1)/2, but due to O (VND) dim ) To be compared with O (MNd) dim ) The EMFCL method is much lower and therefore has much lower time complexity than other methods, to further demonstrate the superiority of the present invention over learning strategies, the EMFCL of the present invention is converted to a non-fusion variant and a pairwise comparison strategy between views is employed, called EMFCL NoFusion Counting EMFCL and EMFCL again NoFusion Training time over all data sets and comparison with the most advanced MFLVC method of the prior art under the same experimental parameter settings, results are as follows:
the training time of the EMFCL method of the invention is obviously lower than that of the MFLVC method, and the advantages of the EMFCL method of the invention in training time are more obvious along with the increase of the number of samples or views, and the same applies to the EMFCL NoFusion Compared with the EMFCL method, the training time of the EMFCL method is shorter on all data sets, and therefore, the fusion network designed by the invention obviously reduces the pairwise comparison times between views, and greatly accelerates the training speed of the model.
Step S15: constructing positive and negative samples in and among the views of all feature representations and anchor views;
for the firstmIndividual views, any sampleFeature representation both considered as anchor sample and anchor viewThe positive pair is formed and the characteristic representation between the anchor sample and the other samples is naturally considered to be a negative pair.
Step S16: according to the positive sample pair and the negative sample pair, performing feature representation contrast learning and calculating contrast learning loss of the feature representation according to a formula;
in the present embodiment, the following is the caseAnd->The loss between can be defined as:
wherein,and->Respectively represent the firstmThe first of the individual view and the anchor viewiA characteristic representation of the individual samples is provided,τis a temperature parameter, ++>Representing cosine similarity, the same method can be used to calculate +.>Andloss between themMalnutrition of the heart>Thus, the firstmThe overall penalty of feature representation contrast learning between individual views and anchor views can be expressed as:
wherein,Nis the number of samples, and finally results in a feature representation between all views and the anchor view according to the following formula:
and calculating the contrast learning loss.
Step S17: acquiring the cluster representations of all views and the cluster representation of the anchor view through the preset shared projection network according to a formula;
step S18: acquiring cluster labels of cluster representations of all views and cluster labels of cluster representations of anchor views, and constructing positive pairs and negative pairs between the cluster labels so as to perform cluster representation contrast learning;
it can be appreciated that although the consistency of the view representation is guaranteed by the feature representation contrast learning, in a practical scenario, the existence of private information in some views may still lead to erroneous semantic tags, and in order to further reduce the influence of the private information on the clustering result and obtain a more robust multi-view clustering model, the method utilizes a cluster representation contrast learning strategy to constrain the consistency.
It should be noted that in this embodiment the consistency of the cluster representations is constrained using NT-Xent. For the $m$-th view, the cluster representation $Q^m$ has $c$ clusters and is constrained to be consistent with the anchor cluster representation $Q^A$. For any cluster $j$, the identical cluster labels form $(Vc-1)$ label pairs in total; between $q_j^m$ and $q_j^A$, $(V-1)$ positive pairs and $V(c-1)$ negative pairs are constructed respectively.
Step S19: calculating cluster representation contrast learning loss according to a formula;
in this embodiment, the cluster representsAnd->Contrast learning loss between->The definition is as follows:
wherein,cthe number of clusters is represented and,mandArespectively represent the firstmA view of the individual and an anchor view,τis a parameter of the temperature of the liquid,representing cosine similarity. The consistency loss of the cluster representation is defined as:
wherein the method comprises the steps of,/>As a regularization termAvoiding the situation where all samples are allocated to a single cluster, whereby the overall penalty of the method can be calculated to be defined as +.>In this example, the gradient descent method is applied to minimize +.>To update the parameters.
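A sketch of this cluster-level loss under the same assumptions (columns of Q treated as cluster representations, an entropy-style regularizer over cluster frequencies; the exact normalization is an assumption):

```python
import torch
import torch.nn.functional as F

def cluster_contrast_loss(qs, q_a, tau=1.0):
    # qs: list of V tensors (N, c); q_a: (N, c) anchor cluster probabilities.
    loss = 0.0
    c = q_a.size(1)
    qa_cols = F.normalize(q_a.t(), dim=1)          # (c, N): one row per cluster
    targets = torch.arange(c, device=q_a.device)
    for q in qs:
        q_cols = F.normalize(q.t(), dim=1)         # (c, N)
        sim = q_cols @ qa_cols.t() / tau           # (c, c) cluster similarities
        loss = loss + 0.5 * (F.cross_entropy(sim, targets) +
                             F.cross_entropy(sim.t(), targets))
        p = q.mean(dim=0)                          # P(q_j^m): cluster frequencies
        loss = loss + (p * p.clamp_min(1e-8).log()).sum()  # anti-collapse term
    return loss

# Overall objective minimized by gradient descent:
# total = reconstruction_loss(...) + feature_contrast_loss(...) + cluster_contrast_loss(...)
```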
Step S20: acquiring a cluster result of a cluster representation comprising an anchor view;
it should be noted that, comparing the EMFCL method of the present invention with the existing multi-view data aggregation method under three index metrics of clustering precision (ACC), normalized Mutual Information (NMI) and Purity (PUR), the results are as follows:
compared with the traditional multi-view clustering method CDIMC-net, the accuracy is improved greatly, and particularly, the accuracy is improved by 37.7%, 10.7%, 21.6%, 12.3%, 18.5%, 21.2% and 12.7% respectively, the reasons are that the influence caused by view private information can be reduced due to the sharing of an MLP network, a fusion network and shared projection, in order to further verify the robustness of the EMFCL method, the clustering accuracy of the EMFCL model on four Caltech data sets is obtained to be 63.8%, 71.3%, 77.2% and 85.4%, although the increase of views brings certain noise to the data, the clustering result of the EMFCL is better and better, and the fact shows that the EMFCL is insensitive to the change of the multi-view data and shows excellent robustness.
In summary, the multi-scale fusion contrast learning multi-view clustering method and system effectively capture and combine shared representation information by integrating the shared multi-layer perceptron network and the fusion network, thereby alleviating the influence of view-private information on clustering; the multi-scale contrast learning strategy reduces the number of comparisons, solving the problem of long model training time while ensuring the consistency of visual information across different scales, and greatly improving the clustering performance and efficiency of the model. Specifically, the specific representations $\{Z^m\}$ of all views are acquired from the preset multi-view dataset $X$, and the feature representations $\{H^m\}$ of all views are acquired through the preset shared multi-layer perceptron network $\phi$, which helps eliminate the private information in each view and provides higher-quality feature representations for the subsequent multi-view fusion network. The common feature representation is acquired through the preset fusion network $F$ and the anchor view $H^A$ is generated from it to perform feature-representation contrast learning, further avoiding the influence of private information on the multi-view clustering effect. The cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view are acquired through the preset shared projection network $P$ to perform cluster-representation contrast learning, and the clustering result $c$ is acquired from them. The multi-scale contrast learning strategy reduces the number of comparisons, solves the problem of long model training time, and greatly improves the efficiency of contrast learning; the semantic label of each view is obtained from its feature representation, filtering out view-specific private information, which solves the problem of view-private information affecting clustering and greatly improves the clustering performance and efficiency of the model.
Referring to fig. 3, a schematic structural diagram of a multi-scale fusion contrast learning multi-view clustering system according to a third embodiment of the present invention is shown, where the system includes:
a shared multi-layer aware network module 10 for providing a multi-view dataset according to a predetermined set of multi-view datasetsAcquiring the preset multiview dataset +.>Specific representation of all views +.>To share the multi-layer aware network by presetting +.>Acquiring the characteristic representation of all views +.>
A converged network module 20 for presetting a converged networkAcquiring the characteristic representation->To generate an anchor view from said common feature representation>To perform feature representation contrast learning;
a shared projection network module 30 for presetting a shared projection networkAcquiring cluster representation of all views +.>And the anchor view->Cluster representation +.>To perform cluster representation contrast learning and to perform +.A. according to the cluster representation of all views>And the anchor view->Cluster representation +.>Obtaining cluster resultsc。
Further, the shared multi-layer perceptron network module 10 includes:
a specific representation extraction unit 101 for causing the view specific encoder to extract a specific representation of the view according to a formula;
a feature representation extraction unit 102 for extracting feature representations of all views according to a formula;
a reconstruction loss calculation unit 103 for causing the view-specific decoder to calculate the acquisition reconstruction loss according to a formula.
Further, the converged network module 20 includes:
a common feature representation extraction unit 201 for obtaining a common feature representation according to a formula and generating the anchor view;
the feature representation contrast learning unit 202 is configured to construct positive samples and negative samples in and between the views of all feature representations and the anchor view, perform feature representation contrast learning according to the positive sample pair and the negative sample pair, and calculate a contrast learning loss of the feature representation according to a formula.
Further, the shared projection network module 30 includes:
a cluster representation acquisition unit 301 for acquiring cluster representations of all views and cluster representations of anchor views according to a formula;
the cluster representation contrast learning unit 302 is configured to obtain cluster labels and construct positive and negative pairs between the cluster labels to perform cluster representation contrast learning, and calculate cluster representation contrast learning loss according to a formula;
a cluster result acquisition unit 303 for acquiring a cluster result of the cluster representation including the anchor view.
The invention also provides a computer storage medium, one or more programs are stored on the computer storage medium, and the programs realize the multi-scale fusion contrast learning multi-view clustering method when being executed by a processor.
The invention also provides computer equipment, which comprises a memory and a processor, wherein the memory is used for storing computer programs, and the processor is used for executing the computer programs stored on the memory so as to realize the multi-scale fusion contrast learning multi-view clustering method.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. A multi-scale fusion contrast learning multi-view clustering method, characterized by comprising the following steps:

according to a preset multi-view dataset $X=\{X^1,X^2,\dots,X^V\}$, where $V$ denotes the number of views of the multi-view data, i.e. the data contains $V$ views, acquiring the specific representations $\{Z^m\}_{m=1}^{V}$ of all views in the preset multi-view dataset, so as to acquire the feature representations $\{H^m\}_{m=1}^{V}$ of all views through a preset shared multi-layer perceptron network $\phi$;

acquiring, through a preset fusion network $F$, the common feature representation of the feature representations $\{H^m\}_{m=1}^{V}$, so as to generate an anchor view $H^A$ from said common feature representation and perform feature-representation contrast learning;

acquiring, through a preset shared projection network $P$, the cluster representations $\{Q^m\}_{m=1}^{V}$ of all views and the cluster representation $Q^A$ of the anchor view, so as to perform cluster-representation contrast learning and acquire the clustering result $c$ according to the cluster representations of all views and the cluster representation of the anchor view.
2. The multi-scale fusion contrast learning multi-view clustering method according to claim 1, characterized in that the step of acquiring, according to the preset multi-view dataset $X=\{X^1,\dots,X^V\}$, where $V$ denotes the number of views, the specific representations $\{Z^m\}_{m=1}^{V}$ of all views so as to acquire the feature representations $\{H^m\}_{m=1}^{V}$ of all views through the preset shared multi-layer perceptron network $\phi$ specifically comprises:

inputting the preset multi-view dataset $X$; the preset multi-view dataset contains $V$ views, and each view comprises $c$ clusters;

an encoder $E^m$ corresponds one-to-one with the $V$ views, so that the view-specific encoder $E^m$, according to the formula

$$Z^m=E^m\!\left(X^m\right),$$

extracts the specific representation $Z^m$ of the $m$-th view, where $X^m$ denotes the $m$-th view;

after the specific representations $\{Z^m\}_{m=1}^{V}$ of all views are acquired, the preset shared multi-layer perceptron network $\phi$, according to the formula

$$H^m=\phi\!\left(Z^m\right),$$

extracts the feature representations of all views.
3. The multi-scale fusion contrast learning multi-view clustering method according to claim 1, characterized in that the step of acquiring the specific representations of all views so as to acquire the feature representations of all views through the preset shared multi-layer perceptron network further comprises:

a decoder $D^m$ corresponds one-to-one with the encoder $E^m$, so that the view-specific decoder $D^m$, according to the formula

$$L_{rec}=\sum_{m=1}^{V}\sum_{i=1}^{N}\left\|x_i^m-D^m\!\left(E^m\!\left(x_i^m\right)\right)\right\|_2^2,$$

computes the reconstruction loss $L_{rec}$, where $i$ denotes the $i$-th sample, $\left\|x_i^m-D^m(E^m(x_i^m))\right\|_2^2$ denotes the reconstruction error of the $m$-th view's data, $E^m$ and $D^m$ respectively denote the encoder and decoder of the $m$-th view, $x_i^m$ denotes the $i$-th sample of the $m$-th view's data, $N$ denotes the number of samples contained in each view's data, and $V$ denotes the number of views of the multi-view data.
4. The multi-scale fusion contrast learning multi-view clustering method according to claim 1, characterized in that the step of acquiring, through the preset fusion network $F$, the common feature representation of the feature representations so as to generate the anchor view $H^A$ from said common feature representation and perform feature-representation contrast learning comprises:

after the feature representations $\{H^m\}_{m=1}^{V}$ of all views are acquired, the preset fusion network $F$, according to the formula

$$H^A=F\!\left(H^1,H^2,\dots,H^V\right),$$

acquires the common feature representation and generates the anchor view $H^A$ from the common feature representation.
5. The multi-scale fusion contrast learning multi-view clustering method according to claim 1, characterized in that the step of acquiring, through the preset fusion network $F$, the common feature representation so as to generate the anchor view $H^A$ and perform feature-representation contrast learning further comprises:

constructing positive samples and negative samples within and between the views of all the feature representations $\{H^m\}$ and the anchor view $H^A$, taking samples with the same class label among the positive and negative samples as positive pairs and samples without the same class label as negative pairs;

performing feature-representation contrast learning according to the positive pairs and negative pairs, and, according to the formula

$$L_{fea}=\sum_{m=1}^{V}L^{(m,A)},$$

computing the contrast learning loss $L_{fea}$ of the feature representations, where $L^{(m,A)}$ denotes the overall loss of feature-representation contrast learning between the $m$-th view and the anchor view.
6. The multi-scale fusion contrast learning multi-view clustering method according to claim 1, characterized in that the step of acquiring, through the preset shared projection network $P$, the cluster representations $\{Q^m\}_{m=1}^{V}$ of all views and the cluster representation $Q^A$ of the anchor view so as to perform cluster-representation contrast learning and acquire the clustering result $c$ comprises:

after the feature representations $\{H^m\}_{m=1}^{V}$ of all views are acquired, the preset shared projection network $P$, according to the formula

$$Q^m=\mathrm{Softmax}\!\left(P\!\left(H^m\right)\right),$$

acquires the cluster representations of all views;

the preset shared projection network $P$ also, according to the formula

$$Q^A=\mathrm{Softmax}\!\left(P\!\left(H^A\right)\right),$$

acquires the cluster representation $Q^A$ of the anchor view;

cluster-representation contrast learning is performed according to the cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view, so as to acquire the clustering result $c$ including the cluster representation $Q^A$ of the anchor view.
7. The multi-scale fusion contrast learning multi-view clustering method according to claim 6, characterized in that the step of performing cluster-representation contrast learning according to the cluster representations $\{Q^m\}$ of all views and the cluster representation $Q^A$ of the anchor view so as to acquire the clustering result $c$ comprises:

acquiring the cluster labels $q_j^m$ of the cluster representations $\{Q^m\}$ of all views and the cluster labels $q_j^A$ of the cluster representation $Q^A$ of the anchor view;

constructing $(V-1)$ positive pairs and $V(c-1)$ negative pairs between the cluster labels $q_j^m$ and the cluster labels $q_j^A$ to perform cluster-representation contrast learning;

according to the formula

$$L_{clu}=\frac{1}{2c}\sum_{m=1}^{V}\sum_{j=1}^{c}\left[\ell\!\left(q_j^m,q_j^A\right)+\ell\!\left(q_j^A,q_j^m\right)\right]+\sum_{m=1}^{V}\sum_{j=1}^{c}P\!\left(q_j^m\right)\log P\!\left(q_j^m\right),$$

computing the cluster-representation contrast learning loss $L_{clu}$, where $\ell(q_j^m,q_j^A)$ denotes the contrast learning loss between the cluster representations $q_j^m$ and $q_j^A$, and the second term as a whole acts as a regularization term that prevents all samples from being assigned to a single cluster.
8. A multi-scale fusion contrast learning multi-view clustering system, comprising:
a shared multi-layer perception network module for presetting multi-view data setAcquiring the preset multiview dataset +.>Specific representation of all views +.>To share a multi-layer aware network by provisioningAcquiring the characteristic representation of all views +.>
The fusion network module is used for presetting a fusion networkAcquiring the characteristic representation->To generate an anchor view from said common feature representation>To perform feature representation contrast learning;
a shared projection network module for presetting the shared projection networkAcquiring cluster representations of all viewsAnd the anchor view->Cluster representation +.>To perform cluster representation contrast learning and to represent the cluster according to all viewsAnd the anchor view->Cluster representation +.>Obtaining cluster resultsc
9. A storage medium, comprising: the storage medium stores one or more programs which, when executed by a processor, implement the multi-scale fusion contrast learning multi-view clustering method of any one of claims 1-7.
10. A computer device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to implement the multi-scale fusion contrast learning multi-view clustering method of any one of claims 1-7 when executing the computer program stored on the memory.
CN202311253624.XA 2023-09-27 2023-09-27 Multi-scale fusion contrast learning multi-view clustering method and system Active CN117009838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311253624.XA CN117009838B (en) 2023-09-27 2023-09-27 Multi-scale fusion contrast learning multi-view clustering method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311253624.XA CN117009838B (en) 2023-09-27 2023-09-27 Multi-scale fusion contrast learning multi-view clustering method and system

Publications (2)

Publication Number Publication Date
CN117009838A CN117009838A (en) 2023-11-07
CN117009838B (en) 2024-01-26

Family

ID=88567500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311253624.XA Active CN117009838B (en) 2023-09-27 2023-09-27 Multi-scale fusion contrast learning multi-view clustering method and system

Country Status (1)

Country Link
CN (1) CN117009838B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542057B (en) * 2024-01-09 2024-04-05 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011152844A1 (en) * 2010-06-01 2011-12-08 Hewlett-Packard Development Company, L.P. Image clustering using a personal clothing model
ZA202207736B (en) * 2021-02-09 2022-07-27 Univ Zhejiang Normal Bipartite graphs based post-fusion multi-view clustering machine learning methods and systems
CN113591879A (en) * 2021-07-22 2021-11-02 大连理工大学 Deep multi-view clustering method, network, device and storage medium based on self-supervision learning
CN114998638A (en) * 2022-04-14 2022-09-02 上海理工大学 Multi-view three-dimensional point cloud classification method based on dynamic and static convolution fusion neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Post-fusion multi-view clustering algorithm based on neighborhood multiple kernel learning; Xia Dongxue; Yang Yan; Wang Hao; Yang Shuhong; Journal of Computer Research and Development (Issue 08); full text *
Multi-feature fusion scene recognition based on kernel collaborative representation; Zong Haiyan; Wu Qin; Wang Tianchen; Zhang Huai; Journal of Frontiers of Computer Science and Technology (Issue 06); full text *

Also Published As

Publication number Publication date
CN117009838A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
Xing et al. Deep learning in microscopy image analysis: A survey
US20210390706A1 (en) Detection model training method and apparatus, computer device and storage medium
Bhanu et al. Deep learning for biometrics
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
Wickstrøm et al. Uncertainty modeling and interpretability in convolutional neural networks for polyp segmentation
CN117009838B (en) Multi-scale fusion contrast learning multi-view clustering method and system
CN113888541B (en) Image identification method, device and storage medium for laparoscopic surgery stage
WO2022205502A1 (en) Image classification model construction method, image classification method, and storage medium
Wu et al. A deep residual convolutional neural network for facial keypoint detection with missing labels
CN114494263B (en) Medical image lesion detection method, system and equipment integrating clinical information
CN115880317A (en) Medical image segmentation method based on multi-branch feature fusion refining
TWI803243B (en) Method for expanding images, computer device and storage medium
Kalash et al. Relative saliency and ranking: Models, metrics, data and benchmarks
Ju et al. 3D-CNN-SPP: A patient risk prediction system from electronic health records via 3D CNN and spatial pyramid pooling
Costa et al. Covid-19 detection on chest x-ray and ct scan: A review of the top-100 most cited papers
JP2021533493A (en) Finger vein matching method, device, computer equipment, and storage medium
CN114565035A (en) Tongue picture analysis method, terminal equipment and storage medium
CN113592769A (en) Abnormal image detection method, abnormal image model training method, abnormal image detection device, abnormal image model training device and abnormal image model training medium
CN113159053A (en) Image recognition method and device and computing equipment
CN116309465B (en) Tongue image detection and positioning method based on improved YOLOv5 in natural environment
CN116227573B (en) Segmentation model training method, image segmentation device and related media
CN116958613A (en) Depth multi-view clustering method and device, electronic equipment and readable storage medium
Jo et al. Cross-modal variational auto-encoder with distributed latent spaces and associators
CN112699907A (en) Data fusion method, device and equipment
CN116168439A (en) Lightweight lip language identification method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant