CN117725440A

CN117725440A - Multi-view data clustering method and device, computer equipment and storage medium

Info

Publication number: CN117725440A
Application number: CN202311665044.1A
Authority: CN
Inventors: 李钦; 杨耿; 赖红
Original assignee: Shenzhen Institute of Information Technology
Current assignee: Shenzhen Institute of Information Technology
Priority date: 2023-12-04
Filing date: 2023-12-04
Publication date: 2024-03-19

Abstract

The application relates to the technical field of data processing and discloses a multi-view data clustering method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring data sets to be classified of a target object under a plurality of view angles; extracting features of the data to be classified in each data set to be classified to obtain data feature information corresponding to the data to be classified; performing calculation on every two pieces of data feature information in each data set to be classified to obtain a relation feature matrix corresponding to each data set to be classified; performing calculation on the relation feature information between every two relation feature matrices to obtain a feature mapping indication matrix corresponding to each data set to be classified; splicing the feature mapping indication matrices to obtain a real-value class indication matrix; and determining, based on the real-value class indication matrix, the clustering result of the data sets to be classified under the plurality of view angles, thereby improving the accuracy of the multi-view data clustering result.

Description

Multi-view data clustering method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a multi-view data clustering method, a device, a computer device, and a storage medium.

Background

With the rapid development of computer technology, data is growing explosively, and extracting useful information from massive amounts of data is highly valuable. Cluster analysis is the process of grouping a collection of physical or abstract objects into multiple classes, each composed of similar objects. The goal of cluster analysis is to group data on the basis of similarity. Clustering draws on many fields, including mathematics, computer science, statistics, biology, and economics. Across these application fields, many clustering techniques have been developed to describe data, measure similarities between different data sources, and assign the data sources to different clusters.

Real-world data often has multiple views. For example, web page data contains both picture information and text information, and video data contains both audio information and picture information. The fundamental task of multi-view learning is to improve learning performance by using complementary information between different views. Multi-view clustering is a basic task of multi-view learning. Most traditional multi-view clustering methods are based on spectral clustering, and the Euclidean distance is the main measure of the similarity of sample points under the different view representations. However, the intrinsic representations of the sample points often lie in different subspaces, and the Euclidean distance between sample points in their high-dimensional representation cannot effectively reflect the structural information of the data, so the accuracy of the clustering result is low.

Disclosure of Invention

The application provides a multi-view data clustering method, a multi-view data clustering device, computer equipment and a storage medium, which improve the accuracy of multi-view data clustering results.

In a first aspect, a multi-view data clustering method is provided, including:

acquiring a data set to be classified of a target object under a plurality of view angles;

extracting features of the data to be classified in each data set to be classified to obtain data feature information corresponding to the data to be classified;

calculating the characteristic information of each two data in each data set to be classified to obtain a corresponding relation characteristic matrix of each data set to be classified;

calculating the relation characteristic information between every two relation characteristic matrixes to obtain characteristic mapping indication matrixes corresponding to each data set to be classified;

splicing the characteristic mapping indication matrixes to obtain the real-value class indication matrix;

and determining clustering results of the data sets to be classified under a plurality of view angles based on the real-value class indication matrix.

In a second aspect, there is provided a multi-view data clustering apparatus, comprising:

the acquisition module is used for acquiring a data set to be classified of the target object under a plurality of view angles;

the extraction module is used for carrying out feature extraction on the data to be classified in each data set to be classified to obtain data feature information corresponding to the data to be classified;

the first calculation module is used for calculating the characteristic information of each two data in each data set to be classified to obtain a corresponding relation characteristic matrix of each data set to be classified;

the second calculation module is used for calculating the relation characteristic information between every two relation characteristic matrixes to obtain a characteristic mapping indication matrix corresponding to each data set to be classified;

the splicing module is used for splicing the characteristic mapping indication matrixes to obtain the real-value class indication matrix;

and the determining module is used for determining clustering results of the data sets to be classified under a plurality of view angles based on the real-value class indication matrix.

Optionally, in some embodiments of the present application, the splicing module further includes:

a first determining sub-module for determining a symmetric positive definite matrix based on the feature mapping indication matrix;

the decomposition sub-module is used for decomposing the symmetrical positive definite matrix to obtain an upper triangular matrix;

the first computing sub-module is used for computing the upper triangular matrix and the feature mapping indication matrix to obtain an orthogonal representation indication matrix corresponding to the feature mapping indication matrix;

and the splicing sub-module is used for splicing the orthogonal representation indication matrixes to obtain the real-value class indication matrix.

Optionally, in some embodiments of the present application, the decomposition sub-module further includes:

and the decomposition unit is used for performing square root decomposition on the symmetrical positive definite matrix based on a preset decomposition sequence to obtain the upper triangular matrix.

Optionally, in some embodiments of the present application, the first computing module further includes:

the second calculation sub-module is used for carrying out Euclidean distance calculation on each two data characteristic information in each data set to be classified to obtain a distance difference matrix;

and the second determining submodule is used for taking the distance difference matrix as the relation characteristic matrix.

Optionally, in some embodiments of the present application, further comprising:

the third computing sub-module is used for computing a distance weight matrix between every two data characteristic information in each data set to be classified;

and the fourth computing sub-module is used for computing the distance weight matrix and the distance difference matrix to obtain the relation feature matrix.

Optionally, in some embodiments of the present application, the third calculation sub-module further includes:

the computing subunit is used for inputting each two data characteristic information in the data set to be classified into a pre-trained twin network to perform similarity computation to obtain similarity values corresponding to the two data characteristic information;

the first determining subunit is configured to take the similarity value as a distance weight between the two data feature information if the similarity value is greater than a similarity threshold;

the second determining subunit is configured to obtain a preset value if the similarity value is smaller than or equal to the similarity threshold, and take the preset value as a distance weight between two pieces of data feature information;

and the third determining subunit is used for taking each distance weight as the distance weight matrix.

Optionally, in some embodiments of the present application, the determining module further includes:

the transformation submodule is used for carrying out orthogonal transformation on the real-value class indication matrix to obtain a discrete class indication matrix;

and the third determining submodule is used for determining clustering results of the data sets to be classified under a plurality of view angles based on the discrete category indication matrix.

In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the multi-view data clustering method described above when executing the computer program.

In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-view data clustering method described above.

The application provides a multi-view data clustering method, a multi-view data clustering device, computer equipment and a storage medium, wherein a data set to be classified of a target object under a plurality of view angles is obtained; extracting features of the data to be classified in each data set to be classified to obtain data feature information corresponding to the data to be classified; calculating the characteristic information of each two data in each data set to be classified to obtain a corresponding relation characteristic matrix of each data set to be classified; calculating the relation characteristic information between every two relation characteristic matrixes to obtain characteristic mapping indication matrixes corresponding to each data set to be classified; splicing the feature mapping indication matrixes to obtain a real-value class indication matrix; and determining clustering results of the data sets to be classified under a plurality of view angles based on the real-value class indication matrix. In the multi-view data clustering scheme provided by the application, the characteristic mapping indication matrix corresponding to each data set to be classified is obtained by calculating the relation characteristic information belonging to the same target object in the relation characteristic matrix in each data set to be classified, then the characteristic mapping indication matrices are spliced to obtain a real-value type indication matrix, and a clustering result is obtained based on the real-value type indication matrix. 
Therefore, the real-value class indication matrix is obtained from the feature mapping indication matrices, so the data sets to be classified under the multiple view angles can be fused through the real-value class indication matrix and the hidden information in them is fully utilized. Determining the clustering result based on the real-value class indication matrix then helps improve the accuracy of the multi-view data clustering result.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is an application environment diagram of a multi-view data clustering method provided in an embodiment of the present application;

Fig. 2 is a flowchart of a multi-view data clustering method provided in an embodiment of the present application;

Fig. 3 is a schematic structural diagram of feature extraction of data to be classified according to an embodiment of the present application;

Fig. 4 is a block diagram of a multi-view data clustering device according to an embodiment of the present application;

Fig. 5 is a block diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

The multi-view data clustering method provided by the embodiment of the invention can be applied to an application environment as shown in fig. 1, wherein the computer equipment 110 communicates with the user 120 through the network 130, and the user can operate the computer equipment 110. The computer device 110 may obtain a data set to be classified of the target object under a plurality of perspectives; extracting features of the data to be classified in each data set to be classified to obtain data feature information corresponding to the data to be classified; calculating the characteristic information of each two data in each data set to be classified to obtain a corresponding relation characteristic matrix of each data set to be classified; calculating the relation characteristic information between every two relation characteristic matrixes to obtain characteristic mapping indication matrixes corresponding to each data set to be classified; splicing the feature mapping indication matrixes to obtain a real-value class indication matrix; based on the real-value class indication matrix, determining clustering results of the data sets to be classified under a plurality of view angles, and displaying the clustering results of the data sets to be classified through the computer equipment 110. According to the method, the characteristic mapping indication matrix corresponding to each data set to be classified is obtained by calculating the relation characteristic information belonging to the same target object in the relation characteristic matrix in each data set to be classified, then the characteristic mapping indication matrices are spliced to obtain a real-value class indication matrix, and a clustering result is obtained based on the real-value class indication matrix. 
Therefore, the real-value class indication matrix is obtained from the feature mapping indication matrices, so the data sets to be classified under the multiple view angles can be fused through the real-value class indication matrix and the hidden information in them is fully utilized. Determining the clustering result based on the real-value class indication matrix then helps improve the accuracy of the multi-view data clustering result. The computer device 110 may be, but is not limited to, a smartphone 110-1, a tablet computer 110-2, or a notebook computer 110-3. The present invention is described in detail below with reference to specific embodiments.

Referring to fig. 2, fig. 2 is a flow chart of a multi-view data clustering method according to an embodiment of the present invention, where the method can be applied to a terminal or a server, and the embodiment is exemplified by a server. The multi-view data clustering method comprises the following steps:

101: acquiring data sets to be classified of the target object under a plurality of view angles.

With the continuous development of data acquisition technology, multi-view data increasingly appear in various application fields, such as video analysis, social media analysis, and medical image processing. With the aid of computer and internet technology, data can often be collected from different fields or obtained from various feature extractors. The same data therefore have different characteristics and can be described from multiple view angles, so that data sets to be classified of a target object under multiple view angles can be acquired. For example, assuming that the target object is a cup image and there are 3 cup images, the view angles may include a three-primary-color (Red Green Blue, RGB) feature, a local binary pattern (Local Binary Pattern, LBP) feature, a histogram of oriented gradients (Histogram of Oriented Gradient, HOG) feature, and a scale-invariant feature transform (Scale-Invariant Feature Transform, SIFT) feature. The data set to be classified corresponding to the RGB view angle then includes the three-primary-color feature image of the first cup image, the three-primary-color feature image of the second cup image, and the three-primary-color feature image of the third cup image. For another example, the target object may be a network security situation assessment, and the data set to be classified corresponding to each view angle may be obtained from view angles such as network logs, sensors, and web crawler data; the data sets to be classified then include security situation information under the network log view angle, security situation information under the sensor view angle, and web crawler data under the web crawler view angle.

For example, let $x = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ be a data set containing m views, where $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ are the data sets to be classified under the m views, $x^{(v)}$ represents the data of the v-th view angle, $d_v$ represents the feature dimension of the data to be classified in the data set to be classified under the v-th view angle, n is the preset number of target objects and is a positive integer, $x_i^{(v)}$ represents the i-th data to be classified in the data set to be classified under the v-th view angle, i is a positive integer from 1 to n, and j is a positive integer from 1 to n.
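The notation above can be mirrored in code as one array per view. The following is a minimal sketch only: the row-per-object layout, the random values, and the dimensions in `view_dims` are assumptions made for illustration and are not specified by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3                   # preset number of target objects (e.g. 3 cup images)
view_dims = [8, 5, 6]   # hypothetical feature dimensions d_v, one per view

# One array per view angle; row i holds the i-th data to be classified
# in that view (assumed layout: n rows, d_v columns).
views = [rng.normal(size=(n, d_v)) for d_v in view_dims]

m = len(views)          # number of view angles
for v, x_v in enumerate(views, start=1):
    print(f"view {v}: n = {x_v.shape[0]}, d_v = {x_v.shape[1]}")
```

Every view describes the same n target objects, so row i in each array refers to the same object seen from a different view angle.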

102: extracting features of the data to be classified in each data set to be classified to obtain the data feature information corresponding to the data to be classified.

All data to be classified in each data set to be classified are input into the sub-neural network corresponding to that data set for feature extraction, so as to obtain the data feature information corresponding to the data to be classified.

As shown in fig. 3, assume a data set to be classified that includes m views, i.e., $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, where $x^{(1)}$ is the data set to be classified under the 1st view angle, $x^{(2)}$ is the data set to be classified under the 2nd view angle, and $x^{(m)}$ is the data set to be classified under the m-th view angle. The i-th data to be classified in the data set to be classified under the v-th view angle, $x_i^{(v)}$, is input into the v-th sub-neural network corresponding to the data set to be classified under the v-th view angle to obtain the corresponding output feature vector.
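The per-view feature extraction can be sketched as one small network per view angle. This is an illustration under stated assumptions: the text does not specify the architecture of the sub-neural networks, so the two-layer perceptron, its random untrained weights, and the names `make_sub_network`, `d_hidden`, and `d_out` are all hypothetical.

```python
import numpy as np

def make_sub_network(d_in, d_hidden, d_out, rng):
    """Return a tiny two-layer perceptron with random, untrained weights.

    Hypothetical stand-in for the v-th sub-neural network.
    """
    w1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
    w2 = rng.normal(scale=0.1, size=(d_hidden, d_out))

    def forward(x):
        # x has shape (n, d_in); the result has shape (n, d_out).
        return np.tanh(x @ w1) @ w2

    return forward

rng = np.random.default_rng(0)
n = 4
# Two views with different input dimensions d_v (assumed values).
views = [rng.normal(size=(n, 8)), rng.normal(size=(n, 5))]

# One sub-network per view, each matched to that view's input dimension.
sub_networks = [make_sub_network(x.shape[1], 16, 3, rng) for x in views]

# Data feature information: one (n, d_out) array of feature vectors per view.
features = [net(x) for net, x in zip(sub_networks, views)]
```

Giving each view its own network lets views with different input dimensions be mapped to feature vectors of a common size.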

103: performing calculation on every two pieces of data feature information in each data set to be classified to obtain a relation feature matrix corresponding to each data set to be classified.

The relation feature matrix represents, in Euclidean space, the similarity between every two pieces of data feature information; this similarity reflects the similarity between two data to be classified in the same data set to be classified.

A cosine similarity algorithm may be used to perform the calculation on every two pieces of data feature information in each data set to be classified, so as to obtain the relation feature matrix corresponding to each data set to be classified.
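As an illustration of the cosine similarity option, the sketch below computes a pairwise cosine similarity matrix for the feature vectors of one view. The function name `cosine_similarity_matrix` and the `eps` guard against zero-length vectors are assumptions introduced here.

```python
import numpy as np

def cosine_similarity_matrix(z, eps=1e-12):
    """Pairwise cosine similarity between the rows of z, shape (n, d)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z_unit = z / np.maximum(norms, eps)   # normalise each row to unit length
    return z_unit @ z_unit.T              # entry (i, j) = cos(angle(z_i, z_j))

# Three feature vectors of one view (made-up values).
z = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])
s = cosine_similarity_matrix(z)
```

The result is an n x n symmetric matrix with ones on the diagonal; orthogonal feature vectors give an entry of 0.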

A distance difference matrix may also be obtained by performing calculation on every two pieces of data feature information in a data set to be classified, and the relation feature matrix is then determined from it. That is, the step of "calculating the feature information of each two data in each data set to be classified to obtain a relationship feature matrix corresponding to each data set to be classified" includes:

(10) performing Euclidean distance calculation on every two pieces of data feature information in each data set to be classified to obtain a distance difference matrix;

(11) taking the distance difference matrix as the relation feature matrix.

Since the similarity between every two pieces of data feature information in the data set to be classified is invariant across the manifolds of different projection spaces, the distance difference matrix corresponding to the data set to be classified can be calculated according to formula (1).

Formula (1) defines the distance difference matrix corresponding to the data set to be classified of the v-th view angle, which serves as the relation feature matrix and has size n × n. Its entry for the pair (i, j) is computed from the feature vector output after the i-th data to be classified in the data set to be classified of the v-th view angle is input into the v-th sub-neural network corresponding to that data set, and the feature vector output after the j-th data to be classified in the same data set is input into the same sub-neural network.
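Steps (10) and (11) can be sketched as a pairwise Euclidean distance computation. This is an assumed reading, not the formula of the original: in particular, whether formula (1) uses the plain or the squared distance cannot be recovered from the text, and the plain Euclidean norm is used here. The name `distance_difference_matrix` is hypothetical.

```python
import numpy as np

def distance_difference_matrix(z):
    """Pairwise Euclidean distances between the rows of z, shape (n, d).

    Entry (i, j) is ||z_i - z_j||_2 (assumption: unsquared distance).
    """
    diff = z[:, None, :] - z[None, :, :]   # shape (n, n, d) via broadcasting
    return np.linalg.norm(diff, axis=-1)   # shape (n, n)

# Feature vectors output by one view's sub-neural network (made-up values).
z = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
d = distance_difference_matrix(z)
```

For n feature vectors the result is an n x n symmetric matrix with a zero diagonal.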

After the distance difference matrix is determined, a distance weight matrix between every two pieces of data feature information in each data set to be classified can be calculated, and the relation feature matrix is then determined based on the distance weight matrix and the distance difference matrix. That is, after the step of performing the Euclidean distance calculation on every two pieces of data feature information in each data set to be classified to obtain a distance difference matrix, the method further includes:

(20) Calculating a distance weight matrix between every two data characteristic information in each data set to be classified;

A clustering algorithm can be adopted to calculate the distance weight matrix between every two pieces of data feature information in each data set to be classified.

(21) And calculating the distance weight matrix and the distance difference matrix to obtain a relation feature matrix.

The relationship feature matrix can be calculated according to the following formula (2):

In formula (2), the relation feature matrix corresponding to the data set to be classified of the v-th view angle is obtained from the distance difference matrix and the distance weight matrix corresponding to that data set, each of size n × n. The element $w_{ij}$ of the distance weight matrix is the distance weight between the feature vector output after the i-th data to be classified in the data set to be classified of the v-th view angle is input into the v-th sub-neural network corresponding to that data set, and the feature vector output after the j-th data to be classified in the same data set is input into the same sub-neural network.
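One plausible way to combine the two matrices in step (21) is an element-wise product, so that each pairwise distance is scaled by its weight. This is an assumption: the exact operation in formula (2) is not recoverable from the text, and the name `relation_feature_matrix` is hypothetical.

```python
import numpy as np

def relation_feature_matrix(w, d):
    """Combine a distance weight matrix and a distance difference matrix.

    Assumption: element-wise (Hadamard) product, entry (i, j) = w_ij * d_ij.
    """
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    if w.shape != d.shape:
        raise ValueError("weight and distance matrices must have the same shape")
    return w * d

# Made-up 3 x 3 example for one view.
w = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.2],
     [0.0, 0.2, 0.0]]
d = [[0.0, 5.0, 10.0],
     [5.0, 0.0, 5.0],
     [10.0, 5.0, 0.0]]
r = relation_feature_matrix(w, d)
```

With this choice a zero weight removes the corresponding pair from the relation feature matrix, regardless of how far apart the two feature vectors are.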

Every two pieces of data feature information in the data set to be classified may be processed by a twin (Siamese) network to obtain the similarity value corresponding to the two pieces of data feature information, and the distance weight matrix is then determined based on the similarity values. That is, the step of "calculating a distance weight matrix between every two pieces of data feature information in each data set to be classified" includes:

(30) Inputting every two data characteristic information in the data set to be classified into a pretrained twin network to perform similarity calculation to obtain similarity values corresponding to the two data characteristic information;

(31) If the similarity value is smaller than the similarity threshold value, the similarity value is used as the distance weight between the two data characteristic information;

(32) If the similarity value is greater than or equal to the similarity threshold value, acquiring a preset value, and taking the preset value as a distance weight between two data characteristic information;

(33) And taking each distance weight as a distance weight matrix.

If the similarity value is smaller than the similarity threshold, a connection exists between the two pieces of data feature information; on this basis, the similarity value is used as the distance weight between the two pieces of data feature information.

Wherein, the distance weight matrix can be calculated according to the following formula (3):

In formula (3), the element $w_{ij}$ of the distance weight matrix corresponding to the data set to be classified of the v-th view angle is the distance weight between the feature vector output after the i-th data to be classified in that data set is input into the v-th sub-neural network and the feature vector output after the j-th data to be classified in that data set is input into the v-th sub-neural network. The similarity is computed from the output of one sub-neural network of the twin network for the i-th data to be classified in the data set to be classified under the v-th view angle and the output of the other sub-neural network of the twin network for the j-th data to be classified in that data set. $\delta$ is a constant parameter that is determined according to the actual application scenario and is not specifically limited.
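Steps (30) to (33) can be sketched as a thresholding rule applied to every pair of feature vectors. The sketch follows the comparison direction of steps (31) and (32) as written. Everything else is assumed: `similarity_stub` uses cosine similarity as a placeholder for the pre-trained twin network, which is not available here, and the `threshold` and `preset` values are arbitrary.

```python
import numpy as np

def similarity_stub(z_i, z_j):
    """Placeholder for the pre-trained twin network (assumption: cosine similarity)."""
    denom = np.linalg.norm(z_i) * np.linalg.norm(z_j)
    return float(z_i @ z_j / denom) if denom > 0 else 0.0

def distance_weight_matrix(z, threshold, preset=0.0, similarity=similarity_stub):
    """Build the n x n distance weight matrix for one view.

    Step (31): similarity below the threshold -> weight = similarity.
    Step (32): otherwise -> weight = preset value.
    """
    n = z.shape[0]
    w = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = similarity(z[i], z[j])
            w[i, j] = s if s < threshold else preset
    return w

z = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = distance_weight_matrix(z, threshold=0.9, preset=0.0)
```

Passing a different callable as `similarity` swaps in another scoring model without changing the thresholding logic; flipping the comparison in the loop gives the opposite convention if that is what an embodiment intends.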

104: performing calculation on the relation feature information between every two relation feature matrices to obtain a feature mapping indication matrix corresponding to each data set to be classified.

The feature mapping indication matrix can represent similarity between the relationship feature information of the same position in the relationship feature matrices of two different angles of view, and the similarity is used for representing similarity between the data to be classified in the data sets to be classified under different angles of view, that is, the feature mapping indication matrix contains correlation between every two data to be classified in each data set to be classified and correlation between the data to be classified in each data set to be classified under two angles of view.

The relation feature information between every two relation feature matrices can be mapped to Euclidean space to obtain the feature mapping indication matrix corresponding to each data set to be classified. That is, the feature mapping indication matrix may be expressed according to formula (4).

In formula (4), $F^{(v)}$ is the feature mapping indication matrix corresponding to the data set to be classified of the v-th view angle; its size is n × d, where n is the preset number of target objects and d is the feature dimension of the feature mapping indication information in the feature mapping indication matrix. The formula involves the i-th relation feature information in the relation feature matrix of the v-th view angle and the i-th relation feature information in the relation feature matrix of the p-th view angle, where v is a positive integer from 1 to m, p is a positive integer from 1 to m, and m is the number of view angles.

The i-th row of $F^{(v)}$ is the representation of the i-th data to be classified of the v-th view angle in the feature mapping indication matrix. The matrix $F^{(v)}$ can therefore be regarded as a set of feature vectors corresponding to the data to be classified in the data sets to be classified from different view angles, each feature vector corresponding to one feature mapping indication vector in the feature mapping indication matrix. The real-value class indication matrix can thus capture the feature information of the data to be classified under different view angles, and the feature vectors in $F^{(v)}$ can reveal the correlation structure and potentially shared features among the data, which helps improve the clustering accuracy of the image data sets under multiple view angles.

105: splicing the feature mapping indication matrices to obtain a real-value class indication matrix.

The size of the real-value class indication matrix is n × C, where n is the number of target objects and C is the number of classes; that is, each column of the real-value class indication matrix corresponds to one class.

A symmetric positive definite matrix may be determined based on the feature map indication matrix and a real-valued class indication matrix may be determined based on the symmetric positive definite matrix and the feature map indication matrix. That is, the step of "splicing the feature mapping indication matrices to obtain a real value class indication matrix" includes:

(40) Determining a symmetric positive definite matrix based on the feature mapping indication matrix;

(41) Decomposing the symmetric positive definite matrix to obtain an upper triangular matrix;

(42) Performing a calculation on the upper triangular matrix and the feature mapping indication matrix to obtain an orthogonal representation indication matrix corresponding to the feature mapping indication matrix;

(43) Splicing the orthogonal representation indication matrices to obtain a real-value class indication matrix.
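Steps (40) to (43) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: it assumes each feature mapping indication matrix has full column rank (so that its Gram matrix is positive definite), that the decomposition is a Cholesky-type factorization into a lower triangular factor and its transpose, and that splicing means column-wise concatenation. The function names and the two small example matrices are hypothetical.

```python
import math

def gram(f):
    """Step (40): G = F^T F for an n x d matrix F given as a list of rows."""
    d = len(f[0])
    return [[sum(row[a] * row[b] for row in f) for b in range(d)] for a in range(d)]

def cholesky(g):
    """Step (41): lower-triangular L with G = L L^T; L^T is the upper factor."""
    d = len(g)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def orthogonalize(f):
    """Step (42): F_tilde = F (L^-1)^T, so each row x of F_tilde solves L x = f."""
    low = cholesky(gram(f))
    out = []
    for row in f:
        x = []
        for i, value in enumerate(row):  # forward substitution on L x = f
            x.append((value - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
        out.append(x)
    return out

def splice(matrices):
    """Step (43): concatenate the per-view matrices column-wise."""
    return [sum(rows, []) for rows in zip(*matrices)]

# Hypothetical feature mapping indication matrices for two views (n = 3, d = 2).
views = [
    [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]],
    [[2.0, 1.0], [0.0, 1.0], [1.0, 3.0]],
]
orthogonal_views = [orthogonalize(f) for f in views]
real_valued = splice(orthogonal_views)
```

Under these assumptions, the Gram matrix of each entry of `orthogonal_views` is the identity up to rounding, which is the orthogonality property the steps aim for.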

Reference is made to the following formula (5):

$$(F^{(v)})^T F^{(v)} = I \tag{5}$$

where $F^{(v)}$ is the feature mapping indication matrix corresponding to the data set to be classified in the v-th view, $(F^{(v)})^T F^{(v)}$ is the symmetric positive definite matrix of the v-th view, and $I$ is the identity matrix.

The symmetric positive definite matrix can be decomposed by adopting a singular value decomposition method to obtain an upper triangular matrix.

The orthogonal representation indication matrix is calculated according to the following formula (6):

$$\tilde{F}^{(v)} = F^{(v)} \left( (L^{(v)})^{-1} \right)^T \tag{6}$$

where $\tilde{F}^{(v)}$ is the orthogonal representation indication matrix corresponding to the data set to be classified in the v-th view, $F^{(v)}$ is the feature mapping indication matrix corresponding to the data set to be classified in the v-th view, and $(L^{(v)})^T$ is the upper triangular matrix.

The symmetric positive definite matrix may be decomposed using a square root decomposition (Cholesky decomposition) method. That is, the step of decomposing the symmetric positive definite matrix to obtain an upper triangular matrix includes:

(50) Performing square root decomposition on the symmetric positive definite matrix based on a preset decomposition order to obtain an upper triangular matrix.

The preset decomposition order may be from top to bottom and from left to right; square root decomposition is performed on the symmetric positive definite matrix in that order to obtain the upper triangular matrix.
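The decomposition order described above can be illustrated with a hand-written square root (Cholesky) decomposition that fills the lower triangular factor row by row (top to bottom) and, within each row, left to right; the upper triangular matrix is then its transpose. This is a sketch only: the function names and the 2 × 2 example matrix are hypothetical and are not taken from the application.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular L with a = L L^T, filled top to bottom, left to right."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):            # rows, top to bottom
        for j in range(i + 1):    # columns within the row, left to right
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

# Hypothetical symmetric positive definite matrix.
spd = [[4.0, 2.0], [2.0, 5.0]]
lower = cholesky_lower(spd)   # [[2.0, 0.0], [1.0, 2.0]]
upper = transpose(lower)      # [[2.0, 1.0], [0.0, 2.0]]
```

The explicit pivot check makes the sketch fail loudly when the input is not positive definite, which would be the case if a feature mapping indication matrix were rank deficient.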

Because $(F^{(v)})^T F^{(v)}$ is a symmetric positive definite matrix, the relation $(F^{(v)})^T F^{(v)} = L^{(v)} (L^{(v)})^T$ holds, where $L^{(v)}$ is a lower triangular matrix. In this case $(L^{(v)})^{-1}$ is a lower triangular matrix and $((L^{(v)})^{-1})^T$ is an upper triangular matrix. Writing $\tilde{F}^{(v)} = F^{(v)} ((L^{(v)})^{-1})^T$ for the orthogonal representation indication matrix and abbreviating $L^{(v)}$ as $L$, the following relationship therefore holds:

$$(\tilde{F}^{(v)})^T \tilde{F}^{(v)} = (L^{(v)})^{-1} (F^{(v)})^T F^{(v)} \left( (L^{(v)})^{-1} \right)^T = L^{-1} L L^T (L^{-1})^T = I$$

Thus, $\tilde{F}^{(v)}$ is a representation with orthogonal constraints.

106: Determining the clustering result of the data sets to be classified under a plurality of views based on the real-value class indication matrix.

Because the real-value class indication matrix gives, for each target object, a value for each class, the class with the maximum class probability can be selected for each target object as the class of that target object. Clustering is thereby realized, and the clustering result of the data sets to be classified under a plurality of views is determined.

The real-value class indication matrix can be subjected to orthogonal transformation, so that a discrete class indication matrix is obtained, and clustering results of the data sets to be classified under multiple view angles are determined based on the discrete class indication matrix. That is, the step of determining a clustering result of the data set to be classified under a plurality of view angles based on the real-value class indication matrix includes:

(60) Performing orthogonal transformation on the real-value class indication matrix to obtain a discrete class indication matrix;

(61) Based on the discrete category indication matrix, determining clustering results of the data sets to be classified under a plurality of view angles.

The real-value class indication matrix can be orthogonally transformed by an orthogonal layer formed by a neural network to obtain the discrete class indication matrix. That is, an orthogonal constraint can be applied to the real-value class indication matrix, and for each target object the position of the value with the largest class probability is mapped to 1 while the remaining positions are mapped to 0. The cluster label of each target object is thereby obtained, and the clustering result of the data sets to be classified under multiple views is determined based on the discrete class indication matrix.
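The mapping to 1 and 0 can be sketched as a plain arg-max over class values. This is an illustrative simplification and not the neural-network orthogonal layer itself: it assumes the n × c layout in which each row holds the class values of one target object, and the function name and example scores are hypothetical.

```python
def discretize(real_valued):
    """Map each row to a one-hot row marking its largest entry, and return the labels."""
    labels = [max(range(len(row)), key=row.__getitem__) for row in real_valued]
    one_hot = [
        [1 if j == label else 0 for j in range(len(row))]
        for row, label in zip(real_valued, labels)
    ]
    return one_hot, labels

# Hypothetical real-value class indication matrix for 3 target objects and 3 classes.
scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
one_hot, labels = discretize(scores)
# one_hot == [[1, 0, 0], [0, 0, 1], [0, 1, 0]] and labels == [0, 2, 1]
```

Each row of `one_hot` contains exactly one 1, so the result is a discrete class indication matrix in which every target object is assigned to exactly one cluster.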

The above is a multi-view data clustering flow of the present application.

As described above, the present application provides a multi-view data clustering method, apparatus, computer device, and storage medium, in which: data sets to be classified of a target object under multiple views are acquired; features are extracted from the data to be classified in each data set to be classified to obtain the corresponding data feature information; a calculation is performed on every two pieces of data feature information in each data set to be classified to obtain the relation feature matrix of that data set; the relation feature information between every two relation feature matrices is calculated to obtain the feature mapping indication matrix corresponding to each data set to be classified; the feature mapping indication matrices are spliced to obtain a real-value class indication matrix; and the clustering result of the data sets to be classified under the multiple views is determined based on the real-value class indication matrix. In the multi-view data clustering scheme provided by the application, the feature mapping indication matrix corresponding to each data set to be classified is obtained by calculating the relation feature information that belongs to the same target object in the relation feature matrices of the data sets to be classified; the feature mapping indication matrices are then spliced to obtain a real-value class indication matrix, and a clustering result is obtained based on the real-value class indication matrix.
Because the real-value class indication matrix is obtained from the feature mapping indication matrices, it fuses the data sets to be classified under the multiple views and makes full use of the hidden information in them; determining the clustering result based on the real-value class indication matrix therefore helps improve the accuracy of the multi-view data clustering result.

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention.

In an embodiment, a multi-view data clustering device is provided, which corresponds one-to-one to the multi-view data clustering method in the above embodiment. Referring to FIG. 4, the multi-view data clustering device includes an obtaining module 201, an extracting module 202, a first calculating module 203, a second calculating module 204, a splicing module 205 and a determining module 206. Each functional module is described in detail as follows:

The splicing module is used for splicing the characteristic mapping indication matrixes to obtain a real-value class indication matrix;

Optionally, in some embodiments, the splicing module further comprises:

the first determining submodule is used for determining a symmetrical positive definite matrix based on the feature mapping indication matrix;

and the splicing sub-module is used for splicing the orthogonal representation indication matrixes to obtain a real-value class indication matrix.

Optionally, in some embodiments, the decomposition sub-module further comprises:

and the decomposition unit is used for performing square root decomposition on the symmetrical positive definite matrix based on a preset decomposition sequence to obtain an upper triangular matrix.

Optionally, in some embodiments, the first computing module further comprises:

the second calculation sub-module is used for carrying out Euclidean distance calculation on every two data characteristic information in each data set to be classified to obtain a distance difference matrix;

and the second determining submodule is used for taking the distance difference matrix as a relation feature matrix.
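The pairwise Euclidean distance calculation performed by the second calculation sub-module can be sketched as follows. The function name and the example feature vectors are hypothetical; the sketch assumes each piece of data feature information is a fixed-length numeric vector.

```python
import math

def distance_matrix(features):
    """Return the symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            dist[i][j] = dist[j][i] = d  # the distance is symmetric
    return dist

# Hypothetical data feature information for three data items in one view.
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = distance_matrix(features)
```

Only the upper triangle is computed and then mirrored, since the distance between items i and j equals the distance between j and i and the diagonal is zero.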

Optionally, in some embodiments, further comprising:

and the fourth computing sub-module is used for computing the distance weight matrix and the distance difference matrix to obtain a relation feature matrix.

Optionally, in some embodiments, the third computing sub-module further comprises:

the computing subunit is used for inputting every two data characteristic information in the data set to be classified into the pretrained twin network to perform similarity computation to obtain similarity values corresponding to the two data characteristic information;

the first determining subunit is configured to, if the similarity value is greater than the similarity threshold, use the similarity value as a distance weight between the two data feature information;

the second determining subunit is configured to obtain a preset value if the similarity value is less than or equal to the similarity threshold, and use the preset value as a distance weight between two data feature information;

and the third determining subunit is used for taking each distance weight as a distance weight matrix.
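The thresholding performed by the subunits above can be sketched as follows. The pretrained twin network is not specified here, so the sketch substitutes cosine similarity as a stand-in; that substitution, the threshold of 0.5, the preset value of 0.0, and all function names are assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Stand-in for the pretrained twin network (an assumption, not the source's model)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def distance_weight_matrix(features, similarity, threshold=0.5, preset=0.0):
    """Keep a similarity as the weight if it exceeds the threshold, else use the preset."""
    n = len(features)
    weights = [[preset] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = similarity(features[i], features[j])
            weights[i][j] = s if s > threshold else preset
    return weights

# Hypothetical data feature information for three data items in one view.
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
weights = distance_weight_matrix(features, cosine_similarity)
```

Passing the similarity function as a parameter keeps the thresholding logic independent of the model, so a trained twin network could be swapped in without changing `distance_weight_matrix`.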

Optionally, in some embodiments, the determining module further comprises:

In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server via a network connection. The computer program is executed by the processor to implement the functions or steps of the multi-view data clustering method.

In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

Acquiring data sets to be classified of a target object under a plurality of views; extracting features from the data to be classified in each data set to be classified to obtain the corresponding data feature information; performing a calculation on every two pieces of data feature information in each data set to be classified to obtain the relation feature matrix of that data set; calculating the relation feature information between every two relation feature matrices to obtain the feature mapping indication matrix corresponding to each data set to be classified; splicing the feature mapping indication matrices to obtain a real-value class indication matrix; and determining the clustering result of the data sets to be classified under the plurality of views based on the real-value class indication matrix.

In this embodiment, the feature mapping indication matrix corresponding to each data set to be classified is obtained by calculating the relation feature information that belongs to the same target object in the relation feature matrices of the data sets to be classified; the feature mapping indication matrices are then spliced to obtain a real-value class indication matrix, and a clustering result is obtained based on the real-value class indication matrix. Because the real-value class indication matrix is obtained from the feature mapping indication matrices, it fuses the data sets to be classified under the multiple views and makes full use of the hidden information in them; determining the clustering result based on the real-value class indication matrix therefore helps improve the accuracy of the multi-view data clustering result.

In one embodiment, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which when executed by a processor performs the steps of:

It should be noted that, the functions or steps implemented by the computer readable storage medium or the computer device may correspond to the relevant descriptions of the server side and the client side in the foregoing method embodiments, and are not described herein for avoiding repetition.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A multi-view data clustering method, comprising:

2. The multi-view data clustering method according to claim 1, wherein the splicing the feature mapping indication matrices to obtain the real-valued class indication matrix comprises:

determining a symmetric positive definite matrix based on the feature mapping indication matrix;

decomposing the symmetrical positive definite matrix to obtain an upper triangular matrix;

calculating the upper triangular matrix and the feature mapping indication matrix to obtain an orthogonal representation indication matrix corresponding to the feature mapping indication matrix;

and splicing the orthogonal representation indication matrixes to obtain the real-value class indication matrix.

3. The multi-view data clustering method according to claim 2, wherein the decomposing the symmetric positive definite matrix to obtain an upper triangular matrix comprises:

and performing square root decomposition on the symmetrical positive definite matrix based on a preset decomposition sequence to obtain the upper triangular matrix.

4. The multi-view data clustering method according to claim 1, wherein the calculating the data feature information of each two of the data sets to be classified to obtain a corresponding relationship feature matrix of each data set to be classified includes:

performing Euclidean distance calculation on each two data characteristic information in each data set to be classified to obtain a distance difference matrix;

and taking the distance difference matrix as the relation characteristic matrix.

5. The multi-view data clustering method according to claim 4, wherein the performing euclidean distance calculation between each two data feature information in each data set to be classified to obtain a distance difference matrix further comprises:

calculating a distance weight matrix between every two data characteristic information in each data set to be classified;

and calculating the distance weight matrix and the distance difference matrix to obtain the relation feature matrix.

6. The multi-view data clustering method according to claim 5, wherein said calculating a distance weight matrix between each two of the data feature information in each of the data sets to be classified comprises:

inputting each two data characteristic information in the data set to be classified into a pretrained twin network to perform similarity calculation to obtain similarity values corresponding to the two data characteristic information;

if the similarity value is larger than a similarity threshold value, the similarity value is used as a distance weight between the two data characteristic information;

if the similarity value is smaller than or equal to the similarity threshold value, acquiring a preset value, and taking the preset value as a distance weight between the two data characteristic information;

and taking each distance weight as the distance weight matrix.

7. The multi-view data clustering method according to claim 1, wherein the determining the clustering result of the data set to be classified under a plurality of views based on the real-valued class indication matrix comprises:

performing orthogonal transformation on the real-value class indication matrix to obtain a discrete class indication matrix;

and determining clustering results of the data sets to be classified under a plurality of view angles based on the discrete class indication matrix.

8. A multi-view data clustering device, comprising:

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the multi-view data clustering method according to any one of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the multi-view data clustering method according to any one of claims 1 to 7.