CN113255720A - Multi-view clustering method and system based on hierarchical graph pooling - Google Patents
- Publication number
- CN113255720A CN202110393842.8A
- Authority
- CN
- China
- Prior art keywords
- graph
- view
- matrix
- clustering
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a multi-view clustering method based on hierarchical graph pooling, comprising the following steps: dividing the data to be processed into a multi-view data set, and constructing a graph representation of the data set for each view to obtain the corresponding views; extracting the clustering information of each view by iterating hierarchical graph pooling layer by layer, wherein the clustering information of each view comprises the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration; and fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to the feature vector of each class. The method makes full use of the multi-view characteristics of the data to be processed, and the fused result comprehensively contains the clustering information of all original views. Also disclosed is a multi-view clustering system based on hierarchical graph pooling, comprising a graph construction module, a clustering information calculation and extraction module, and a multi-view fusion module. The system has the beneficial effect of improving the clustering effect.
Description
Technical Field
The present invention relates to the field of clustering and classification. More specifically, the present invention relates to a multi-view clustering method and system based on hierarchical graph pooling.
Background
As data gathering technologies develop, more and more social network data sets have the characteristic of multiple views (also called multi-modal). Multi-view data refers to data collected from different domains or obtained from different feature extractors and exhibiting heterogeneous characteristics. In social network analysis, a node represents an object and an edge represents a connection. In a multi-view social network data set, there are various types of association between nodes, and the network constructed from one type of association is one view. Multi-view learning improves the generalization performance of the model by modeling each view separately and then fusing the views.
At present, the following mainstream processing methods exist for clustering and classifying multi-view social network data:
(1) Splicing the multi-view social network data into single-view data, and then clustering and classifying. This method splices the features of all views together to form single-view data containing all features of the original views, and then clusters the spliced data with a clustering algorithm such as k-means.
(2) Clustering and classifying after the multi-view data are fused. This method uses a multi-view fusion method such as multi-view spectral clustering or multi-view metric learning to obtain a fused feature matrix, that is, a comprehensive representation of the multi-view data in which each data point corresponds to one feature vector and the number of samples remains unchanged during fusion. The fused feature matrix is then clustered with a mainstream clustering method such as k-means to obtain the feature vectors of the cluster centers of all categories.
For the clustering and classification of multi-view social network data, the mainstream methods work well on certain specific problems. However, they depend heavily on particular data characteristics, so the robustness and the application scenarios of the models are limited.
First, the method of splicing multi-view social network data into a single view does not exploit the multi-view characteristics of the data and easily causes overfitting.
Secondly, the method of clustering after multi-view fusion makes no specific improvement for clustering and classification tasks: the structural information and feature information within each view are not exploited in a targeted way, the clustering properties of the data are not learned, and clustering relies only on the feature information after view fusion. Because the structural information of each view differs before clustering, the clustering results of the views are not necessarily the same. During fusion, this method can only extract clustering information from the fused view, and the clustering information in each original view is discarded, which degrades the clustering and classification results.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.
Another object of the invention is to provide a multi-view clustering method based on hierarchical graph pooling, which uses multi-view learning to learn from each view and finally fuses the results into one feature matrix, exploiting the multi-view characteristics of the data so that the fused view comprehensively contains the clustering information of the original views.
A further object of the invention is to provide a multi-view clustering system based on hierarchical graph pooling, which has a simple model structure, fast computation and an outstanding clustering effect.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a hierarchical graph pooling-based multi-view clustering method, comprising the steps of:
dividing the data to be processed into a multi-view data set, and constructing a graph representation of the data set for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the graph's feature information, and a graph Laplacian matrix;
extracting the clustering information of each view by iterating hierarchical graph pooling layer by layer, wherein the clustering information of each view comprises the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to the feature vector of each class.
Preferably, the method for constructing the view comprises the following steps: each data point is taken as a node of the graph, each node is characterized by the feature vector of its data point, and the edges of the graph are constructed according to the association between the data points, yielding the graph adjacency matrix $A^{(i)}$.
Preferably, the method for constructing the edges of the graph further comprises a K-nearest-neighbor algorithm, comprising the following steps:
dividing the feature vectors of the data points, and constructing a view for each divided part of the features according to the K-nearest-neighbor graph construction method to obtain the graph adjacency matrix $A^{(i)}$ of the view, wherein
the feature vector of each data point is regarded as a coordinate in an n-dimensional space, and the distance between each data point and every other node is calculated with the Euclidean distance in that n-dimensional space; each node is then traversed and the other nodes are sorted by their distance to it; for a given neighbor number k, if another node is among the k nearest neighbors of the node, an edge is formed between the two nodes; the neighbor number k is a hyper-parameter.
Preferably, the method of constructing a feature matrix from the inherent association between data points is: selecting the values of the original structural features of each data point as the feature vector of the node, then splicing the feature vectors by rows to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view according to the definitions of the degree matrix and the graph Laplacian, thereby forming an initial data set;
the method of constructing the feature matrix according to the K-nearest-neighbor algorithm is: splicing the feature vectors of the data points in each view i by rows to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view according to the definitions of the degree matrix and the graph Laplacian.
Preferably, the clustering information of each view i is calculated and extracted as follows: a hierarchical graph pooling method is iterated to obtain the clustering information of the view; after each iteration, the graph data of the clustered (coarsened) graph, comprising a feature matrix and an adjacency matrix, are output and used as the input of the next iteration; the number of iterations m and the number of categories k in each iteration are preset hyper-parameters.
Preferably, the calculating and extracting the clustering information of each view i specifically includes the following steps:
step one, setting an algorithm for extracting a clustering center, applying the algorithm to graph data of an input graph, and taking the obtained clustering center as a node of an output graph to obtain a node corresponding relation between the input graph and the output graph: by the set algorithm and the characteristic matrix and the adjacency matrix of the input graph, the nodes of the input graph are divided into different clusters, the characteristic vectors of the same class of nodes in the input graph are comprehensively represented by the characteristic vector of one node in the coarsened graph, and the node is a super node and is regarded as a clustering center;
step two, using an allocation matrix to represent the node correspondence between the input graph and the output graph: if node i of the input graph belongs to class j, that is, corresponds to node j of the coarsened graph, the entry of the allocation matrix at position (i, j) is 1; otherwise the entry at that position is 0;
step three, obtaining the adjacency matrix of the coarsened graph from the adjacency matrix of the input graph and the allocation matrix, which gives the structure of the coarsened graph;
step four, aggregating the feature vectors of the nodes of the same class in the input graph, using the allocation matrix and the aggregation method of hierarchical graph pooling, to obtain the feature vector of each super node, and then splicing the feature vectors of all super nodes to obtain the feature matrix of the coarsened graph;
step five, iterating m times on each view i to finally obtain the adjacency matrix and the feature matrix of the coarsened graph, from which the graph Laplacian matrix is obtained according to its definition, together with the allocation matrix, where i = 1, …, N.
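Steps two to four can be illustrated with a minimal NumPy sketch of a single coarsening iteration. The source text does not preserve the formulas, so everything concrete here is an assumption: the coarsened adjacency matrix is taken as SᵀAS, feature aggregation is taken as the mean over each cluster, the unnormalized Laplacian D − A is used, and the cluster labels are supplied by whatever cluster-center extraction algorithm is chosen in step one. The names `one_hot_allocation` and `coarsen` are hypothetical.

```python
import numpy as np

def one_hot_allocation(labels, k):
    """Allocation matrix: entry (i, j) is 1 if input node i belongs to class j, else 0."""
    S = np.zeros((len(labels), k))
    S[np.arange(len(labels)), labels] = 1.0
    return S

def coarsen(A, X, labels, k):
    """One coarsening iteration (assumed formulas, see the lead-in)."""
    S = one_hot_allocation(labels, k)
    # Assumption: coarsened adjacency is S^T A S (edge weight between super nodes).
    A_coarse = S.T @ A @ S
    # Assumption: a super node's feature vector is the mean of its members' vectors.
    sizes = np.maximum(S.sum(axis=0), 1.0)  # guard against empty clusters
    X_coarse = (S.T @ X) / sizes[:, None]
    # Unnormalized graph Laplacian of the coarsened graph.
    L_coarse = np.diag(A_coarse.sum(axis=1)) - A_coarse
    return A_coarse, X_coarse, L_coarse, S

# Path graph 0-1-2-3 with two clusters {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
labels = np.array([0, 0, 1, 1])  # hypothetical output of the step-one algorithm
A_coarse, X_coarse, L_coarse, S = coarsen(A, X, labels, k=2)
```

Repeating `coarsen` m times, feeding each output back in as the next input, would correspond to step five under these assumptions.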
Preferably, the multi-view spectral cluster fusion method comprises the following steps:
calculating a fused allocation matrix: adding the allocation matrices of all views i to obtain a matrix S; then, for each row of S, setting the entry holding the row maximum to 1 and all other entries to 0, which yields the multi-view fused allocation matrix; in this matrix, S[i, j] = 1 means that the i-th node belongs to the j-th class, so the clustering of the data to be processed can be read directly from the matrix.
Preferably, for the classification task, the following steps are continued:
computing the weighted sum of the distances between each view i and the fused view, wherein the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view; this weighted sum is one term of the objective function;
then, according to the multi-view spectral clustering method, taking the minimization of the sum of the spectral clustering objective functions of all views as the other term of the objective function, and adding the two terms and rearranging to obtain the total objective function,
wherein the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view;
solving the objective function: according to the Rayleigh-Ritz theorem, the solution U that minimizes the above objective function consists of the first k eigenvectors of $L_{mod}$, which are spliced to obtain the feature matrix U of the fused view, wherein each row corresponds to the feature vector of one class, that is, the feature representation of the data of each class; U is input into a classification algorithm to find the specific category corresponding to the feature vector of each class.
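The eigenvector step can be sketched as follows, assuming NumPy and assuming that "the first k eigenvectors" means the eigenvectors belonging to the k smallest eigenvalues, which is what the Rayleigh-Ritz theorem gives when minimizing tr(UᵀLU) subject to UᵀU = I. The construction of $L_{mod}$ itself is not preserved in the text, so the sketch takes any symmetric matrix as input; the function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(L_mod, k):
    """Return the eigenvectors of the k smallest eigenvalues of a symmetric matrix."""
    # np.linalg.eigh returns eigenvalues in ascending order for symmetric input.
    eigvals, eigvecs = np.linalg.eigh(L_mod)
    return eigvecs[:, :k]

# Stand-in for L_mod: Laplacian of a graph with two disconnected edges (0-1 and 2-3).
L_mod = np.array([[ 1, -1,  0,  0],
                  [-1,  1,  0,  0],
                  [ 0,  0,  1, -1],
                  [ 0,  0, -1,  1]], dtype=float)
U = spectral_embedding(L_mod, k=2)
objective = np.trace(U.T @ L_mod @ U)  # value of tr(U^T L U) at the minimizer
```

The rows of `U` would then be passed to the classification algorithm of choice.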
Preferably, the data to be processed is social network data.
Provided is a multi-view clustering system based on hierarchical graph pooling, comprising:
a graph construction module, used for dividing the data to be processed into a multi-view data set and constructing a graph representation of the data set for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the graph's feature information, and a graph Laplacian matrix;
a clustering information calculation and extraction module, used for extracting the clustering information of each view by iterating hierarchical graph pooling layer by layer, wherein the clustering information of each view comprises the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and a multi-view fusion module, used for fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to the feature vector of each class.
The invention at least comprises the following beneficial effects:
Firstly, a multi-view learning method is adopted: learning is carried out on each view and the results are finally fused into one feature matrix, so that the multi-view characteristics of the data are exploited.
Secondly, before view fusion, each view is clustered by a hierarchical graph pooling method. One node is extracted from each cluster as its representative and becomes a node of the coarsened graph; its feature vector aggregates, and therefore contains, the feature vectors of all nodes in the original cluster. The clustered views are then fused, so the resulting view comprehensively contains the clustering information of the original views.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a main flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic diagram of a model structure of one of the embodiments of the present invention;
FIG. 3 is a detailed flowchart of embodiment 2 of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
It is to be noted that the experimental methods described in the following embodiments are all conventional methods unless otherwise specified, and the reagents and materials, if not otherwise specified, are commercially available; in the description of the present invention, the terms indicating orientation or positional relationship are based on the orientation or positional relationship shown in the drawings only for the convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
As shown in fig. 1 to 3, the present invention provides a multi-view clustering method based on hierarchical graph pooling, comprising the following steps:
dividing the data to be processed into a multi-view data set, and constructing a graph representation of the data set for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the graph's feature information, and a graph Laplacian matrix;
extracting the clustering information of each view by iterating hierarchical graph pooling layer by layer, wherein the clustering information of each view comprises the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to the feature vector of each class.
In this technical scheme, a graph representation, that is, a view, is constructed for the data of each view; the clustering information of each view is extracted by iterating hierarchical graph pooling layers, which updates the graph adjacency matrix and feature matrix of each view; multi-view spectral clustering then fuses the data to be processed, such as social network data, into one comprehensive graph, yielding the feature matrix of the fused cluster centers.
Taking social network data as an example, the construction of each view:
Let the social network data have N data points and N views. A data structure is constructed for each view i, comprising an adjacency matrix representing the graph structure, a feature matrix representing the graph's feature information, and a graph Laplacian matrix. Each data point is regarded as a node of view i and is characterized by the feature vector of the data point. Edges are then constructed according to the association between the data points, either from the inherent association between the data points or by the K-nearest-neighbor method.
For each view i, a hierarchical graph pooling method is applied iteratively to obtain the clustering information of view i. After each iteration, the graph data of the clustered graph (the coarsened graph), namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations m and the number of categories k in each iteration are preset hyper-parameters.
Taking social network data as an example, the hierarchical graph pooling method is applied iteratively to each of the N views of the social network data, and the most coarsened graph and the allocation matrix are output after the iterations are completed; the allocation matrices of the N views form the clustering information of the views of the multi-view social network data.
The multi-view social network data are then fused with a multi-view learning algorithm. The following takes the multi-view spectral clustering algorithm as an example. The multi-view spectral clustering method assumes that the feature matrix of each view i and the feature matrix of the fused view each span a k-dimensional linear subspace of an n-dimensional space and therefore each correspond to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with a distance formula for points on the Grassmann manifold.
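The text does not say which Grassmann distance is used. As an illustration only, the sketch below assumes the projection distance, d²(U₁, U₂) = k − ‖U₁ᵀU₂‖²_F, a common choice for comparing subspaces with orthonormal bases; the function name `projection_distance_sq` is hypothetical.

```python
import numpy as np

def projection_distance_sq(U1, U2):
    """Squared projection distance between the subspaces spanned by U1 and U2.

    Both inputs are n-by-k matrices with orthonormal columns (assumed).
    """
    k = U1.shape[1]
    return k - np.linalg.norm(U1.T @ U2, ord="fro") ** 2

# Two 2-dimensional subspaces of a 4-dimensional space.
U_a = np.eye(4)[:, :2]  # spanned by the first two coordinate axes
U_b = np.eye(4)[:, 2:]  # spanned by the last two coordinate axes
d_same = projection_distance_sq(U_a, U_a)  # identical subspaces
d_orth = projection_distance_sq(U_a, U_b)  # mutually orthogonal subspaces
```

Under this assumed distance, identical subspaces are at distance 0 and mutually orthogonal k-dimensional subspaces are at squared distance k.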
The allocation matrices of all views are added to obtain a matrix S. Then, for each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which yields the multi-view fused allocation matrix. In this matrix, S[i, j] = 1 means that the i-th node belongs to the j-th class. The clustering of the social network data, that is, the user community structure and the user classification in the social network, can be read directly from this matrix.
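The fusion of allocation matrices described above can be sketched in NumPy as follows. Two points are assumptions rather than statements from the text: column j is taken to denote the same class in every view, and ties in a row are broken by taking the first maximum. The function name `fuse_allocations` is hypothetical.

```python
import numpy as np

def fuse_allocations(allocations):
    """Sum per-view allocation matrices and binarize each row at its maximum."""
    S = np.sum(allocations, axis=0)
    fused = np.zeros_like(S)
    # argmax returns the first maximum in each row, which breaks ties (assumption).
    fused[np.arange(S.shape[0]), S.argmax(axis=1)] = 1.0
    return fused

# Three views, three nodes, two classes; the views disagree about node 1.
S1 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
S2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
S3 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
fused = fuse_allocations([S1, S2, S3])
node_classes = fused.argmax(axis=1)  # class index of each node
```

In this example node 1 is assigned to class 1 because two of the three views place it there, so the row-wise maximum acts as a majority vote.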
Starting from the problems that existing methods depend too heavily on specific data characteristics and that the robustness and application scenarios of their models are limited, the invention provides a clustering and classification method based on hierarchical graph pooling for multi-view social network data. Hierarchical graph pooling is a class of graph representation learning methods that preserve the cluster structure of a graph. It aims to extract a coarsened graph in which the feature vector of each node stores the information of the feature vectors of one class of nodes in the input graph. This step is iterated to obtain the clustering of the graph data: each iteration takes the adjacency matrix and feature matrix of a graph as input and outputs the clustered adjacency matrix and feature matrix. Introducing this method into the clustering of multi-view social network data overcomes the main defects of the mainstream methods on clustering tasks and improves the clustering effect of the model.
Firstly, a multi-view learning method is adopted: learning is carried out on each view and the results are finally fused into one feature matrix, so that the multi-view characteristics of the data are exploited.
Secondly, before view fusion, each view is clustered by a hierarchical graph pooling method. One node is extracted from each cluster as its representative and becomes a node of the coarsened graph; its feature vector aggregates, and therefore contains, the feature vectors of all nodes in the original cluster. The clustered views are then fused, so the resulting view comprehensively contains the clustering information of the original views.
In another technical scheme, the method for constructing the view comprises the following steps: each data point is taken as a node of view i, each node is characterized by the feature vector of its data point, and the edges of view i are constructed according to the association between the data points, yielding the graph adjacency matrix $A^{(i)}$.
The K-nearest-neighbor algorithm may also be used to construct the graph:
the feature vectors of the data points are divided, and a view is constructed for each divided part of the features according to the K-nearest-neighbor graph construction method to obtain the graph adjacency matrix $A^{(i)}$ of the view, wherein
in the K-nearest-neighbor construction method the feature vector of each data point is regarded as a coordinate in an n-dimensional space, and the distance between each data point and every other node is calculated with the Euclidean distance in that n-dimensional space; each node is then traversed and the other nodes are sorted by their distance to it; for a given neighbor number k, if another node is among the k nearest neighbors of the node, an edge is formed between the two nodes; the neighbor number k is a hyper-parameter.
The method of constructing a feature matrix from the inherent association between data points is: selecting the values of the original structural features of each data point as the feature vector of the node, then splicing the feature vectors by rows to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view according to the definitions of the degree matrix and the graph Laplacian, thereby forming an initial data set;
the method of constructing the feature matrix according to the K-nearest-neighbor algorithm is: splicing the feature vectors of the data points in each view i by rows to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix and the graph Laplacian matrix $L^{(i)}$ according to the definitions of the degree matrix and the graph Laplacian.
In the above technical solution, the graph adjacency matrix $A^{(i)}$ of view i can be constructed according to the inherent association between the data. Alternatively, the data features can be divided and a view constructed for each divided part of the features according to the K-nearest-neighbor graph construction method, or the undivided features can be used directly to construct a view. In the K-nearest-neighbor construction method, the feature vector of each data point is regarded as a coordinate in an n-dimensional space, and the distance between each data point and every other point is calculated with the Euclidean distance in that n-dimensional space. Each node is then traversed and the other nodes are sorted by their distance to it; for a given neighbor number k, if another node is among the k nearest neighbors of the node, an edge is formed between the two nodes, where the neighbor number k is a hyper-parameter. Finally, the graph adjacency matrix $A^{(i)}$ of view i is obtained.
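The K-nearest-neighbor construction described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `knn_adjacency` is hypothetical, and making the edges undirected by symmetrizing is an assumption, since the text only says that an edge is formed between a node and each of its k nearest neighbors.

```python
import numpy as np

def knn_adjacency(X, k):
    """Build a k-nearest-neighbor adjacency matrix from row feature vectors."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all data points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        # Sort the other nodes by distance to node i and keep the k closest.
        neighbors = np.argsort(dist[i])[:k]
        A[i, neighbors] = 1.0
    # Assumption: edges are undirected, so the matrix is symmetrized.
    return np.maximum(A, A.T)

# Four points forming two well-separated pairs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
A = knn_adjacency(X, k=1)
```

With k = 1 each point links only to its closest neighbor, so the example graph consists of the two edges 0-1 and 2-3. The dense pairwise distance matrix keeps the sketch short but needs memory quadratic in the number of data points.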
For the K-nearest-neighbor construction method, the feature vectors (column vectors) of the data points in each view i are spliced by rows to obtain the feature matrix $X^{(i)}$, and the degree matrix and the graph Laplacian matrix $L^{(i)}$ are calculated according to their definitions. For the method based on the inherent associations between data, values of network-structure features of the data, such as degree and centrality, can be selected as the features (column vectors) of each node; the features are then spliced by rows to obtain the feature matrix $X^{(i)}$. In this way a matrix representation is obtained that contains both the association information and the feature information describing the nodes, which provides more comprehensive information and benefits the subsequent clustering task. An initial data set is thus formed.
In another technical scheme, the clustering information of each view i is calculated and extracted as follows: a hierarchical graph pooling method is applied iteratively to obtain the clustering information of the view; after each iteration, the graph data of the clustered (coarsened) graph, comprising a feature matrix and an adjacency matrix, is output and used as the input of the next iteration; the number of iterations m and the number of categories k in each iteration are preset hyper-parameters.
The step of calculating and extracting the clustering information of each view i specifically comprises the following steps:
step one, setting an algorithm for extracting cluster centers, applying the algorithm to the graph data of the input graph, and taking the obtained cluster centers as the nodes of the output graph, so as to obtain the node correspondence between the input graph and the output graph: using the set algorithm together with the feature matrix and the adjacency matrix of the input graph, the nodes of the input graph are divided into different clusters, and the feature vectors of the nodes of one cluster in the input graph are jointly represented by the feature vector of a single node in the coarsened graph; this node is a super node and is regarded as the cluster center;
step two, using an allocation matrix $S$ to represent the node correspondence between the input graph and the output graph: if node i of the input graph belongs to class j, that is, corresponds to node j of the coarsened graph, the entry of the allocation matrix at position (i, j) is 1; otherwise the entry at that position is 0;
step three, obtaining the adjacency matrix of the coarsened graph from the adjacency matrix of the input graph and the allocation matrix, thereby obtaining the structure of the coarsened graph;
step four, aggregating the feature vectors of the nodes of the same class in the input graph by means of the allocation matrix and the aggregation method of hierarchical graph pooling to obtain the feature vector of each super node, and then splicing the feature vectors of all super nodes to obtain the feature matrix of the coarsened graph;
step five, performing m iterations on each view i to finally obtain the adjacency matrix and the feature matrix of the coarsened graph, from which the graph Laplacian matrix is obtained according to its definition, together with the allocation matrix $S^{(i)}$, where i = 1, …, N.
In this technical scheme, a hierarchical graph pooling method is applied iteratively to each view i to obtain the clustering information of view i. After each iteration, the graph data of the clustered (coarsened) graph, namely its feature matrix and adjacency matrix, is output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters. For each view i, the steps of each iteration are as follows:
Step one: An algorithm for extracting cluster centers is set and applied to the graph data of the input graph, and the resulting cluster centers are taken as the nodes of the output graph, which gives the correspondence between the nodes of the input graph and those of the output graph. Using the set algorithm together with the feature matrix and the adjacency matrix of the input graph, the nodes of the input graph can be divided into different clusters; the feature vectors of the nodes of one cluster in the input graph are jointly represented by the feature vector of a single node (a super node) in the coarsened graph, and this node is regarded as the cluster center.
Step two: The correspondence between the nodes of the input graph and those of the output graph is represented by an allocation matrix $S$. If node i of the input graph belongs to class j, that is, corresponds to node j of the coarsened graph, the entry of the allocation matrix at position (i, j) is 1; otherwise the entry at that position is 0.
Step three: The adjacency matrix of the coarsened graph is obtained from the adjacency matrix of the input graph and the allocation matrix, which gives the structure of the coarsened graph.
Step four: The feature vectors of the nodes of one class in the input graph are aggregated by means of the allocation matrix and the aggregation method of the hierarchical graph pooling operation to obtain the feature vector of each super node, and the feature vectors of all super nodes are spliced to obtain the feature matrix of the coarsened graph.
Step five: The first four steps are executed iteratively on each view, the number of iterations m being a hyper-parameter, to finally obtain the adjacency matrix and the feature matrix of the coarsened graph, from which the graph Laplacian matrix is obtained according to its definition, together with the allocation matrix $S^{(i)}$, where i = 1, …, N. Through repeated iteration, the model can extract the clustering information in the data and the optimal number of cluster categories, each category serving as a node of the output coarsened graph.
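One coarsening iteration of the kind described in steps two to four can be sketched as follows. The text does not fix the aggregation rule at this level of generality, so the sketch assumes mean aggregation of member feature vectors and assumes that edges inside a cluster are dropped from the coarsened graph; the names `allocation_matrix` and `coarsen` are hypothetical, and every cluster is assumed to be non-empty.

```python
import numpy as np

def allocation_matrix(labels, num_clusters):
    """One-hot allocation matrix S with S[i, j] = 1 iff node i is in cluster j."""
    labels = np.asarray(labels)
    S = np.zeros((len(labels), num_clusters))
    S[np.arange(len(labels)), labels] = 1.0
    return S

def coarsen(A, X, labels, num_clusters):
    """One pooling step: returns coarsened adjacency, coarsened features, and S."""
    S = allocation_matrix(labels, num_clusters)
    # Entry (p, q) counts the input edges running between clusters p and q.
    A_coarse = S.T @ A @ S
    np.fill_diagonal(A_coarse, 0.0)  # assumption: discard intra-cluster edges
    # Assumption: a super node's feature vector is the mean of its members.
    sizes = S.sum(axis=0)[:, None]
    X_coarse = (S.T @ X) / sizes
    return A_coarse, X_coarse, S

# Path graph 0-1-2-3 split into clusters {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
A_coarse, X_coarse, S = coarsen(A, X, labels=[0, 0, 1, 1], num_clusters=2)
```

Feeding `A_coarse` and `X_coarse` back into `coarsen` with new labels gives the next iteration, which is how the m iterations chain together.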
In another technical scheme, the multi-view spectral clustering fusion method comprises the following steps:
calculating the fused allocation matrix: the allocation matrices of all views i are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$; then, in each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which gives the multi-view fused allocation matrix; $S[i, j] = 1$ means that the i-th node belongs to the j-th class, and the clustering of the data to be processed can be read directly from this matrix.
For the classification task, the following steps are continued:
calculating the weighted sum of the distances between each view i and the fused view, where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view; this weighted sum serves as one term of the objective function;
then, according to the multi-view spectral clustering method, taking the sum of the spectral clustering objective functions of the individual views, which is to be minimized, as the other term of the objective function; adding the two terms and rearranging gives the overall objective function, a matrix trace minimization problem of the form $\min_{U^{\top}U = I} \operatorname{tr}(U^{\top} L_{mod} U)$,
wherein the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view;
solving the objective function: according to the Rayleigh-Ritz theorem, the solution U that minimizes the above objective function consists of the first k eigenvectors of $L_{mod}$; splicing these eigenvectors gives the feature matrix U of the fused view, in which each row is the feature vector of one category, that is, the feature representation of the data of that category; the feature vector of each category is then input into a classification algorithm to obtain the specific class corresponding to it.
In this technical scheme, the multi-view social network data is fused by a multi-view learning algorithm, such as a multi-view spectral clustering algorithm, which is taken as the example below. The multi-view spectral clustering method assumes that the feature matrix of each view, and that of the fused view, spans a k-dimensional linear subspace of an n-dimensional space and therefore corresponds to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the formula for the distance between points on the Grassmann manifold. The fusion method is implemented in the following steps:
Step one: Calculate the fused allocation matrix. The allocation matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which gives the multi-view fused allocation matrix. $S[i, j] = 1$ means that the i-th node belongs to the j-th class. The clustering of the social network data, that is, the community structure and the classification of the users in the social network, can be read directly from this matrix. For the classification task, the following steps may be continued:
Step two: Calculate the weighted sum of the distances between each view and the fused view, where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view. This weighted sum serves as one term of the objective function.
Step three: According to the multi-view spectral clustering method, the sum of the spectral clustering objective functions of the individual views, which is to be minimized, serves as the other term of the objective function. Adding the two terms and rearranging gives the overall objective function, which is a matrix trace minimization problem of the form $\min_{U^{\top}U = I} \operatorname{tr}(U^{\top} L_{mod} U)$,
wherein the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view.
Step four: Solve the objective function. According to the Rayleigh-Ritz theorem, the solution U that minimizes the above objective function consists of the first k eigenvectors of $L_{mod}$. Splicing these eigenvectors gives the feature matrix U of the fused view, which provides the feature representation of the data of each category, each row of U being the feature vector of one category. The feature vectors are then input into a classification algorithm to obtain the specific class corresponding to the feature vector of each category.
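The eigenvector solution of step four can be sketched with a standard symmetric eigensolver. The sketch assumes that "the first k eigenvectors" means those with the k smallest eigenvalues, which is what the Rayleigh-Ritz theorem gives for a trace minimization; the function name `fused_embedding` and the small matrix standing in for $L_{mod}$ are illustrative only.

```python
import numpy as np

def fused_embedding(L_mod, k):
    """Return the k smallest eigenvalues of L_mod and their eigenvectors.

    The eigenvectors are the columns of U, with U.T @ U = I.
    """
    # For a symmetric matrix, eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L_mod)
    return eigvals[:k], eigvecs[:, :k]

# Stand-in for L_mod: the Laplacian of a 3-node path graph (illustration only).
L_mod = np.array([[1.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])
vals, U = fused_embedding(L_mod, k=2)

# The minimized trace equals the sum of the k smallest eigenvalues.
min_trace = np.trace(U.T @ L_mod @ U)
```

Each row of `U` is then the feature vector that would be passed on to the classification algorithm.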
Provided is a multi-view clustering system based on hierarchical graph pooling, comprising:
a graph construction module, used for constructing a corresponding graph representation of the data to be processed for each perspective to obtain the corresponding view, the view comprising an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
a clustering information calculation and extraction module, used for extracting the clustering information of each view by a hierarchical iterative computation method based on hierarchical graph pooling, the clustering information of each view comprising the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprising the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and a multi-view fusion module, used for fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the class corresponding to the feature vector of each category.
In this technical scheme, the graph construction module constructs a corresponding graph representation, namely a view, for the data of each perspective; the clustering information of each view is extracted through iterations of the hierarchical graph pooling layer, and the graph adjacency matrix and the feature matrix of each view are updated; the data to be processed, such as social network data, is then fused into comprehensive graph data by multi-view spectral clustering to obtain the feature matrix of the fused cluster centers.
The clustering information calculation and extraction module applies the hierarchical graph pooling method iteratively to each view i to obtain the clustering information of each view i. After each iteration, the graph data of the clustered (coarsened) graph, namely its feature matrix and adjacency matrix, is output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters.
The multi-view fusion module fuses the multi-view social network data with a multi-view learning algorithm, such as a multi-view spectral clustering algorithm, which is taken as the example below. The multi-view spectral clustering method assumes that the feature matrix of each view i, and that of the fused view, spans a k-dimensional linear subspace of an n-dimensional space and therefore corresponds to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the formula for the distance between points on the Grassmann manifold.
The allocation matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which gives the multi-view fused allocation matrix. $S[i, j] = 1$ means that the i-th node belongs to the j-th class. The clustering of the social network data, that is, the community structure and the classification of the users in the social network, can be read directly from this matrix.
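The fused allocation matrix can be sketched in a few lines. The sketch assumes that the cluster indices are aligned across views, so that column j means the same class in every view, and that ties in a row are broken in favor of the lowest class index; neither point is specified in the text. The function name `fuse_allocations` is hypothetical.

```python
import numpy as np

def fuse_allocations(allocations):
    """Sum per-view one-hot allocation matrices and binarize by row maximum."""
    S = np.sum(allocations, axis=0)          # element-wise sum over views
    fused = np.zeros_like(S)
    rows = np.arange(S.shape[0])
    fused[rows, S.argmax(axis=1)] = 1        # argmax picks the first maximum on ties
    return fused

# Three views voting on the class of three nodes (two classes).
S1 = np.array([[1, 0], [1, 0], [0, 1]])
S2 = np.array([[1, 0], [0, 1], [0, 1]])
S3 = np.array([[1, 0], [0, 1], [0, 1]])
S_fused = fuse_allocations([S1, S2, S3])
labels = S_fused.argmax(axis=1)              # class index of each node
```

In effect each view casts one vote per node, and the fused matrix records the majority class.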
Starting from the problems of over-dependence on specific data features, model robustness, and limited application scenarios, the invention provides a clustering and classification system for multi-view social network data based on hierarchical graph pooling. Hierarchical graph pooling is a class of methods in graph representation learning that preserve the clustering structure information of a graph. It aims to extract a coarsened graph in which the feature vector of each node preserves the information of the feature vectors of one class of nodes in the input graph; this step is iterated to obtain the clustering of the graph data, each iteration taking the adjacency matrix and feature matrix of a graph as input and outputting the clustered adjacency matrix and feature matrix. Introducing this method into the clustering task for multi-view social network data overcomes the main shortcomings of mainstream methods on this task and improves the clustering effect of the model.
Firstly, a multi-view learning method is adopted: learning is carried out on each view and the feature matrices are finally fused, so that the multi-view nature of the data is exploited.
Secondly, before view fusion, each view is clustered by the hierarchical graph pooling method, and one node is extracted from each cluster as its representative and used as a node of the coarsened graph; the feature vector of this node aggregates, and therefore contains, the feature vectors of all nodes in the original cluster. The clustered views are then fused, so that the resulting view also comprehensively contains the clustering information of the original views.
< example 1>
The invention provides a multi-view social network data classification method based on a hierarchical graph pooling method, which comprises the following steps:
constructing the data of each perspective into graph data by a method such as K-nearest neighbors;
clustering the data of each perspective with the hierarchical graph pooling method EigenPooling from graph representation learning to obtain the clustered graph data of each perspective;
fusing the multi-view clustered data with a multi-view spectral clustering algorithm to obtain the clustering of the data, and then executing a classification algorithm to obtain the class label of each data point.
(1) construction of multi-view social network data
Let the social network dataset contain n data points and N views, the data including users' personal feature data and social data. The social data contains the various associations and behaviors between users. Nodes represent the users in the data, and edges represent the connections between users. Views are then constructed either from the inherent connections between the data or by the K-nearest-neighbor method. The specific steps are as follows:
Step one: Taking each kind of connection between data in the dataset, such as likes, comments and friend relationships, as the edges, a corresponding social network is constructed for each kind of connection to obtain the graph adjacency matrix $A^{(i)}$.
Step two: The features of the users are divided, and a view is constructed for each divided part of the features by the K-nearest-neighbor graph construction method. The cosine distance between each data point and every other point is calculated. Here the data points are regarded as the nodes of the graph and each node is characterized by the feature vector of its data point; the distance is calculated by regarding the feature vector as a coordinate in an n-dimensional space. Each node is traversed and the distances from the other nodes to it are sorted; for a given neighbor number k, if another node is among the k nearest neighbors of a node, an edge is formed between the two nodes, where k is a hyper-parameter. The graph adjacency matrix $A^{(i)}$ of the data is finally obtained. The model structure is shown in Fig. 2.
Step three: For the user feature views, the feature vectors of the data in each view i are spliced to obtain the feature matrix $X^{(i)}$, and the degree matrix
$D^{(i)} = \operatorname{diag}(d_{11}, \ldots, d_{nn}), \quad d_{jj} = \sum_{k} a_{jk},$
and the normalized graph Laplacian matrix
$L^{(i)} = I - (D^{(i)})^{-1/2} A^{(i)} (D^{(i)})^{-1/2}$
are calculated, where $a_{jk}$ is the entry of $A^{(i)}$ in row j and column k. For the user relationship views, values of network-structure features of the social network, such as degree and centrality, can be selected as the features (column vectors) of the nodes; the features are then spliced by rows to obtain the feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ are calculated. An initial set of social network data is thus formed.
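The degree matrix and the graph Laplacian of a view can be computed from its adjacency matrix as sketched below. The sketch assumes the symmetric normalized Laplacian $I - D^{-1/2} A D^{-1/2}$, which is the common reading of a "normalized" (here "regularized") graph Laplacian; the handling of isolated nodes and the name `degree_and_laplacian` are likewise assumptions.

```python
import numpy as np

def degree_and_laplacian(A):
    """Return the degree matrix D and the symmetric normalized Laplacian L of A."""
    d = A.sum(axis=1)                      # d[j] = sum of row j of A
    D = np.diag(d)
    # Assumption: isolated nodes (degree 0) get a zero scaling factor
    # instead of triggering a division by zero.
    d_inv_sqrt = np.zeros(len(d), dtype=float)
    nonzero = d > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    # Scale row j and column k of A by d[j]^(-1/2) and d[k]^(-1/2).
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return D, L

# Triangle graph: every node has degree 2.
A = np.ones((3, 3)) - np.eye(3)
D, L = degree_and_laplacian(A)
```

For the triangle, `L` has ones on the diagonal and -0.5 elsewhere, and its smallest eigenvalue is 0, as expected for a connected graph.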
(2) Clustering information extraction for multi-view social network data
In this step, the hierarchical graph pooling method EigenPooling is applied iteratively to each view i to find the clustering information of view i. After each iteration, the graph data of the clustered (coarsened) graph, namely its feature matrix and adjacency matrix, is output and used as the input of the next iteration. The number of iterations and the number of categories k in each iteration are preset hyper-parameters. For each view i, the steps of the l-th iteration are as follows:
Step one: A spectral clustering algorithm is executed for each view. For each view i, the first k eigenvectors of $L^{(i)}$ are found and spliced as column vectors into an eigenvector matrix U'; the row vectors $y_1, \ldots, y_n$ of this matrix are then clustered to obtain a series of non-overlapping clusters. If $y_i$ falls in class j, then data point i belongs to class j. One node (a super node) of the coarsened graph is set for each cluster.
Step two: This correspondence is represented by the sampling operators C and the allocation matrix S. Let $\Gamma^{(k)}$ denote the list of nodes in the k-th cluster, and let $C^{(k)}$ denote the sampling operator of the k-th cluster, a 0/1 matrix with one row per node of the input graph and one column per node of the k-th cluster, where $C^{(k)}[i, j] = 1$ if and only if $\Gamma^{(k)}(j) = v_i$. $C^{(k)}$ thus expresses the correspondence between each node of the cluster and the super node.
Step three: The structure of the coarsened graph is obtained through the sampling operators.
Let $A$ denote the adjacency matrix of the input graph of the i-th view in the l-th iteration, and $A^{(k)}$ the adjacency matrix within the k-th cluster of the input graph in the l-th iteration. It is obtained by the following formula: $A^{(k)} = (C^{(k)})^{\top} A\, C^{(k)}$.
Then, from the adjacency matrices of all clusters, the intra-cluster adjacency matrix $A_{int} = \sum_{k} C^{(k)} A^{(k)} (C^{(k)})^{\top}$ and the inter-cluster adjacency matrix $A_{ext} = A - A_{int}$ are obtained.
Let $S$ denote the allocation matrix between the nodes of the input graph and of the output coarsened graph for the i-th view in the l-th iteration, where $S[i, j] = 1$ if and only if $v_i \in \Gamma^{(j)}$. The allocation matrix records the mapping from the nodes of the input graph to the clusters of the output graph in each iteration. The adjacency matrix of the output coarsened graph, which retains the structural information of the original network, can then be found as $A_{coar} = S^{\top} A_{ext} S$.
Step four: The feature matrix of the coarsened graph is obtained through the graph Fourier transform. Let $L^{(k)}$ denote the graph Laplacian matrix of the k-th cluster, $u_1^{(k)}, u_2^{(k)}, \ldots$ its eigenvectors, and $X$ the feature matrix of the input graph. Each eigenvector is upsampled to the whole input graph with the sampling operator: $\bar{u}_r^{(k)} = C^{(k)} u_r^{(k)}$.
The r-th pooling operator $\Theta_r$, whose columns are the upsampled r-th eigenvectors $\bar{u}_r^{(k)}$ of all clusters, is then used to pool the feature matrix, that is, to apply the graph Fourier transform to the features of each subgraph: $X_r = \Theta_r^{\top} X$. The results of the different pooling operators are then spliced to obtain the feature matrix of the coarsened graph produced by the hierarchical graph pooling method EigenPooling.
Step five: The first four steps are executed iteratively on each view, the number of iterations m being a hyper-parameter, and the feature matrix and the adjacency matrix of the clustered, coarsened graph are output. The degree matrix and the graph Laplacian matrix of each view are then obtained according to their definitions; these represent the feature information and the structural information of the data of each perspective after clustering. The allocation matrix of each view is also output; it records the mapping from the data of each view to the categories. Through repeated iteration, the model can extract the clustering information in the data and the optimal number of cluster categories, each category serving as a node of the output coarsened graph. Such a node integrates the feature information of all nodes in its cluster and also preserves the cluster structure of the input graph.
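Steps two to four of one pooling iteration can be sketched as follows, given cluster labels from step one. The sketch follows the published EigenPooling formulation as commonly stated ($A^{(k)} = C^{(k)\top} A C^{(k)}$, $A_{ext} = A - A_{int}$, $A_{coar} = S^{\top} A_{ext} S$, $X_r = \Theta_r^{\top} X$); treating that formulation as the one intended here is an assumption, as are the use of the unnormalized Laplacian for each subgraph, zero padding for clusters with fewer nodes than pooling operators, and the name `eigenpool_step`.

```python
import numpy as np

def eigenpool_step(A, X, labels, num_clusters, num_ops=1):
    """One EigenPooling-style coarsening step for a single view."""
    labels = np.asarray(labels)
    n = A.shape[0]
    S = np.zeros((n, num_clusters))
    S[np.arange(n), labels] = 1.0                 # allocation matrix
    A_int = np.zeros((n, n))
    thetas = [np.zeros((n, num_clusters)) for _ in range(num_ops)]
    for k in range(num_clusters):
        nodes = np.flatnonzero(labels == k)       # node list Gamma^(k)
        C = np.zeros((n, len(nodes)))             # sampling operator C^(k)
        C[nodes, np.arange(len(nodes))] = 1.0
        A_k = C.T @ A @ C                         # adjacency inside cluster k
        A_int += C @ A_k @ C.T
        # Assumption: unnormalized Laplacian of the cluster subgraph.
        L_k = np.diag(A_k.sum(axis=1)) - A_k
        _, U_k = np.linalg.eigh(L_k)
        for r in range(min(num_ops, len(nodes))):
            thetas[r][:, k] = C @ U_k[:, r]       # upsampled r-th eigenvector
    A_ext = A - A_int                             # edges between clusters
    A_coar = S.T @ A_ext @ S
    X_coar = np.hstack([theta.T @ X for theta in thetas])
    return A_coar, X_coar, S

# Path graph 0-1-2-3 with clusters {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0], [3.0], [2.0], [4.0]])
A_coar, X_coar, S = eigenpool_step(A, X, [0, 0, 1, 1], num_clusters=2)
```

Eigenvectors are only defined up to sign, so the entries of `X_coar` may differ in sign between runs or platforms while their magnitudes stay the same.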
(3) Fusion of multi-view social network data clustering information
The invention adopts a multi-view spectral clustering method for multi-view information fusion. The multi-view spectral clustering method assumes that the feature matrix of each view, and that of the fused view, spans a k-dimensional linear subspace of an n-dimensional space and therefore corresponds to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the formula for the distance between points on the Grassmann manifold. The fusion method is implemented in the following steps:
Step one: Calculate the fused allocation matrix. The allocation matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which gives the multi-view fused allocation matrix. $S[i, j] = 1$ means that the i-th node belongs to the j-th class. The clustering of the social network data can be read from this matrix.
Step two: Calculate the weighted sum of the distances between each view and the fused view, where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view. This weighted sum serves as one term of the objective function.
Step three: According to the multi-view spectral clustering method, the sum of the spectral clustering objective functions of the individual views, which is to be minimized, serves as the other term of the objective function. Adding the two terms and rearranging gives the overall objective function, which is a matrix trace minimization problem of the form $\min_{U^{\top}U = I} \operatorname{tr}(U^{\top} L_{mod} U)$,
wherein the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view.
Step four: Solve the objective function. According to the Rayleigh-Ritz theorem, the solution U that minimizes the above objective function consists of the first k eigenvectors of $L_{mod}$. Splicing these eigenvectors gives the feature matrix U of the fused view, in which each row is the feature vector of one category, that is, the feature representation of the data of that category. The feature vectors are then input into a classification algorithm to obtain the specific class corresponding to the feature vector of each category.
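The way the two terms of the objective can combine into a single trace minimization may be illustrated as follows. This is a sketch under an assumption: it uses the projection distance on the Grassmann manifold that is common in the multi-view spectral clustering literature, $d^2(U, U^{(i)}) = k - \operatorname{tr}(U U^{\top} U^{(i)} U^{(i)\top})$, where $U^{(i)}$ (a symbol introduced here for illustration) is the spectral embedding of view i. The distance formula actually intended in this document may differ.

```latex
\begin{aligned}
&\min_{U^{\top}U = I}\;
  \sum_{i=1}^{N} \operatorname{tr}\!\left(U^{\top} L^{(i)} U\right)
  + \sum_{i=1}^{N} \alpha_i \left[ k - \operatorname{tr}\!\left(U U^{\top} U^{(i)} U^{(i)\top}\right) \right] \\
&\quad= \min_{U^{\top}U = I}\;
  \operatorname{tr}\!\left( U^{\top} \Big[ \sum_{i=1}^{N} L^{(i)}
  - \sum_{i=1}^{N} \alpha_i\, U^{(i)} U^{(i)\top} \Big] U \right)
  + k \sum_{i=1}^{N} \alpha_i , \\
&L_{mod} = \sum_{i=1}^{N} L^{(i)} - \sum_{i=1}^{N} \alpha_i\, U^{(i)} U^{(i)\top} .
\end{aligned}
```

The second line uses the cyclic property of the trace, $\operatorname{tr}(U U^{\top} M) = \operatorname{tr}(U^{\top} M U)$. The constant $k \sum_i \alpha_i$ does not affect the minimizer, which leaves the trace minimization over $L_{mod}$ that step four solves by eigendecomposition.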
< example 2>
The invention provides a multi-view social network data clustering method based on a hierarchical graph pooling technology, which comprises the following steps:
constructing the data of each perspective into graph data by a method such as K-nearest neighbors;
clustering the data of each perspective with the hierarchical graph pooling method EigenPooling from graph representation learning to obtain the clustered graph data of each perspective;
fusing the multi-view clustered data with a multi-view spectral clustering algorithm to obtain a comprehensive feature matrix and allocation matrix of the clustered data.
(1) construction of multi-view social network data
Let the social network dataset contain n data points and N views, the data including users' personal feature data and social data. The social data contains the various associations and behaviors between users. Nodes represent the users in the data, and edges represent the connections between users. Views are then constructed either from the inherent connections between the data or by the K-nearest-neighbor method. The specific steps are as follows:
Step one: Taking each kind of connection between data in the dataset, such as likes, comments and friend relationships, as the edges, a corresponding social network is constructed for each kind of connection to obtain the graph adjacency matrix $A^{(i)}$.
Step two: The features of the users are divided, and a view is constructed for each divided part of the features by the K-nearest-neighbor graph construction method. The cosine distance between each data point and every other point is calculated. The data points are regarded as the nodes of the graph and each node is characterized by the feature vector of its data point; the distance is calculated by regarding the feature vector as a coordinate in an n-dimensional space. Each node is traversed and the distances from the other nodes to it are sorted; for a given neighbor number k, if another node is among the k nearest neighbors of a node, an edge is formed between the two nodes, where k is a hyper-parameter. The graph adjacency matrix $A^{(i)}$ of the data is finally obtained. The model structure is shown in Fig. 2.
Step three: For the user feature views, the feature vectors of the data in each view i are spliced to obtain the feature matrix $X^{(i)}$, and the degree matrix
$D^{(i)} = \operatorname{diag}(d_{11}, \ldots, d_{nn}), \quad d_{jj} = \sum_{k} a_{jk},$
and the normalized graph Laplacian matrix
$L^{(i)} = I - (D^{(i)})^{-1/2} A^{(i)} (D^{(i)})^{-1/2}$
are calculated, where $a_{jk}$ is the entry of $A^{(i)}$ in row j and column k. For the user relationship views, values of network-structure features of the social network, such as degree and centrality, can be selected as the features (column vectors) of the nodes; the features are then spliced by rows to obtain the feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ are calculated. An initial set of social network data is thus formed.
(2) Clustering information extraction for multi-view social network data
In this step, the hierarchical graph pooling technique EigenPooling is applied iteratively to each view to find the clustering information of the view. After each iteration, the graph data of the clustered (coarsened) graph, namely its feature matrix and adjacency matrix, is output and used as the input of the next iteration. The number of iterations and the number of categories k in each iteration are preset hyper-parameters. For each view, the steps of the l-th iteration are as follows:
Step one: A spectral clustering algorithm is executed for each view. For each view i, the first k eigenvectors of $L^{(i)}$ are found and spliced as column vectors into an eigenvector matrix U'; the row vectors $y_1, \ldots, y_n$ of this matrix are then clustered to obtain a series of non-overlapping clusters. If $y_i$ falls in class j, then data point i belongs to class j. One node (a super node) of the coarsened graph is set for each cluster.
Step two: This correspondence is represented by an allocation matrix S. Let $S$ denote the allocation matrix between the nodes of the input graph and of the output coarsened graph for the i-th view in the l-th iteration, where $S[i, j] = 1$ if and only if the i-th data point is assigned to the j-th class. The allocation matrix records the mapping from the nodes of the input graph to the clusters of the output graph in each iteration.
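The spectral clustering of step one and the allocation matrix of step two can be sketched with NumPy alone. The text does not say how the rows $y_1, \ldots, y_n$ are clustered; the sketch assumes a plain k-means (Lloyd iterations) with a deterministic farthest-point initialization, and the name `spectral_clusters` is hypothetical.

```python
import numpy as np

def spectral_clusters(L, k, n_iter=20):
    """Cluster nodes from the k eigenvectors of L with the smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    Y = vecs[:, :k]                        # row y_i is the embedding of node i
    # Assumption: farthest-point initialization keeps the sketch deterministic.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)      # nearest center for every row
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    n = L.shape[0]
    S = np.zeros((n, k))
    S[np.arange(n), labels] = 1.0          # S[i, j] = 1 iff node i is in class j
    return labels, S

# Two disconnected triangles: nodes 0-2 and nodes 3-5.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
L = np.diag(A.sum(axis=1)) - A
labels, S = spectral_clusters(L, k=2)
```

On the toy graph the two triangles end up in different clusters, and `S` is the one-hot matrix that the fusion step later sums across views.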
(3) Fusion of multi-view social network data clustering information
In this step, the fused allocation matrix is calculated; from this matrix, the clustering of the social network data, that is, the community structure of the users in the social network, can be obtained. The specific implementation steps are as follows:
Step one: The allocation matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$.
Step two: In each row of S, the entry holding the row maximum is set to 1 and all other entries are set to 0, which gives the multi-view fused allocation matrix. $S[i, j] = 1$ indicates that the i-th node belongs to the j-th class.
While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.
Claims (10)
1. The multi-view clustering method based on hierarchical graph pooling is characterized by comprising the following steps of:
dividing the data to be processed into multi-view data sets, and then constructing a corresponding graph representation of the multi-view data sets for each perspective to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
extracting the clustering information of each view by a hierarchical iterative computation method based on hierarchical graph pooling, wherein the clustering information of each view comprises the coarsened graph and the allocation matrix corresponding to the view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and fusing the clustering information of all the views by adopting a multi-view spectral clustering fusion method to obtain the category corresponding to each category of feature vectors.
2. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein the view is constructed by: taking each data point as a node of the graph, characterizing each node by the feature vector of its data point, and constructing the edges of the graph according to the associations between the data points, to obtain the graph adjacency matrix $A^{(i)}$.
3. The multi-view clustering method based on hierarchical graph pooling of claim 2, wherein the method of constructing the edges of the graph further comprises a K-nearest neighbor algorithm comprising the following steps:
dividing the feature vector of the data points, and constructing a view from the features of each divided part according to the K-nearest neighbor composition method to obtain the graph adjacency matrix $A^{(i)}$ of the view, wherein
the feature vector of each data point is regarded as a coordinate in an n-dimensional space, and the distance between each data point and every other node is calculated with the distance measure of n-dimensional Euclidean space; each node is then traversed and its distances to the other nodes are sorted by magnitude; for a given neighbor number k, if another node is among the k nearest neighbors of the node, an edge is formed between the two nodes; the neighbor number k is a hyperparameter.
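The K-nearest neighbor composition described in claim 3 can be illustrated with a minimal sketch in plain Python. The function name `knn_adjacency`, the symmetric (undirected) treatment of edges, and the tie-breaking by node index are assumptions made for illustration; the claim does not specify them.

```python
import math

def knn_adjacency(features, k):
    """Build a 0/1 adjacency matrix by linking each node to its k nearest neighbors.

    features: list of equal-length feature vectors, one per data point.
    k: neighbor number (a hyperparameter).
    Assumption: edges are undirected, so the matrix is symmetrized.
    """
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Euclidean distance from node i to every other node.
        dists = [(math.dist(features[i], features[j]), j)
                 for j in range(n) if j != i]
        dists.sort()  # ascending distance; ties broken by node index
        for _, j in dists[:k]:
            adj[i][j] = 1
            adj[j][i] = 1
    return adj

# Two well-separated pairs of points; with k=1 each point links to its partner.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
A = knn_adjacency(points, k=1)
```

Because the edges are symmetrized, a node may end up with more than k neighbors; a directed variant would simply drop the `adj[j][i] = 1` line.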
4. The multi-view clustering method based on hierarchical graph pooling of claim 3, wherein
the method of constructing the feature matrix according to the inherent association between data points comprises: selecting the values of the original data features of each data point as the feature vector of the node, then stacking the feature vectors row by row to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view according to the definitions of the degree matrix and the graph Laplacian, thereby forming an initial data set;
the method of constructing the feature matrix according to the K-nearest neighbor algorithm comprises: stacking the feature vectors of the data points in each view i row by row to obtain the corresponding feature matrix $X^{(i)}$, and calculating the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view according to the definitions of the degree matrix and the graph Laplacian.
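The degree matrix and graph Laplacian named in claim 4 can be computed from an adjacency matrix as sketched below. The sketch assumes the common unnormalized definition L = D - A; the text only refers to "the definition" of the graph Laplacian, so a normalized variant may be intended instead. The function name `degree_and_laplacian` is hypothetical.

```python
def degree_and_laplacian(adj):
    """Return (D, L) for adjacency matrix adj.

    D is the diagonal degree matrix (row sums of adj).
    L is the unnormalized graph Laplacian, assumed here to be D - A.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    D = [[deg[i] if i == j else 0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - adj[i][j] for j in range(n)] for i in range(n)]
    return D, L

# Star graph: node 0 is connected to nodes 1 and 2.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
D, L = degree_and_laplacian(A)
```

With this definition every row of L sums to zero, which is a quick sanity check on the construction.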
5. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein calculating and extracting the clustering information of each view i comprises: iterating with the hierarchical graph pooling method to obtain hierarchical clustering information, wherein each iteration outputs the graph data of the clustered, coarsened graph, the graph data comprising a feature matrix and an adjacency matrix and serving as the input of the next iteration, and the number of iterations m and the number of categories k in each iteration are preset hyperparameters.
6. The multi-view clustering method based on hierarchical graph pooling of claim 5, wherein calculating and extracting the clustering information of each view i specifically comprises the following steps:
step one, setting an algorithm for extracting cluster centers, applying the algorithm to the graph data of the input graph, and taking the obtained cluster centers as the nodes of the output graph, thereby obtaining the node correspondence between the input graph and the output graph: using the set algorithm together with the feature matrix and the adjacency matrix of the input graph, the nodes of the input graph are divided into different clusters, and the feature vectors of the nodes of one class in the input graph are jointly represented by the feature vector of a single node in the coarsened graph; this node is a super node and is regarded as a cluster center;
step two, using a distribution matrix to represent the node correspondence between the input graph and the output graph: if node i in the input graph belongs to class j, that is, corresponds to node j in the coarsened graph, the entry at position (i, j) of the distribution matrix is 1; otherwise the entry at that position is 0;
step three, obtaining the adjacency matrix of the coarsened graph from the adjacency matrix and the distribution matrix of the input graph, thereby obtaining the structure of the coarsened graph;
step four, aggregating the feature vectors of the nodes of the same class in the input graph by means of the distribution matrix and the aggregation method of hierarchical graph pooling to obtain the feature vectors of the super nodes, and then stacking the feature vectors of all the super nodes to obtain the feature matrix of the coarsened graph.
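One coarsening iteration of claim 6 can be sketched as follows. The formulas for the coarsened adjacency matrix and the aggregation did not survive in the text, so this sketch assumes the form commonly used in hierarchical graph pooling: the coarsened adjacency is SᵀAS and super-node features are the mean of the member features. The cluster labels are taken as given, since the cluster-center extraction algorithm is left open; all function names are hypothetical.

```python
def one_hot_assignment(labels, num_clusters):
    """Distribution (assignment) matrix S: S[i][j] = 1 if node i is in cluster j."""
    return [[1 if labels[i] == j else 0 for j in range(num_clusters)]
            for i in range(len(labels))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def coarsen(adj, feats, S):
    """Assumed coarsening: A' = S^T A S, X' = per-cluster mean of node features."""
    St = transpose(S)
    adj_c = matmul(matmul(St, adj), S)
    sums = matmul(St, feats)
    sizes = [max(sum(row), 1) for row in St]  # guard against empty clusters
    feats_c = [[v / sizes[j] for v in sums[j]] for j in range(len(sums))]
    return adj_c, feats_c

# Path graph 0-1-2-3, with nodes {0, 1} in cluster 0 and {2, 3} in cluster 1.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
S = one_hot_assignment([0, 0, 1, 1], num_clusters=2)
adj_c, feats_c = coarsen(adj, feats, S)
```

Under this assumed form, an off-diagonal entry of the coarsened adjacency counts the edges between two clusters, and a diagonal entry counts each intra-cluster edge twice.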
7. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein the multi-view spectral clustering fusion method comprises the following steps:
calculating a fused distribution matrix: adding the distribution matrices of all the views i to obtain the sum S; then, for each row of S, setting the entry at the row's maximum to 1 and all other entries to 0 to obtain the multi-view fused distribution matrix, wherein S[i, j] = 1 means that the i-th node belongs to the j-th class, so that the clustering of the data to be processed can be read directly from this matrix.
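The fusion step of claim 7 amounts to a per-node vote across views, sketched below in plain Python. The sketch assumes that column j denotes the same class in every view and that ties go to the lowest column index; neither point is stated in the claim. The function name `fuse_assignments` is hypothetical.

```python
def fuse_assignments(assignments):
    """Sum per-view distribution matrices, then one-hot the row-wise maximum.

    assignments: list of n x c 0/1 matrices, one per view.
    Returns (summed, fused), where fused[i][j] = 1 marks node i as class j.
    """
    n = len(assignments[0])
    c = len(assignments[0][0])
    summed = [[sum(S[i][j] for S in assignments) for j in range(c)]
              for i in range(n)]
    fused = []
    for row in summed:
        best = row.index(max(row))  # assumption: ties go to the lowest index
        fused.append([1 if j == best else 0 for j in range(c)])
    return summed, fused

# Three views, three nodes, two classes.
S1 = [[1, 0], [0, 1], [1, 0]]
S2 = [[1, 0], [0, 1], [0, 1]]
S3 = [[1, 0], [1, 0], [0, 1]]
summed, fused = fuse_assignments([S1, S2, S3])
labels = [row.index(1) for row in fused]
```

In this example node 0 is placed in class 0 by all three views, while nodes 1 and 2 are each placed in class 1 by two of the three views, so the fused labels follow the majority.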
8. The multi-view clustering method based on hierarchical graph pooling of claim 7, wherein, for a classification task, the method continues with the following steps:
computing a weighted sum of the distances between each view i and the fused view, wherein the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view, this weighted sum being one term of the objective function;
then, according to the multi-view spectral clustering method, taking the minimized sum of the spectral clustering objective functions of the views as the other term of the objective function, and adding and rearranging the two terms to obtain the overall objective function,
wherein the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view;
solving the objective function: according to the Rayleigh-Ritz theorem, the solution U that minimizes the objective function is given by the first k eigenvectors of $L_{mod}$; these eigenvectors are concatenated to obtain the feature matrix U of the fused view, in which each row is the feature vector of one category, that is, the feature representation of the data of that category; the feature vector of each category is then input into a classification algorithm to obtain the specific category corresponding to it.
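For reference, the generic trace-minimization problem to which the Rayleigh-Ritz argument applies can be written as below. This is the standard spectral clustering relaxation, stated with the symbols U, $L_{mod}$ and k from the claim; the symbol n (the number of nodes of the fused view) is introduced here as an assumption, and the patent's full objective, including the weighted distance term, is not reproduced.

```latex
\min_{U \in \mathbb{R}^{n \times k}} \operatorname{tr}\!\left(U^{\top} L_{mod}\, U\right)
\quad \text{subject to} \quad U^{\top} U = I_k ,
\qquad
\min = \sum_{j=1}^{k} \lambda_j ,
\quad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n .
```

For a symmetric $L_{mod}$, a minimizer is the matrix whose columns are orthonormal eigenvectors belonging to the k smallest eigenvalues $\lambda_1, \dots, \lambda_k$, which is how "the first k eigenvectors" is usually read in spectral clustering.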
9. The multi-view clustering method based on hierarchical graph pooling according to any one of claims 1 to 8, wherein the data to be processed is social network data.
10. A multi-view clustering system based on hierarchical graph pooling, comprising:
a graph construction module, configured to divide the data to be processed into multi-view data sets and then construct a graph representation of the multi-view data sets for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the graph feature information, and a graph Laplacian matrix;
a clustering information calculation and extraction module, configured to extract the clustering information of each view by a hierarchical, iterative computation based on hierarchical graph pooling, wherein the clustering information of each view comprises the coarsened graph and the distribution matrix corresponding to that view, and the coarsened graph comprises the adjacency matrix, the feature matrix and the graph Laplacian matrix after iteration;
and a multi-view fusion module, configured to fuse the clustering information of all the views by a multi-view spectral clustering fusion method to obtain the category corresponding to the feature vector of each category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110393842.8A CN113255720A (en) | 2021-04-13 | 2021-04-13 | Multi-view clustering method and system based on hierarchical graph pooling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110393842.8A CN113255720A (en) | 2021-04-13 | 2021-04-13 | Multi-view clustering method and system based on hierarchical graph pooling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113255720A (en) | 2021-08-13 |
Family
ID=77220630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110393842.8A Pending CN113255720A (en) | 2021-04-13 | 2021-04-13 | Multi-view clustering method and system based on hierarchical graph pooling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255720A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688425A (en) * | 2023-12-07 | 2024-03-12 | 重庆大学 | Multi-task graph classification model construction method and system for Non-IID graph data |
-
2021
- 2021-04-13 CN CN202110393842.8A patent/CN113255720A/en active Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
He et al. | Why resnet works? residuals generalize | |
CN108132968B (en) | Weak supervision learning method for associated semantic elements in web texts and images | |
Khrissi et al. | Clustering method and sine cosine algorithm for image segmentation | |
WO2019015246A1 (en) | Image feature acquisition | |
CN105809672B (en) | A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring | |
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
Qiu et al. | Deep learning-based algorithm for vehicle detection in intelligent transportation systems | |
CN113255895B (en) | Structure diagram alignment method and multi-diagram joint data mining method based on diagram neural network representation learning | |
US8429163B1 (en) | Content similarity pyramid | |
CN113822325A (en) | Method, device and equipment for supervised learning of image features and storage medium | |
CN115546525A (en) | Multi-view clustering method and device, electronic equipment and storage medium | |
CN111178196B (en) | Cell classification method, device and equipment | |
Rajendra Prasad et al. | An efficient sampling-based visualization technique for big data clustering with crisp partitions | |
CN108564116A (en) | A kind of ingredient intelligent analysis method of camera scene image | |
Pei et al. | Texture classification based on image (natural and horizontal) visibility graph constructing methods | |
CN113255720A (en) | Multi-view clustering method and system based on hierarchical graph pooling | |
CN113723558A (en) | Remote sensing image small sample ship detection method based on attention mechanism | |
Shi et al. | Deep message passing on sets | |
Chaudhary et al. | A review on various algorithms used in machine learning | |
CN113139556B (en) | Manifold multi-view image clustering method and system based on self-adaptive composition | |
CN113205184B (en) | Invariant learning method and device based on heterogeneous hybrid data | |
Babatunde et al. | Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture | |
CN114638953A (en) | Point cloud data segmentation method and device and computer readable storage medium | |
CN111461265B (en) | Scene image labeling method based on coarse-fine granularity multi-image multi-label learning | |
CN111428741B (en) | Network community discovery method and device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210813 |