CN113255720A - Multi-view clustering method and system based on hierarchical graph pooling - Google Patents

Multi-view clustering method and system based on hierarchical graph pooling

Info

Publication number
CN113255720A
Authority
CN
China
Prior art keywords
graph
view
matrix
clustering
data
Legal status
Pending
Application number
CN202110393842.8A
Other languages
Chinese (zh)
Inventor
李欣
赵志云
葛自发
孙小宁
张冰
万欣欣
袁钟怡
赵忠华
孙立远
付培国
王禄恒
王晴
Current Assignee
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Application filed by National Computer Network and Information Security Management Center
Priority to CN202110393842.8A
Publication of CN113255720A
Pending legal status (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-view clustering method based on hierarchical graph pooling. The method comprises the following steps. First, the data to be processed are divided into a multi-view data set, and a graph representation is constructed for each view. Second, the clustering information of each view is extracted by iterative, layer-by-layer hierarchical graph pooling. The clustering information of a view comprises its coarsened graph and its assignment matrix, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix obtained after the iterations. Third, the clustering information of all views is fused by multi-view spectral clustering to obtain the category of each cluster's feature vector. The method makes full use of the multi-view characteristics of the data, and the fused result comprehensively retains the clustering information of all original views. Also disclosed is a multi-view clustering system based on hierarchical graph pooling, comprising a graph construction module, a clustering information calculation and extraction module, and a multi-view fusion module. The invention improves clustering performance.

Description

Multi-view clustering method and system based on hierarchical graph pooling
Technical Field
The present invention relates to the field of clustering and classification, and more specifically to a multi-view clustering method and system based on hierarchical graph pooling.
Background
As data collection technologies develop, more and more social network data sets are multi-view (also called multi-modal). Multi-view data are collected from different domains or produced by different feature extractors, and they exhibit heterogeneous characteristics. In social network analysis, a node represents an object and an edge represents a connection between objects. In a multi-view social network data set, nodes are related by several kinds of association, and the network formed by one kind of association constitutes one view. Multi-view learning improves the generalization performance of a model by learning a model for each view and then fusing the views.
At present, the mainstream approaches to clustering and classifying multi-view social network data are as follows:
(1) Concatenate the multi-view data into single-view data, then cluster and classify. This approach concatenates the features of all views into single-view data that contain every feature of the original views. It then clusters the data with an algorithm such as k-means.
(2) Fuse the multi-view data, then cluster and classify. This approach uses a multi-view fusion method, such as multi-view spectral clustering or multi-view metric learning, to obtain a fused feature matrix that represents the combined multi-view data. Each data point corresponds to one feature vector, and fusion leaves the number of samples unchanged. The fused feature matrix is then clustered with a mainstream method such as k-means to obtain the feature vector of each cluster center.
These mainstream methods work well on certain specific problems involving multi-view social network data. However, they depend heavily on particular data characteristics, so their robustness and range of application scenarios are limited.
First, concatenating multi-view social network data into a single view does not exploit the multi-view nature of the data and easily leads to overfitting.
Second, clustering after multi-view fusion is not tailored to clustering and classification tasks. The structural and feature information within each view is not specifically exploited, the cluster structure in the data is not learned, and clustering relies only on the feature information after fusion. Because the structural information differs between views before clustering, the clusters in different views are not necessarily the same. During fusion, this approach can only extract clustering information from the fused view and discards the clustering information in each original view, which degrades clustering and classification performance.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.
Another object of the invention is to provide a multi-view clustering method based on hierarchical graph pooling. The method learns on each view with a multi-view learning method and finally fuses the results into one feature matrix. Because it exploits the multi-view characteristics of the data, the fused view comprehensively retains the clustering information of the original views.
A further object of the invention is to provide a multi-view clustering system based on hierarchical graph pooling that has a simple model structure, computes quickly and clusters well.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a hierarchical graph pooling-based multi-view clustering method, comprising the steps of:
dividing the data to be processed into a multi-view data set and constructing a graph representation for each view, where each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
extracting the clustering information of each view by iterative, layer-by-layer hierarchical graph pooling, where the clustering information of each view comprises the coarsened graph and the assignment matrix of that view, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix after the iterations;
and fusing the clustering information of all views by multi-view spectral clustering to obtain the category corresponding to each cluster's feature vector.
Preferably, a view is constructed as follows: each data point is taken as a node of the graph and is characterized by its feature vector. The edges of the graph are constructed from the associations between data points, yielding the graph adjacency matrix $A^{(i)}$.
Preferably,
the edges of the graph may also be constructed with the K-nearest-neighbor algorithm, as follows:
the feature vectors of the data points are partitioned, and a view is constructed from each part of the features by K-nearest-neighbor graph construction, yielding the graph adjacency matrix $A^{(i)}$ of that view, where
the feature vector of a data point is treated as a coordinate in an n-dimensional space, and the distance between each data point and every other node is computed as the Euclidean distance in that space. Each node is then visited in turn, and the other nodes are sorted by their distance to it. For a given neighbor number k, an edge is formed between the node and every other node that is among its k nearest neighbors. The neighbor number k is a hyper-parameter.
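The K-nearest-neighbor construction above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation. The function name `knn_adjacency` is hypothetical. The text leaves one point open: whether the neighbor relation should be symmetric. This sketch assumes it is, and adds an edge whenever either node is among the other's k nearest neighbors.

```python
import numpy as np

def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric 0/1 k-nearest-neighbor adjacency matrix (hypothetical helper).

    X: (n_points, d) array, one feature vector per data point (row).
    k: neighbor number, the hyper-parameter k in the text; must be < n_points.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances via broadcasting (same ordering as Euclidean).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # A node is never its own neighbor.
    np.fill_diagonal(dist, np.inf)
    A = np.zeros((n, n))
    for u in range(n):
        # Indices of the k closest other nodes; a stable sort breaks ties by index.
        nbrs = np.argsort(dist[u], kind="stable")[:k]
        A[u, nbrs] = 1.0
    # Assumption: symmetrise so that the graph is undirected.
    return np.maximum(A, A.T)
```

For example, four points on a line at 0, 1, 10 and 11 with k = 1 give two edges, 0-1 and 2-3.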
Preferably,
when the graph is constructed from the inherent associations between data points, the feature matrix is built as follows. Values of the data points' original structural characteristics are selected as the node feature vectors and stacked row by row into the feature matrix $X^{(i)}$. The degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view are then computed from their definitions, forming the initial data set;
when the graph is constructed with the K-nearest-neighbor algorithm, the feature matrix is built as follows. The feature vectors of the data points in each view i are stacked row by row into the feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view are computed from their definitions.
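The per-view matrices above can be sketched as shown below. This is a minimal illustration. It assumes one row per node in $X^{(i)}$ and the standard unnormalised definitions: $D^{(i)}$ is the diagonal matrix of row sums of $A^{(i)}$, and $L^{(i)} = D^{(i)} - A^{(i)}$. The text only says "according to the definition", so a normalised Laplacian would also be a valid reading. The function name `view_matrices` is hypothetical.

```python
import numpy as np

def view_matrices(A, features):
    """Return (X, D, L) for one view (hypothetical helper).

    A: (n, n) adjacency matrix of the view.
    features: sequence of n per-node feature vectors, stacked row by row into X.
    """
    X = np.vstack([np.asarray(f, dtype=float) for f in features])
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))  # degree matrix: node degrees on the diagonal
    L = D - A                   # unnormalised graph Laplacian (assumed definition)
    return X, D, L
```

On a three-node path graph this gives degrees 1, 2 and 1, and every row of L sums to zero.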
Preferably, the clustering information of each view i is computed as follows. Hierarchical graph pooling is applied iteratively to obtain the hierarchical clustering information. After each iteration, the graph data of the clustered (coarsened) graph, namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters.
Preferably, computing and extracting the clustering information of each view i specifically comprises the following steps:
Step one: set an algorithm for extracting cluster centers and apply it to the graph data of the input graph. The resulting cluster centers become the nodes of the output graph, which gives the node correspondence between the input graph and the output graph. Using the chosen algorithm together with the feature matrix and adjacency matrix of the input graph, the nodes of the input graph are divided into clusters. The feature vectors of the nodes in one cluster are represented jointly by the feature vector of a single node of the coarsened graph. This node is called a super node and is regarded as the cluster center;
Step two: represent the node correspondence between the input graph and the output graph by the assignment matrix $S_l^{(i)}$ of iteration l. If node u of the input graph belongs to class j, i.e., corresponds to node j of the coarsened graph, the entry of $S_l^{(i)}$ at position (u, j) is 1; otherwise it is 0;
Step three: obtain the adjacency matrix $A_{l+1}^{(i)}$ of the coarsened graph from the adjacency matrix and the assignment matrix of the input graph, which gives the structure of the coarsened graph;
Step four: using the assignment matrix and the aggregation method of the hierarchical graph pooling, aggregate the feature vectors of the nodes in each class of the input graph into the feature vector of the corresponding super node. Then stack the feature vectors of all super nodes to obtain the feature matrix $X_{l+1}^{(i)}$ of the coarsened graph;
Step five: iterate m times on each view i to obtain the adjacency matrix $A_m^{(i)}$ and the feature matrix $X_m^{(i)}$ of the final coarsened graph. Then compute its graph Laplacian matrix $L_m^{(i)}$ from the definition, together with the assignment matrix $S^{(i)}$, where $i = 1, \dots, N$.
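Steps one to five can be sketched with hard (0/1) assignment matrices as shown below. The text does not fix the clustering algorithm, the coarsening formula or the aggregation method, so this sketch makes three assumptions, all of them ours rather than the patent's:

- The coarsened adjacency matrix is the DiffPool-style product $S^\top A S$.
- Super-node features are the mean of the member features.
- The overall per-view assignment matrix is the product of the per-iteration assignments.

`cluster_fn` is a hypothetical, pluggable step (for example k-means on the node features) that returns integer labels in `[0, k)`.

```python
import numpy as np

def one_hot_assignment(labels, k):
    """Hard assignment matrix S (n x k): S[u, j] = 1 iff node u is in cluster j."""
    labels = np.asarray(labels)
    S = np.zeros((labels.shape[0], k))
    S[np.arange(labels.shape[0]), labels] = 1.0
    return S

def coarsen(A, X, S):
    """One pooling step that merges each cluster into a super node (assumed formulas)."""
    A_next = S.T @ A @ S                      # summed edge weights between/within clusters
    counts = np.maximum(S.sum(axis=0), 1.0)   # guard against empty clusters
    X_next = (S.T @ X) / counts[:, None]      # mean feature vector of each cluster
    return A_next, X_next

def hierarchical_pool(A, X, ks, cluster_fn):
    """Run m = len(ks) pooling iterations on one view.

    ks: number of classes k for each iteration (preset hyper-parameters).
    cluster_fn(A, X, k): hypothetical clustering step returning labels in [0, k).
    Returns the coarsest A and X, its Laplacian, and the overall assignment matrix
    mapping the original nodes to the final super nodes.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    S_total = np.eye(A.shape[0])
    for k in ks:
        S = one_hot_assignment(cluster_fn(A, X, k), k)
        A, X = coarsen(A, X, S)
        S_total = S_total @ S                 # compose assignments across iterations
    L = np.diag(A.sum(axis=1)) - A            # self-loops cancel out in D - A
    return A, X, L, S_total
```

Using mean rather than summed features keeps super-node features on the same scale as the input features. This is a design choice, and a sum would be equally consistent with the text.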
Preferably, the multi-view spectral clustering fusion method comprises the following steps:
computing the fused assignment matrix. The assignment matrices of all views are summed to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row maximum is set to 1 and all other entries to 0, which yields the multi-view fused assignment matrix. $S[u, j] = 1$ means that node u belongs to class j, so the clusters in the data to be processed can be read directly from this matrix.
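The fusion rule above (sum the per-view assignment matrices, then keep a single 1 at each row maximum) can be sketched as follows. The sketch assumes all views' assignment matrices have the same shape (n nodes by k classes). It breaks ties by taking the first maximal column, because `argmax` does so; the text does not specify tie handling. The function name is hypothetical.

```python
import numpy as np

def fuse_assignments(S_list):
    """Sum per-view assignment matrices and one-hot encode each row's maximum."""
    S = np.stack(S_list).sum(axis=0)
    fused = np.zeros_like(S, dtype=float)
    # argmax returns the first column among ties (tie-breaking assumption).
    fused[np.arange(S.shape[0]), S.argmax(axis=1)] = 1.0
    return fused
```

Reading `fused.argmax(axis=1)` then gives the class label of every node.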
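Preferably, for the classification task, the following steps are then performed:
computing the weighted sum of the distances between each view i and the fused view,
$$\sum_{i=1}^{N} \alpha_i\, d^2\left(U, U^{(i)}\right) = \sum_{i=1}^{N} \alpha_i \left[k - \operatorname{tr}\left(U U^{\top} U^{(i)} U^{(i)\top}\right)\right],$$
where $U^{(i)}$ and $U$ are the feature matrices of view i and of the fused view. The weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view. This sum is one term of the objective function;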
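then, following the multi-view spectral clustering method, the sum of the spectral clustering objectives of all views is minimized as the other term of the objective function. Adding the two terms and rearranging gives the overall objective function
$$\min_{U \in \mathbb{R}^{n \times k},\; U^{\top} U = I} \; \sum_{i=1}^{N} \operatorname{tr}\left(U^{\top} L^{(i)} U\right) + \sum_{i=1}^{N} \alpha_i \left[k - \operatorname{tr}\left(U U^{\top} U^{(i)} U^{(i)\top}\right)\right] \;\Longleftrightarrow\; \min_{U \in \mathbb{R}^{n \times k},\; U^{\top} U = I} \operatorname{tr}\left(U^{\top} L_{mod}\, U\right),$$
$$L_{mod} = \sum_{i=1}^{N} \left(L^{(i)} - \alpha_i\, U^{(i)} U^{(i)\top}\right),$$
where the Laplacian matrix $L_{mod}$ is the Laplacian matrix of the fused view;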
solving the objective function. By the Rayleigh-Ritz theorem, the U that minimizes the objective is obtained by stacking the eigenvectors of $L_{mod}$ associated with its k smallest eigenvalues, giving the feature matrix U of the fused view. Each row corresponds to the feature vector of a class, i.e., the feature representation of each class of data. These feature vectors are input into a classification algorithm to determine the specific class corresponding to each of them.
Preferably, the data to be processed are social network data.
A multi-view clustering system based on hierarchical graph pooling is also provided, comprising:
a graph construction module, configured to divide the data to be processed into a multi-view data set and construct a graph representation for each view, where each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
a clustering information calculation and extraction module, configured to extract the clustering information of each view by iterative hierarchical graph pooling, where the clustering information of each view comprises the coarsened graph and the assignment matrix of that view, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix after the iterations;
and a multi-view fusion module, configured to fuse the clustering information of all views by multi-view spectral clustering to obtain the category corresponding to each cluster's feature vector.
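The Rayleigh-Ritz solution above can be sketched in NumPy as follows. It assumes the fused Laplacian has the Grassmann-manifold form $L_{mod} = \sum_i (L^{(i)} - \alpha_i U^{(i)} U^{(i)\top})$. It also assumes each view's $U^{(i)}$ is that view's spectral embedding, i.e. the eigenvectors of $L^{(i)}$ for its k smallest eigenvalues. The text does not spell out how $U^{(i)}$ is obtained, so that choice is ours. Both function names are hypothetical.

```python
import numpy as np

def spectral_embedding(L, k):
    """Orthonormal eigenvectors of a symmetric matrix for its k smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

def grassmann_fusion(L_list, U_list, alphas, k):
    """Form L_mod = sum_i (L_i - alpha_i U_i U_i^T) and minimise tr(U^T L_mod U).

    Under U^T U = I the minimum is attained by the k smallest eigenvectors (Rayleigh-Ritz).
    """
    L_mod = sum(L - a * (U @ U.T) for L, U, a in zip(L_list, U_list, alphas))
    return spectral_embedding(L_mod, k), L_mod
```

As the text describes, the rows of the returned U can then be passed to a downstream classifier or to k-means.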
The invention provides at least the following beneficial effects:
First, a multi-view learning method is used: learning is performed on each view and the results are finally fused into one feature matrix, so the multi-view characteristics of the data are exploited.
Second, before the views are fused, each view is clustered by hierarchical graph pooling. One node is extracted from each cluster as its representative and used as a node of the coarsened graph. Its feature vector aggregates, and thus incorporates, the feature vectors of all nodes in the original cluster. The clustered views are then fused, so the resulting view also comprehensively retains the clustering information of the original views.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is the main flowchart of one embodiment of the present invention;
FIG. 2 is a schematic diagram of the model structure of one embodiment of the present invention;
FIG. 3 is a detailed flowchart of Embodiment 2 of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
It is to be noted that the experimental methods described in the following embodiments are all conventional methods unless otherwise specified, and the reagents and materials, if not otherwise specified, are commercially available; in the description of the present invention, the terms indicating orientation or positional relationship are based on the orientation or positional relationship shown in the drawings only for the convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
As shown in fig. 1 to 3, the present invention provides a multi-view clustering method based on hierarchical graph pooling, comprising the following steps:
dividing the data to be processed into a multi-view data set and constructing a graph representation for each view, where each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
extracting the clustering information of each view by iterative, layer-by-layer hierarchical graph pooling, where the clustering information of each view comprises the coarsened graph and the assignment matrix of that view, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix after the iterations;
and fusing the clustering information of all views by multi-view spectral clustering to obtain the category corresponding to each cluster's feature vector.
In this technical solution, a graph representation is constructed for the data of each view. The clustering information of each view is extracted by iterating the hierarchical graph pooling layer, which updates the graph adjacency matrix and feature matrix of each view. The data to be processed, such as social network data, are then fused into combined graph data by multi-view spectral clustering to obtain the feature matrix of the fused cluster centers.
Taking social network data as an example, each view is constructed as follows:
Suppose the social network data contain n data points and N views. For each view i, a data structure is constructed that comprises an adjacency matrix describing the graph structure, a feature matrix describing the feature information of the graph, and a graph Laplacian matrix. Each data point is regarded as a node of view i and is characterized by its feature vector. Edges are then constructed from the associations between data points, either from the inherent associations between data points or by the K-nearest-neighbor method.
For each view i, hierarchical graph pooling is applied iteratively to obtain the clustering information of view i. After each iteration, the graph data of the clustered graph (the coarsened graph), i.e., its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters.
Taking social network data as an example, hierarchical graph pooling is applied iteratively to each of the N views. Once the iterations are complete, the coarsest graph and the assignment matrix of each view are output. The assignment matrices of the N views constitute the clustering information of the individual views of the multi-view social network data;
the multi-view social network data are then fused with a multi-view learning algorithm, such as multi-view spectral clustering, which is described below as an example. Multi-view spectral clustering assumes that the feature matrix of each view i and that of the fused view each span a k-dimensional linear subspace of the n-dimensional space. Each therefore corresponds to a point on the Grassmann manifold G(k, n), so the distance between views can be computed with the distance formula between points on the Grassmann manifold.
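One common distance between points of G(k, n) is the projection distance, $d^2(U, V) = k - \operatorname{tr}(U U^\top V V^\top) = k - \lVert U^\top V \rVert_F^2$ for matrices U and V with orthonormal columns. It is sketched below as an illustration. The text does not name the distance it uses, so the choice of projection distance is an assumption, and the function name is hypothetical.

```python
import numpy as np

def grassmann_projection_distance_sq(U, V):
    """Squared projection distance between span(U) and span(V) on G(k, n).

    U, V: (n, k) matrices with orthonormal columns (assumed, not checked).
    Uses tr(U U^T V V^T) = ||U^T V||_F^2, so the result depends only on the subspaces.
    """
    k = U.shape[1]
    return k - np.linalg.norm(U.T @ V, "fro") ** 2
```

The distance is 0 for identical subspaces and k for mutually orthogonal ones.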
The assignment matrices of all views are summed to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row maximum is set to 1 and all others to 0, giving the multi-view fused assignment matrix. $S[u, j] = 1$ means that node u belongs to class j. The clusters in the social network data can be read directly from this matrix; they represent the user community structure and the user classification in the social network.
Starting from the limitations of mainstream methods, namely over-reliance on specific data characteristics and limited robustness and application scenarios, the invention provides a clustering and classification method based on hierarchical graph pooling for multi-view social network data. Hierarchical graph pooling is a class of graph representation learning methods that preserve the cluster structure of graphs. It extracts a coarsened graph in which the feature vector of each node retains the information of the feature vectors of one class of nodes in the input graph. Repeating this step yields the cluster structure of the graph data. Each iteration takes the adjacency matrix and feature matrix of a graph as input and outputs the clustered adjacency matrix and feature matrix. Introducing this method into the clustering of multi-view social network data overcomes the main shortcomings of mainstream methods for clustering and improves the clustering performance of the model.
First, a multi-view learning method is used: learning is performed on each view and the results are finally fused into one feature matrix, so the multi-view characteristics of the data are exploited.
Second, before the views are fused, each view is clustered by hierarchical graph pooling. One node is extracted from each cluster as its representative and used as a node of the coarsened graph. Its feature vector aggregates, and thus incorporates, the feature vectors of all nodes in the original cluster. The clustered views are then fused, so the resulting view also comprehensively retains the clustering information of the original views.
In another technical solution, a view is constructed as follows: each data point is taken as a node of view i and is characterized by its feature vector. The edges of view i are constructed from the associations between data points, yielding the graph adjacency matrix $A^{(i)}$.
The graph may also be constructed with the K-nearest-neighbor algorithm:
the feature vectors of the data points are partitioned, and a view is constructed from each part of the features by K-nearest-neighbor graph construction, yielding the graph adjacency matrix $A^{(i)}$ of that view, where
in K-nearest-neighbor construction, the feature vector of a data point is treated as a coordinate in an n-dimensional space. The distance between each data point and every other node is computed as the Euclidean distance in that space. Each node is then visited in turn, and the other nodes are sorted by their distance to it. For a given neighbor number k, an edge is formed between the node and every other node that is among its k nearest neighbors. The neighbor number k is a hyper-parameter.
When the graph is constructed from the inherent associations between data points, the feature matrix is built as follows. Values of the data points' original structural characteristics are selected as the node feature vectors and stacked row by row into the feature matrix $X^{(i)}$. The degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view are then computed from their definitions, forming the initial data set;
when the graph is constructed with the K-nearest-neighbor algorithm, the feature matrix is built as follows. The feature vectors of the data points in each view i are stacked row by row into the feature matrix $X^{(i)}$, and the degree matrix and the graph Laplacian matrix $L^{(i)}$ are computed from their definitions.
In the above technical solution, the graph adjacency matrix $A^{(i)}$ of view i can be constructed from the inherent associations between the data. Alternatively, the data features can optionally be partitioned and a view constructed from each part by K-nearest-neighbor graph construction, or the features can be used directly to construct a view. In K-nearest-neighbor construction, the feature vector of each data point is treated as a coordinate in an n-dimensional space. The distance between each data point and every other point is computed as the Euclidean distance in that space. Each node is then visited in turn, and the other nodes are sorted by their distance to it. For a given neighbor number k, an edge is formed between the node and each other node among its k nearest neighbors, where k is a hyper-parameter. This yields the graph adjacency matrix $A^{(i)}$ of view i.
For K-nearest-neighbor construction, the feature vectors (column vectors) of the data points in each view i are stacked row by row into the feature matrix $X^{(i)}$. The degree matrix and the graph Laplacian matrix $L^{(i)}$ are then computed from their definitions. For construction from the inherent associations between data, values of network-structure features of the data, such as degree and centrality, can be selected as the node features (column vectors). These are then stacked row by row into the feature matrix $X^{(i)}$. The result is a matrix representation that contains both the association information and the feature information describing the nodes. This gives more comprehensive information and facilitates the subsequent clustering task. Together these matrices form the initial data set.
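Structural node features such as degree and centrality can be computed from the adjacency matrix alone. The sketch below is one illustration of this. It assumes degree and eigenvector centrality, and the text does not say which centrality measure is meant. Eigenvector centrality is found by power iteration on $A + I$, which has the same eigenvectors as A. The shift stops the iteration from oscillating on bipartite graphs. For disconnected graphs the result depends on the starting vector. The function name and `iters` parameter are hypothetical.

```python
import numpy as np

def structural_features(A, iters=100):
    """Per-node [degree, eigenvector centrality], one row per node (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=1)
    M = A + np.eye(A.shape[0])  # shift keeps power iteration from oscillating
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)  # unit-norm centrality vector
    return np.column_stack([degree, x])
```

In a star graph, for instance, the hub receives the highest degree and centrality, and all leaves receive equal values.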
In another technical solution, the clustering information of each view i is calculated and extracted as follows: a hierarchical graph pooling method is applied iteratively to obtain the clustering information of the hierarchical graph. After each iteration, the graph data of the clustered, coarsened graph (its feature matrix and adjacency matrix) are output and used as the input of the next iteration. The number of iterations m and the number of categories k in each iteration are preset hyper-parameters.
The step of calculating and extracting the clustering information of each view i specifically comprises the following steps:
step one, an algorithm for extracting cluster centers is set and applied to the graph data of the input graph, and the resulting cluster centers are taken as the nodes of the output graph, which gives the node correspondence between the input graph and the output graph: using the set algorithm together with the feature matrix and adjacency matrix of the input graph, the nodes of the input graph are divided into different clusters, and the feature vectors of the nodes of one class in the input graph are represented jointly by the feature vector of a single node in the coarsened graph; this node is called a supernode and is regarded as the cluster center;
step two, the node correspondence between the input graph and the output graph in the l-th iteration is represented by an assignment matrix $S^{(i)}_{l} \in \{0,1\}^{n_l \times n_{l+1}}$, where $n_l$ is the number of nodes of the input graph: if node p of the input graph belongs to class j, i.e. corresponds to node j of the coarsened graph, then $S^{(i)}_{l}[p, j] = 1$; otherwise $S^{(i)}_{l}[p, j] = 0$;
step three, the adjacency matrix $A^{(i)}_{l+1}$ of the coarsened graph is obtained from the adjacency matrix $A^{(i)}_{l}$ and the assignment matrix $S^{(i)}_{l}$ of the input graph, which gives the structure of the coarsened graph;
step four, the feature vectors of the nodes of each class in the input graph are aggregated, using the assignment matrix and the aggregation method of hierarchical graph pooling, to obtain the feature vectors of the supernodes; the feature vectors of all supernodes are then stacked to obtain the feature matrix $X^{(i)}_{l+1}$ of the coarsened graph;
step five, steps one to four are iterated m times for each view i, finally yielding the adjacency matrix $A^{(i)}_{m}$ and the feature matrix $X^{(i)}_{m}$ of the coarsened graph; the graph Laplacian matrix $L^{(i)}_{m}$ is then obtained according to its definition, together with the assignment matrix $S^{(i)}$, where i = 1, …, N.
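The five steps above can be sketched as a loop. This is an illustration under stated assumptions, not the claimed method itself: the cluster-extraction algorithm of step one is left as a caller-supplied function (`cluster_fn`, hypothetical), the coarsened adjacency of step three is assumed to be $S^{\top} A S$, the aggregation of step four is assumed to be the mean of member features, and the per-iteration assignment matrices are multiplied together to map the original nodes to the final supernodes. Every cluster is assumed to be non-empty.

```python
import numpy as np


def one_hot_assignment(labels, n_clusters):
    """Step two: S[p, j] = 1 iff input node p belongs to cluster j."""
    labels = np.asarray(labels)
    S = np.zeros((len(labels), n_clusters))
    S[np.arange(len(labels)), labels] = 1.0
    return S


def coarsen(A, X, labels, n_clusters):
    """One pooling iteration on a graph (A, X); returns (A_next, X_next, S)."""
    S = one_hot_assignment(labels, n_clusters)
    A_next = S.T @ A @ S                   # step three (assumed S^T A S form)
    sizes = S.sum(axis=0)                  # members per supernode (assumed > 0)
    X_next = (S.T @ X) / sizes[:, None]    # step four (assumed mean aggregation)
    return A_next, X_next, S


def hierarchical_pool(A, X, cluster_fn, ks):
    """Run m = len(ks) iterations; cluster_fn(A, X, k) -> labels is hypothetical."""
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    S_total = np.eye(A.shape[0])
    for k in ks:
        labels = cluster_fn(A, X, k)       # step one
        A, X, S = coarsen(A, X, labels, k)
        S_total = S_total @ S              # original node -> current supernode
    L = np.diag(A.sum(axis=1)) - A         # Laplacian of the final coarsened graph
    return A, X, L, S_total
```

With hard assignments, $S^{\top} A S$ places twice the number of intra-cluster edges on the diagonal as self-loops; the embodiment below removes intra-cluster edges before coarsening, which is one reason to treat this form only as a baseline.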
In this technical solution, the hierarchical graph pooling method is applied iteratively to each view i to obtain its clustering information. After each iteration, the graph data of the clustered graph (the coarsened graph), namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters. For each view i, the l-th iteration proceeds as follows:
Step one: an algorithm for extracting cluster centers is set and applied to the graph data of the input graph, and the resulting cluster centers are taken as the nodes of the output graph, which gives the correspondence between the nodes of the input graph and those of the output graph. Using the set algorithm together with the feature matrix and adjacency matrix of the input graph, the input nodes are divided into different clusters; the feature vectors of the nodes of one class in the input graph are represented jointly by the feature vector of a single node (a supernode) in the coarsened graph, and this node is regarded as the cluster center.
Step two: the correspondence between the nodes of the input graph and those of the output graph is characterized by an assignment matrix $S^{(i)}_{l} \in \{0,1\}^{n_l \times n_{l+1}}$, where $n_l$ is the number of nodes of the input graph in the l-th iteration. If node p of the input graph belongs to class j, i.e. corresponds to node j of the coarsened graph, then $S^{(i)}_{l}[p, j] = 1$; otherwise the corresponding entry of $S^{(i)}_{l}$ is 0.
Step three: the adjacency matrix $A^{(i)}_{l+1}$ of the coarsened graph is obtained from the adjacency matrix $A^{(i)}_{l}$ and the assignment matrix $S^{(i)}_{l}$ of the input graph, which gives the structure of the coarsened graph.
Step four: the feature vectors of the nodes of each class in the input graph are aggregated, using the assignment matrix and the aggregation method of the hierarchical graph pooling operation, to obtain the feature vectors of the supernodes, and the feature vectors of all supernodes are stacked to obtain the feature matrix $X^{(i)}_{l+1}$ of the coarsened graph.
Step five: the first four steps are executed iteratively on each view, where the number of iterations m is a hyper-parameter, finally yielding the adjacency matrix $A^{(i)}_{m}$ and the feature matrix $X^{(i)}_{m}$ of the coarsened graph; the graph Laplacian matrix $L^{(i)}_{m}$ is then obtained according to its definition, together with the assignment matrix $S^{(i)}$, where i = 1, …, N. Through continued iteration, the model extracts the clustering information in the data and the optimal number of cluster categories, each category serving as a node of the output coarsened graph.
In another technical scheme, the multi-view spectral clustering fusion method comprises the following steps:
calculating the fused assignment matrix: the assignment matrices of all views i are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row's maximum value is set to 1 and all other entries to 0, giving the multi-view fused assignment matrix, in which $S[p, j] = 1$ means that the p-th node belongs to the j-th class; the clustering of the data to be processed can be read directly from this matrix.
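The fusion rule (sum the per-view assignment matrices, then keep only the row-wise maximum) is short enough to show directly. This sketch assumes every $S^{(i)}$ has the same shape n × k and that column j refers to the same class in every view; how cluster labels are aligned across views is not specified, so that alignment is an assumption. The function name `fuse_assignments` is hypothetical.

```python
import numpy as np


def fuse_assignments(S_views):
    """Sum per-view assignment matrices, then one-hot encode each row's maximum."""
    S = np.sum(S_views, axis=0)             # S = sum_i S^(i), shape (n, k)
    fused = np.zeros_like(S)
    # argmax breaks ties toward the lowest column index.
    fused[np.arange(S.shape[0]), S.argmax(axis=1)] = 1
    return fused
```

The class of node p is then simply `fused[p].argmax()`, so the fused matrix acts as a majority vote across views.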
For the classification task, the following steps are continued:
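calculating the weighted sum of the distances between each view i and the fused view:
$$\sum_{i=1}^{N} \alpha_i \, d^2_{\mathrm{proj}}\big(U, U^{(i)}\big) = \sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big],$$
where $U^{(i)}$ and $U$ are the n × k feature matrices of view i and of the fused view, and the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view; this weighted sum is one term of the objective function;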
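then, following the multi-view spectral clustering method, the sum of the spectral clustering objective functions of the views is minimized as the other term of the objective function; adding the two terms and rearranging gives the total objective function
$$\min_{U \in \mathbb{R}^{n \times k}} \; \sum_{i=1}^{N} \operatorname{tr}\big(U^{\top} L^{(i)} U\big) + \sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big] \quad \text{s.t. } U^{\top} U = I,$$
which, up to the constant $k \sum_{i} \alpha_i$, is equivalent to
$$\min_{U^{\top} U = I} \operatorname{tr}\big(U^{\top} L_{\mathrm{mod}} U\big), \qquad L_{\mathrm{mod}} = \sum_{i=1}^{N} L^{(i)} - \sum_{i=1}^{N} \alpha_i U^{(i)} U^{(i)\top},$$
where the Laplacian matrix $L_{\mathrm{mod}}$ is the Laplacian matrix of the fused view;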
solving the objective function: by the Rayleigh-Ritz theorem, the solution U that minimizes the above objective function consists of the eigenvectors of $L_{\mathrm{mod}}$ corresponding to its k smallest eigenvalues; stacking these k eigenvectors as columns gives the feature matrix U of the fused view, each row of which is the feature vector of one category, i.e. the feature representation of the data of that category. The feature vector of each category is then input into a classification algorithm to obtain the specific category it corresponds to.
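A NumPy sketch of the fusion objective and its Rayleigh-Ritz solution follows. It assumes that each view's subspace $U^{(i)}$ is spanned by the eigenvectors of $L^{(i)}$ for its k smallest eigenvalues, that all Laplacians share the same n × n size, and the projection-distance formulation under which $L_{\mathrm{mod}} = \sum_i L^{(i)} - \sum_i \alpha_i U^{(i)} U^{(i)\top}$. The function names are hypothetical.

```python
import numpy as np


def smallest_eigvecs(L, k):
    """Eigenvectors of symmetric L for its k smallest eigenvalues, as columns."""
    _, vecs = np.linalg.eigh(L)      # eigh returns eigenvalues in ascending order
    return vecs[:, :k]


def fuse_views(laplacians, alphas, k):
    """Return (U, L_mod) for min tr(U^T L_mod U) s.t. U^T U = I (assumed form)."""
    n = laplacians[0].shape[0]
    L_mod = np.zeros((n, n))
    for L_i, a_i in zip(laplacians, alphas):
        U_i = smallest_eigvecs(L_i, k)          # Grassmann point of view i
        L_mod += L_i - a_i * (U_i @ U_i.T)      # spectral term minus distance term
    # Rayleigh-Ritz: the minimizer is spanned by the k eigenvectors of L_mod
    # with the smallest eigenvalues.
    return smallest_eigvecs(L_mod, k), L_mod
```

Larger weights $\alpha_i$ pull the fused subspace toward view i, because they lower the eigenvalues of $L_{\mathrm{mod}}$ along that view's subspace.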
In this technical solution, the multi-view social network data are fused with a multi-view learning algorithm, for example a multi-view spectral clustering algorithm, which is described below as an example. The multi-view spectral clustering method assumes that the feature matrix of each view and that of the fused view each span a k-dimensional linear subspace of the n-dimensional space and therefore correspond to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the distance formula between points on the Grassmann manifold. The fusion method is implemented as follows:
Step one: calculate the fused assignment matrix. The assignment matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row's maximum value is set to 1 and all other entries to 0, giving the multi-view fused assignment matrix, in which $S[p, j] = 1$ means that the p-th node belongs to the j-th class. This matrix directly gives the clustering of the social network data, i.e. the user community structure and the user classification in the social network. For the classification task, the following steps can be continued:
Step two: compute the weighted sum of the distances between each view and the fused view
$$\sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big],$$
where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view. This weighted sum serves as one term of the objective function.
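The distance term can be computed with the squared projection distance on the Grassmann manifold, $d^2(U, V) = k - \lVert U^{\top} V \rVert_F^2 = k - \operatorname{tr}(U U^{\top} V V^{\top})$ for n × k matrices with orthonormal columns. Using this particular distance is an assumption (other Grassmann distances exist), and the helper names are hypothetical.

```python
import numpy as np


def projection_distance_sq(U, V):
    """Squared projection distance between span(U) and span(V) on G(k, n).

    U and V are n x k matrices with orthonormal columns.
    """
    k = U.shape[1]
    return float(k - np.linalg.norm(U.T @ V, "fro") ** 2)


def weighted_view_distance(U, view_bases, alphas):
    """sum_i alpha_i * d^2(U, U^(i)): the distance term of the objective."""
    return sum(a * projection_distance_sq(U, V) for a, V in zip(alphas, view_bases))
```

The distance is 0 when the two subspaces coincide and reaches k when they are orthogonal.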
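Step three: following the multi-view spectral clustering method, the sum of the spectral clustering objective functions of the views is minimized as the other term of the objective function. Adding the two terms and rearranging gives the total objective function, i.e. a matrix trace minimization problem:
$$\min_{U^{\top} U = I} \; \sum_{i=1}^{N} \operatorname{tr}\big(U^{\top} L^{(i)} U\big) + \sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big],$$
which, up to an additive constant, equals
$$\min_{U^{\top} U = I} \operatorname{tr}\big(U^{\top} L_{\mathrm{mod}} U\big), \qquad L_{\mathrm{mod}} = \sum_{i=1}^{N} L^{(i)} - \sum_{i=1}^{N} \alpha_i U^{(i)} U^{(i)\top},$$
where the Laplacian matrix $L_{\mathrm{mod}}$ is the Laplacian matrix of the fused view.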
Step four: solve the objective function. By the Rayleigh-Ritz theorem, the minimizer U consists of the eigenvectors of $L_{\mathrm{mod}}$ associated with its k smallest eigenvalues; stacking these k eigenvectors as columns gives the feature matrix U of the fused view, which provides the feature representation of the data of each category, each row of U being the feature vector of one category. These feature vectors are then input into a classification algorithm to obtain the specific category corresponding to each of them.
A multi-view clustering system based on hierarchical graph pooling is also provided, comprising:
a graph construction module, configured to construct, for each view, the corresponding graph representation of the data to be processed to obtain the corresponding view, the view comprising an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
a clustering information calculation and extraction module, configured to extract the clustering information of each view by iterative hierarchical graph pooling, the clustering information of each view comprising the coarsened graph and the assignment matrix corresponding to that view, and the coarsened graph comprising the adjacency matrix, feature matrix and graph Laplacian matrix obtained after iteration;
and a multi-view fusion module, configured to fuse the clustering information of all views by the multi-view spectral clustering fusion method to obtain the category corresponding to each category's feature vector.
In this technical solution, the graph construction module builds a graph representation, i.e. a view, for the data of each perspective; the clustering information of each view is extracted by iterating hierarchical graph pooling layers, which update the graph adjacency matrix and feature matrix of each view; and the data to be processed, such as social network data, are then fused into comprehensive graph data by multi-view spectral clustering to obtain the feature matrix of the fused cluster centers.
The clustering information calculation and extraction module iteratively applies the hierarchical graph pooling method to each view i to obtain its clustering information. After each iteration, the graph data of the clustered graph (the coarsened graph), namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations m and the number of classes k in each iteration are preset hyper-parameters.
The multi-view fusion module fuses the multi-view social network data with a multi-view learning algorithm, for example a multi-view spectral clustering algorithm, which is described below as an example. The multi-view spectral clustering method assumes that the feature matrix of each view i and that of the fused view each span a k-dimensional linear subspace of the n-dimensional space and therefore correspond to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the distance formula between points on the Grassmann manifold.
The assignment matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row's maximum value is set to 1 and all other entries to 0, giving the multi-view fused assignment matrix, in which $S[p, j] = 1$ means that the p-th node belongs to the j-th class. This matrix directly gives the clustering of the social network data, i.e. the user community structure and the user classification in the social network.
Starting from the problems of over-reliance on specific data features, limited model robustness and restricted application scenarios, the invention provides a clustering and classification system based on hierarchical graph pooling for multi-view social network data. Hierarchical graph pooling is a class of graph representation learning methods that preserve the cluster-structure information of graphs. Its aim is to extract a coarsened graph in which the feature vector of each node stores the information of the feature vectors of one class of nodes in the input graph; iterating this step yields the clustering structure of the graph data, with each iteration taking the adjacency matrix and feature matrix of a graph as input and outputting the clustered adjacency matrix and feature matrix. Introducing this method into the clustering of multi-view social network data overcomes the main shortcomings of mainstream methods in clustering tasks and improves the clustering performance of the model.
First, a multi-view learning method is adopted: each view is learned separately and the feature matrices are finally fused, so that the multi-view nature of the data is exploited.
Second, before the views are fused, each view is clustered by hierarchical graph pooling: one node is extracted from each cluster as its representative and used as a node of the coarsened graph, and the feature vectors of all nodes in the original cluster are aggregated into this node's feature vector, so that it contains their information. The clustered views are then fused, and the resulting view still comprehensively contains the clustering information of the original views.
<Example 1>
The invention provides a multi-view social network data classification method based on hierarchical graph pooling, comprising the following steps:
constructing the data of each perspective into graph data by methods such as K-nearest neighbors;
clustering the data of each perspective with EigenPooling, a hierarchical graph pooling method from graph representation learning, to obtain the clustered graph data of each perspective;
fusing the multi-view clustering data with a multi-view spectral clustering algorithm to obtain the clustering of the data, and then running a classification algorithm to obtain the class label of each data item;
(1) Construction of multi-view social network data
Suppose the social network dataset contains n data items and N views; the data include users' personal feature data and social data, and the social data contain various associations and behaviors between users. Nodes represent users and edges represent connections between users. Views are then constructed separately from the inherent links between the data and by the K-nearest-neighbor method, as follows:
Step one: a social network is constructed for each type of connection between the data in the data set, such as likes, comments and friend relationships, taking each such connection as an edge, to obtain the graph adjacency matrix $A^{(i)}$.
Step two: the user features are partitioned, and a view is constructed for each part of the partitioned features by K-nearest-neighbor graph construction. The data points are regarded as the nodes of the graph and each node is characterized by the feature vector of its data point; treating these feature vectors as coordinates in an n-dimensional space, the cosine distance between each data point and every other point is calculated. Each node is then traversed and the other nodes are sorted by their distance to it; for a given neighbor number k, an edge is formed between the node and every other node that lies among its k nearest neighbors, where k is a hyper-parameter. This finally yields the graph adjacency matrix $A^{(i)}$. The model structure is shown in fig. 2.
Step three: for the user feature views, the feature vectors of the data in each view i are stacked to obtain the feature matrix $X^{(i)}$, and the degree matrix
$$D^{(i)} = \operatorname{diag}(d_{11}, \ldots, d_{nn}), \qquad d_{pp} = \sum_{q=1}^{n} a^{(i)}_{pq},$$
and the normalized graph Laplacian matrix
$$L^{(i)} = I - \big(D^{(i)}\big)^{-1/2} A^{(i)} \big(D^{(i)}\big)^{-1/2}$$
are calculated, where $a^{(i)}_{pq}$ is the entry in row p and column q of $A^{(i)}$. For the user relationship views, the values of network structure features in the social network, such as degree and centrality, can be selected as the features (column vectors) of the nodes; these features are then stacked row by row to obtain the feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ are calculated. This forms the initial social network data set.
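The degree matrix and the normalized Laplacian of step three can be computed as below. The sketch assumes the symmetric normalization $L = I - D^{-1/2} A D^{-1/2}$ and, as an extra assumption, sets the $D^{-1/2}$ entries of isolated nodes to zero to avoid division by zero; `normalized_laplacian` is a hypothetical name.

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}, with degrees d_pp = sum_q a_pq."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])   # isolated nodes keep 0 (assumption)
    # Scale row p by d_pp^{-1/2} and column q by d_qq^{-1/2}.
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
```

The eigenvalues of this matrix lie in [0, 2], which keeps the spectra of views with very different degree scales comparable.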
(2) Clustering information extraction for multi-view social network data
In this step, EigenPooling, a hierarchical graph pooling method, is applied iteratively to each view i to find its clustering information. After each iteration, the graph data of the clustered graph (the coarsened graph), namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations and the number of categories k in each iteration are preset hyper-parameters. For each view i, the l-th iteration proceeds as follows:
Step one: a spectral clustering algorithm is performed on each view. For each view i, the eigenvectors of $L^{(i)}$ corresponding to its first k (smallest) eigenvalues are stacked as column vectors into the matrix $U'$, and the row vectors $y_1, \ldots, y_n$ of $U'$ are clustered into a series of non-overlapping clusters. If row $y_p$ is assigned to cluster j, data point p belongs to class j. Each cluster corresponds to one node (a supernode) of the coarsened graph.
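Step one can be sketched with NumPy alone. The algorithm used to cluster the rows $y_1, \ldots, y_n$ is not named; plain k-means (Lloyd's algorithm) with a deterministic farthest-point initialization is assumed here, and both function names are hypothetical.

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Next center: the row farthest from every center chosen so far.
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its rows (keep it if empty).
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clusters(L, k):
    """Cluster the rows y_1..y_n of U' (first-k eigenvectors of Laplacian L)."""
    _, vecs = np.linalg.eigh(L)
    U_prime = vecs[:, :k]            # eigenvectors of the k smallest eigenvalues
    return kmeans(U_prime, k)
```

For a graph with k disconnected components, the rows of $U'$ coincide within each component, so any reasonable clustering of the rows recovers the components exactly.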
Step two: this correspondence is represented by sampling operators C and an assignment matrix S. Let $\Gamma^{(k)}$ denote the list of nodes in the k-th cluster, and let
$$C^{(k)} \in \{0,1\}^{n \times N_k}$$
denote the sampling operator of the k-th cluster, with $C^{(k)}[p, q] = 1$ if and only if $\Gamma^{(k)}(q) = v_p$; $C^{(k)}$ thus records the correspondence between each node in the cluster and the supernode. Here $N_k = |\Gamma^{(k)}|$ is the number of nodes in the k-th cluster.
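The sampling operators can be built from a vector of cluster labels, as sketched below. Indices are 0-based, clusters are labeled 0 to K − 1, and `sampling_operators` is a hypothetical name. A useful sanity check is that $\sum_k C^{(k)} C^{(k)\top}$ equals the identity, since each node belongs to exactly one cluster.

```python
import numpy as np


def sampling_operators(labels, n_clusters):
    """Return [C^(1), ..., C^(K)] with C^(k)[p, q] = 1 iff Gamma^(k)(q) = v_p."""
    labels = np.asarray(labels)
    n = len(labels)
    ops = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)    # Gamma^(k): node list of cluster k
        C = np.zeros((n, len(members)))          # shape n x N_k
        C[members, np.arange(len(members))] = 1.0
        ops.append(C)
    return ops
```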
Step three: the structure of the coarsened graph is obtained through the sampling operators.
Let $A^{(i)}_{l}$ be the adjacency matrix of the input graph of the i-th view in the l-th iteration, and let $A^{(k)}$ be the adjacency matrix within the k-th cluster of that input graph, obtained as
$$A^{(k)} = C^{(k)\top} A^{(i)}_{l} C^{(k)}.$$
The intra-cluster adjacency matrix over all clusters is then
$$A_{\mathrm{int}} = \sum_{k} C^{(k)} A^{(k)} C^{(k)\top},$$
and the inter-cluster adjacency matrix is
$$A_{\mathrm{ext}} = A^{(i)}_{l} - A_{\mathrm{int}}.$$
Let the assignment matrix between the nodes of the input graph and those of the output coarsened graph of the i-th view in the l-th iteration be
$$S^{(i)}_{l} \in \{0,1\}^{n_l \times n_{l+1}},$$
where $S^{(i)}_{l}[p, j] = 1$ if and only if $v_p \in \Gamma^{(j)}$. The assignment matrix records the mapping from the nodes of the input graph to the clusters of the output graph in each iteration. The adjacency matrix of the output coarsened graph is then
$$A^{(i)}_{l+1} = S^{(i)\top}_{l} A_{\mathrm{ext}} S^{(i)}_{l},$$
which retains the structural information of the original network.
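The coarsening of step three can be written in a few lines. The sketch assumes the formulas $A^{(k)} = C^{(k)\top} A C^{(k)}$, $A_{\mathrm{int}} = \sum_k C^{(k)} A^{(k)} C^{(k)\top}$, $A_{\mathrm{ext}} = A - A_{\mathrm{int}}$ and $A_{\mathrm{coar}} = S^{\top} A_{\mathrm{ext}} S$, with 0-based cluster labels; `coarsen_adjacency` is a hypothetical name.

```python
import numpy as np


def coarsen_adjacency(A, labels, n_clusters):
    """Return (A_coar, A_int, A_ext) with A_coar = S^T A_ext S."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n = A.shape[0]
    S = np.zeros((n, n_clusters))
    S[np.arange(n), labels] = 1.0              # S[p, j] = 1 iff v_p in Gamma^(j)
    A_int = np.zeros_like(A)
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        C = np.zeros((n, len(members)))
        C[members, np.arange(len(members))] = 1.0   # sampling operator C^(k)
        A_k = C.T @ A @ C                      # intra-cluster adjacency A^(k)
        A_int += C @ A_k @ C.T                 # scatter back to n x n
    A_ext = A - A_int                          # edges between clusters only
    return S.T @ A_ext @ S, A_int, A_ext
```

Because intra-cluster edges are removed before coarsening, the coarsened adjacency has a zero diagonal and each entry counts the edges between two clusters.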
Step four: the feature matrix of the coarsened graph is obtained through the graph Fourier transform. Let $L^{(k)}$ be the graph Laplacian matrix of the k-th cluster and $u^{(k)}_{1}, \ldots, u^{(k)}_{N_k}$ its eigenvectors, and let the feature matrix of the input graph be $X^{(i)}_{l} \in \mathbb{R}^{n_l \times d}$. The feature matrix of each cluster is obtained by sampling
$$X^{(k)} = C^{(k)\top} X^{(i)}_{l},$$
and the eigenvectors are then upsampled:
$$\bar{u}^{(k)}_{r} = C^{(k)} u^{(k)}_{r}, \qquad r = 1, \ldots, N_k.$$
The pooling operators are defined as
$$\Theta_r = \big[\bar{u}^{(1)}_{r}, \ldots, \bar{u}^{(K)}_{r}\big] \in \mathbb{R}^{n_l \times K},$$
where K is the number of clusters and, for a subgraph k with $N_k$ nodes and any $r > N_k$, $\bar{u}^{(k)}_{r} = 0$. The r-th pooling operator $\Theta_r$ is then used to pool the r-th feature matrix, which amounts to a graph Fourier transform of the feature matrix of each subgraph:
$$X_r = \Theta_r^{\top} X^{(i)}_{l}.$$
Finally, the results of the different pooling operators are concatenated to obtain the feature matrix of the coarsened graph after EigenPooling:
$$X^{(i)}_{l+1} = \big[X_1, X_2, \ldots, X_H\big],$$
where H is the number of pooling operators used (at most $\max_k N_k$).
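A NumPy sketch of the EigenPooling feature computation follows. It assumes the unnormalized Laplacian D − A for each cluster subgraph, orders each subgraph's eigenvectors by ascending eigenvalue, and zero-fills $\bar{u}^{(k)}_r$ when $r > N_k$; `eigenpool_features` and its optional cap `H` are hypothetical names. Because eigenvectors are defined only up to sign (and up to rotation for repeated eigenvalues), the pooled coefficients can differ in sign between runs or libraries.

```python
import numpy as np


def eigenpool_features(A, X, labels, n_clusters, H=None):
    """Return X_coar = [Theta_1^T X, ..., Theta_H^T X], shape (K, d * H)."""
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = A.shape[0]
    up = []                                    # up[k]: n x N_k upsampled eigenvectors
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        C = np.zeros((n, len(members)))
        C[members, np.arange(len(members))] = 1.0
        A_k = C.T @ A @ C                      # subgraph adjacency A^(k)
        L_k = np.diag(A_k.sum(axis=1)) - A_k   # subgraph Laplacian (assumed D - A)
        _, U_k = np.linalg.eigh(L_k)           # columns u_1^(k), ..., u_{N_k}^(k)
        up.append(C @ U_k)                     # upsample: C^(k) u_r^(k)
    if H is None:
        H = max(u.shape[1] for u in up)
    blocks = []
    for r in range(H):
        # Theta_r: column k is the r-th upsampled eigenvector of cluster k, or 0.
        Theta_r = np.column_stack(
            [u[:, r] if r < u.shape[1] else np.zeros(n) for u in up])
        blocks.append(Theta_r.T @ X)           # graph Fourier coefficients, K x d
    return np.hstack(blocks)
```

The first pooling operator uses the constant eigenvector of each subgraph, so $X_1$ is a scaled sum of the member features; later operators add progressively higher-frequency information.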
Step five: the first four steps are executed iteratively on each view, where the number of iterations m is a hyper-parameter, and the feature matrix $X^{(i)}_{m}$ and adjacency matrix $A^{(i)}_{m}$ of the clustered coarsened graph are output. The degree matrix $D^{(i)}_{m}$ and the graph Laplacian matrix $L^{(i)}_{m}$ of each view are then obtained, which represent the feature information and structural information of each view's data after clustering, together with the assignment matrix $S^{(i)}$, which records the mapping from the data of each view to the categories. Through continued iteration, the model extracts the clustering information in the data and the optimal number of clusters, each category serving as a node of the output coarsened graph. Each such node integrates the feature information of all nodes in its cluster while preserving the cluster structure of the input graph.
(3) Fusion of multi-view social network data clustering information
The invention performs multi-view information fusion with a multi-view spectral clustering method. This method assumes that the feature matrix of each view and that of the fused view each span a k-dimensional linear subspace of the n-dimensional space and therefore correspond to a point on the Grassmann manifold G(k, n), so that the distance between different views can be calculated with the distance formula between points on the Grassmann manifold. The fusion method is implemented as follows:
Step one: calculate the fused assignment matrix. The assignment matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$. Then, in each row of S, the entry at the row's maximum value is set to 1 and all other entries to 0, giving the multi-view fused assignment matrix, in which $S[p, j] = 1$ means that the p-th node belongs to the j-th class. This matrix gives the clustering of the social network data.
Step two: compute the weighted sum of the distances between each view and the fused view
$$\sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big],$$
where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view. This weighted sum serves as one term of the objective function.
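Step three: following the multi-view spectral clustering method, the sum of the spectral clustering objective functions of the views is minimized as the other term of the objective function. Adding the two terms and rearranging gives the total objective function, i.e. a matrix trace minimization problem:
$$\min_{U^{\top} U = I} \; \sum_{i=1}^{N} \operatorname{tr}\big(U^{\top} L^{(i)} U\big) + \sum_{i=1}^{N} \alpha_i \Big[k - \operatorname{tr}\big(U U^{\top} U^{(i)} U^{(i)\top}\big)\Big],$$
which, up to an additive constant, equals
$$\min_{U^{\top} U = I} \operatorname{tr}\big(U^{\top} L_{\mathrm{mod}} U\big), \qquad L_{\mathrm{mod}} = \sum_{i=1}^{N} L^{(i)} - \sum_{i=1}^{N} \alpha_i U^{(i)} U^{(i)\top},$$
where the Laplacian matrix $L_{\mathrm{mod}}$ is the Laplacian matrix of the fused view.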
Step four: solve the objective function. By the Rayleigh-Ritz theorem, the minimizer U consists of the eigenvectors of $L_{\mathrm{mod}}$ associated with its k smallest eigenvalues; stacking these k eigenvectors as columns gives the feature matrix U of the fused view, each row of which is the feature vector of one category, i.e. the feature representation of the data of that category. These feature vectors are then input into a classification algorithm to obtain the specific category corresponding to each of them.
<Example 2>
The invention provides a multi-view social network data clustering method based on hierarchical graph pooling, comprising the following steps:
constructing the data of each perspective into graph data by methods such as K-nearest neighbors;
clustering the data of each perspective with EigenPooling, a hierarchical graph pooling method from graph representation learning, to obtain the clustered graph data of each perspective;
fusing the multi-view clustering data with a multi-view spectral clustering algorithm to obtain a comprehensive feature matrix and assignment matrix of the clustered data;
(1) Construction of multi-view social network data
Suppose the social network dataset contains n data items and N views; the data include users' personal feature data and social data, and the social data contain various associations and behaviors between users. Nodes represent users and edges represent connections between users. Views are then constructed separately from the inherent links between the data and by the K-nearest-neighbor method, as follows:
Step one: a social network is constructed for each type of connection between the data in the data set, such as likes, comments and friend relationships, taking each such connection as an edge, to obtain the graph adjacency matrix $A^{(i)}$.
Step two: the user features are partitioned, and a view is constructed for each part of the partitioned features by K-nearest-neighbor graph construction. The data points are regarded as the nodes of the graph and each node is characterized by the feature vector of its data point; treating these feature vectors as coordinates in an n-dimensional space, the cosine distance between each data point and every other point is calculated. Each node is then traversed and the other nodes are sorted by their distance to it; for a given neighbor number k, an edge is formed between the node and every other node that lies among its k nearest neighbors, where k is a hyper-parameter. This finally yields the graph adjacency matrix $A^{(i)}$. The model structure is shown in fig. 2.
Step three: for the user feature views, the feature vectors of the data in each view i are stacked to obtain the feature matrix $X^{(i)}$, and the degree matrix
$$D^{(i)} = \operatorname{diag}(d_{11}, \ldots, d_{nn}), \qquad d_{pp} = \sum_{q=1}^{n} a^{(i)}_{pq},$$
and the normalized graph Laplacian matrix
$$L^{(i)} = I - \big(D^{(i)}\big)^{-1/2} A^{(i)} \big(D^{(i)}\big)^{-1/2}$$
are calculated, where $a^{(i)}_{pq}$ is the entry in row p and column q of $A^{(i)}$. For the user relationship views, the values of network structure features in the social network, such as degree and centrality, can be selected as the features (column vectors) of the nodes; these features are then stacked row by row to obtain the feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ are calculated. This forms the initial social network data set.
(2) Clustering information extraction for multi-view social network data
In this step, EigenPooling, a hierarchical graph pooling technique, is applied iteratively to each view to find its clustering information. After each iteration, the graph data of the clustered graph (the coarsened graph), namely its feature matrix and adjacency matrix, are output and used as the input of the next iteration. The number of iterations and the number of categories k in each iteration are preset hyper-parameters. For each view, the l-th iteration proceeds as follows:
Step one: a spectral clustering algorithm is performed on each view. For each view i, the eigenvectors of $L^{(i)}$ corresponding to its first k (smallest) eigenvalues are stacked as column vectors into the matrix $U'$, and the row vectors $y_1, \ldots, y_n$ of $U'$ are clustered into a series of non-overlapping clusters. If row $y_p$ is assigned to cluster j, data point p belongs to class j. Each cluster corresponds to one node (a supernode) of the coarsened graph.
Step two: this correspondence is represented by an assignment matrix S. Let the assignment matrix between the nodes of the input graph and those of the output coarsened graph of the i-th view in the l-th iteration be
$$S^{(i)}_{l} \in \{0,1\}^{n_l \times n_{l+1}},$$
where $S^{(i)}_{l}[p, j] = 1$ if and only if the p-th data item is assigned to the j-th class. The assignment matrix records the mapping from the nodes of the input graph to the clusters of the output graph in each iteration.
(3) Fusion of multi-view social network data clustering information
In this step, the fused assignment matrix is calculated; from this matrix, the clustering of the social network data, i.e. the user communities in the social network, can be obtained. The steps are as follows:
Step one: the assignment matrices of all views are added to obtain $S = \sum_{i=1}^{N} S^{(i)}$.
Step two: in each row of S, the entry at the row's maximum value is set to 1 and all other entries to 0, giving the multi-view fused assignment matrix, in which $S[p, j] = 1$ means that the p-th node belongs to the j-th class.
Although embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it can be applied in all fields to which it is suited, and further modifications can readily be made by those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details shown and described herein.

Claims (10)

1. A multi-view clustering method based on hierarchical graph pooling, characterized by comprising the following steps:
dividing the data to be processed into a multi-view data set, and constructing a graph representation for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
extracting the clustering information of each view by hierarchical graph pooling with layer-by-layer iterative computation, wherein the clustering information of each view comprises the coarsened graph and the assignment matrix corresponding to that view, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix obtained after iteration;
and fusing the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to each feature vector.
2. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein each view is constructed as follows: each data point is taken as a node of the graph and is represented by its feature vector, and the edges of the graph are constructed from the associations between data points, yielding the graph adjacency matrix $A^{(i)}$.
3. The multi-view clustering method based on hierarchical graph pooling of claim 2, wherein the edges of the graph may further be constructed by a K-nearest-neighbor method comprising the following steps:
partitioning the feature vectors of the data points, and constructing one view from each part of the features by K-nearest-neighbor graph construction to obtain the graph adjacency matrix $A^{(i)}$ of that view, wherein
the feature vector of a data point is regarded as a coordinate in n-dimensional space, and the distance between each data point and every other node is computed as the Euclidean distance in that space; each node is then traversed and its distances to the other nodes are sorted by magnitude; for a given neighbor number k, an edge is formed between the node and each other node ranked among its k nearest neighbors, the neighbor number k being a hyper-parameter.
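The K-nearest-neighbor construction of claim 3, with one view per feature block, can be sketched as below. Assumptions not fixed by the claim: the feature vectors are split into contiguous column blocks with `np.array_split`; edges are unweighted (0/1); and the graph is symmetrized, so an edge exists if either node is among the other's k nearest neighbors. `knn_adjacency` and `build_views` are hypothetical names.

```python
import numpy as np

def knn_adjacency(F, k):
    """0/1 adjacency of the k-nearest-neighbor graph over the rows of F."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    diff = F[:, None, :] - F[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))    # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a node is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest nodes
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1
    return np.maximum(A, A.T)                  # symmetrize (assumed undirected graph)

def build_views(F, num_views, k):
    """Split each feature vector into num_views column blocks; one kNN graph per block."""
    blocks = np.array_split(np.asarray(F, dtype=float), num_views, axis=1)
    return [knn_adjacency(block, k) for block in blocks]
```

With k = 1 and one-dimensional points 0, 1, 10, 11, this links 0 with 1 and 10 with 11, giving two components.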
4. The multi-view clustering method based on hierarchical graph pooling of claim 3, wherein:
when the graph is constructed from the inherent associations between data points, the feature matrix is constructed by taking the values of the original data-structure features of each data point as the feature vector of its node and stacking these vectors by rows to obtain the corresponding feature matrix $X^{(i)}$; the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view are then computed from the definitions of the degree matrix and the graph Laplacian, forming the initial data set;
when the graph is constructed by the K-nearest-neighbor method, the feature matrix is constructed by stacking the feature vectors of the data points in each view i by rows to obtain the corresponding feature matrix $X^{(i)}$, and the degree matrix $D^{(i)}$ and the graph Laplacian matrix $L^{(i)}$ of the view are computed from the same definitions.
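Once $A^{(i)}$ is available, the remaining matrices of claim 4 follow from their definitions. The sketch assumes the unnormalized Laplacian L = D - A; a normalized Laplacian such as I - D^{-1/2} A D^{-1/2} is a common alternative that the source does not rule out. `view_matrices` is a hypothetical name.

```python
import numpy as np

def view_matrices(A, node_features):
    """Feature matrix X, degree matrix D and Laplacian L of one view."""
    A = np.asarray(A, dtype=float)
    X = np.vstack(node_features)               # one row per node
    D = np.diag(A.sum(axis=1))                 # D[i, i] = degree of node i
    L = D - A                                  # unnormalized Laplacian (assumed definition)
    return X, D, L

X, D, L = view_matrices([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [[1, 2], [3, 4], [5, 6]])
```

Every row of L sums to zero, so the constant vector always lies in its null space; this is why the smallest eigenvalue used in spectral clustering is 0.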
5. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein the clustering information of each view i is computed as follows: hierarchical graph pooling is applied iteratively to obtain the clustering information of the hierarchical graph; after each iteration, the graph data of the clustered coarsened graph, comprising its feature matrix and adjacency matrix, is output and used as the input of the next iteration; the number of iterations m and the number of categories k in each iteration are preset hyper-parameters.
6. The multi-view clustering method based on hierarchical graph pooling of claim 5, wherein computing and extracting the clustering information of each view i specifically comprises the following steps:
step one: choosing an algorithm for extracting cluster centers, applying it to the graph data of the input graph, and taking the resulting cluster centers as the nodes of the output graph, thereby obtaining the node correspondence between the input graph and the output graph: using the chosen algorithm together with the feature matrix and adjacency matrix of the input graph, the nodes of the input graph are divided into different clusters, and the feature vectors of the nodes of the same class in the input graph are jointly represented by the feature vector of a single node in the coarsened graph; this node is a supernode and is regarded as the cluster center;
step two: representing the node correspondence between the input graph and the output graph at the l-th iteration by the assignment matrix $S_l^{(i)}$: if node p of the input graph belongs to class q, i.e., corresponds to node q of the coarsened graph, the entry $S_l^{(i)}[p, q]$ is 1; if it does not belong to that class, the corresponding entry of $S_l^{(i)}$ is 0;
step three: obtaining the adjacency matrix $A_l^{(i)}$ of the coarsened graph from the adjacency matrix and the assignment matrix of the input graph, which gives the structure of the coarsened graph;
step four: aggregating the feature vectors of the nodes of the same class in the input graph, using the assignment matrix and the aggregation operation of hierarchical graph pooling, to obtain the feature vectors of the supernodes, and then stacking the feature vectors of all supernodes to obtain the feature matrix $X_l^{(i)}$ of the coarsened graph;
step five: performing m iterations for each view i to finally obtain the adjacency matrix $A_m^{(i)}$ and the feature matrix $X_m^{(i)}$ of the coarsened graph, and then obtaining the graph Laplacian matrix $L_m^{(i)}$ from its definition, together with the assignment matrix $S^{(i)}$, where i = 1, …, N.
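Steps one to five of claim 6 can be sketched as one coarsening loop per view. Several choices here are assumptions rather than the patent's specification: the cluster-center algorithm is spectral clustering on the adjacency matrix alone (the claim also allows features to be used); the coarsened adjacency uses the DiffPool-style product $S^\top A S$ with intra-cluster self-loops removed; supernode features are cluster means; and the per-view assignment matrix is taken as the product $S_1 S_2 \cdots S_m$, mapping original nodes to final supernodes. `cluster_labels` and `hierarchical_pool` are hypothetical names.

```python
import numpy as np

def cluster_labels(A, k, n_iter=100):
    """Hypothetical cluster-center step: spectral embedding of A + k-means."""
    L = np.diag(A.sum(axis=1)) - A
    Y = np.linalg.eigh(L)[1][:, :k]            # eigenvectors of the k smallest eigenvalues
    centers = [Y[0]]
    for _ in range(1, k):                      # farthest-point seeding
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):                    # Lloyd iterations
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def hierarchical_pool(A, X, ks):
    """Run m = len(ks) coarsening iterations on one view.

    Returns the final adjacency A_m, features X_m, Laplacian L_m and the
    composite assignment matrix mapping original nodes to final supernodes.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    S_total = np.eye(A.shape[0])
    for k in ks:
        labels = cluster_labels(A, k)
        S = np.eye(k)[labels]                  # step two: one-hot S_l
        A = S.T @ A @ S                        # step three: coarsened adjacency (assumed S^T A S)
        np.fill_diagonal(A, 0.0)               # drop intra-cluster self-loops (assumption)
        sizes = np.maximum(S.sum(axis=0), 1.0) # guard against empty clusters
        X = (S.T @ X) / sizes[:, None]         # step four: mean-pooled supernode features
        S_total = S_total @ S                  # compose S_1 S_2 ... S_l
    L = np.diag(A.sum(axis=1)) - A             # step five: Laplacian of the final graph
    return A, X, L, S_total
```

For two disconnected triangles and ks = [2], each triangle collapses into one supernode whose feature vector is the mean of its three members.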
7. The multi-view clustering method based on hierarchical graph pooling of claim 1, wherein the multi-view spectral clustering fusion method comprises the following steps:
calculating the fused assignment matrix: summing the assignment matrices of all views i to obtain
$S = \sum_{i=1}^{N} S^{(i)}$;
then, for each row of S, setting the entry at the row maximum to 1 and all other entries to 0 to obtain the multi-view fused assignment matrix, in which S[i, j] = 1 means that the i-th node belongs to the j-th class; the clustering of the data to be processed can be read directly from this matrix.
8. The multi-view clustering method based on hierarchical graph pooling of claim 7, wherein, for a classification task, the method further comprises the following steps:
computing the weighted sum of the distances between each view i and the fused view,
[formula image not available]
where the weight $\alpha_i$ of view i measures the importance of that view and is assigned according to the meaning of the selected view; this weighted sum is one term of the objective function;
then, following the multi-view spectral clustering method, taking the sum of the spectral clustering objective functions of all views, which is to be minimized, as the other term of the objective function, and adding and rearranging the two terms to obtain the overall objective function
$$\min_{U} \operatorname{tr}\left(U^{\top} L_{mod} U\right) \quad \text{s.t.} \quad U^{\top} U = I$$
wherein the Laplacian matrix $L_{mod}$ is the graph Laplacian matrix of the fused view;
solving the objective function: by the Rayleigh-Ritz theorem, the solution U that minimizes the objective function is given by the eigenvectors of $L_{mod}$ corresponding to its k smallest eigenvalues; stacking these k eigenvectors as columns gives the eigenvector matrix U of the fused view, each row of which is the feature representation of one data point; these row vectors are then input into a classification algorithm to obtain the specific category corresponding to each of them.
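The Rayleigh-Ritz step of claim 8 can be checked numerically. The sketch takes $L_{mod}$ as given; the source does not specify how it is built, so the usage below forms it as a weighted sum of two random view Laplacians purely for illustration. Under the constraint $U^\top U = I$, the minimum of $\operatorname{tr}(U^\top L_{mod} U)$ equals the sum of the k smallest eigenvalues of $L_{mod}$ and is attained by the corresponding eigenvectors. `fused_embedding` and `laplacian` are hypothetical names.

```python
import numpy as np

def fused_embedding(L_mod, k):
    """Minimize tr(U^T L_mod U) subject to U^T U = I (Rayleigh-Ritz / Ky Fan).

    Returns the n-by-k matrix U whose columns are eigenvectors of the k
    smallest eigenvalues of the symmetric matrix L_mod, plus the minimum value.
    """
    eigvals, eigvecs = np.linalg.eigh(L_mod)   # eigenvalues in ascending order
    return eigvecs[:, :k], eigvals[:k].sum()

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

# Illustration only: a hypothetical fused Laplacian built from two random views.
rng = np.random.default_rng(0)
views = []
for _ in range(2):
    W = rng.random((6, 6))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    views.append(W)
L_mod = 0.7 * laplacian(views[0]) + 0.3 * laplacian(views[1])
U, best = fused_embedding(L_mod, k=2)
# Each row of U is the spectral representation of one data point and could be
# passed to any downstream classifier.
```

Any other matrix with orthonormal columns gives a trace at least as large as `best`, which is what makes the eigenvector solution optimal.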
9. The multi-view clustering method based on hierarchical graph pooling according to any one of claims 1 to 8, wherein the data to be processed is social network data.
10. A multi-view clustering system based on hierarchical graph pooling, comprising:
a graph construction module, configured to divide the data to be processed into a multi-view data set and construct a graph representation for each view to obtain the corresponding views, wherein each view comprises an adjacency matrix representing the graph structure, a feature matrix representing the feature information of the graph, and a graph Laplacian matrix;
a clustering information computation and extraction module, configured to extract the clustering information of each view by hierarchical graph pooling with layer-by-layer computation, wherein the clustering information of each view comprises the coarsened graph and the assignment matrix corresponding to that view, and the coarsened graph comprises the adjacency matrix, feature matrix and graph Laplacian matrix obtained after iteration;
and a multi-view fusion module, configured to fuse the clustering information of all views by a multi-view spectral clustering fusion method to obtain the category corresponding to each feature vector.
CN202110393842.8A 2021-04-13 2021-04-13 Multi-view clustering method and system based on hierarchical graph pooling Pending CN113255720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110393842.8A CN113255720A (en) 2021-04-13 2021-04-13 Multi-view clustering method and system based on hierarchical graph pooling

Publications (1)

Publication Number Publication Date
CN113255720A true CN113255720A (en) 2021-08-13

Family

ID=77220630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110393842.8A Pending CN113255720A (en) 2021-04-13 2021-04-13 Multi-view clustering method and system based on hierarchical graph pooling

Country Status (1)

Country Link
CN (1) CN113255720A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688425A (en) * 2023-12-07 2024-03-12 重庆大学 Multi-task graph classification model construction method and system for Non-IID graph data

Similar Documents

Publication Publication Date Title
He et al. Why resnet works? residuals generalize
CN108132968B (en) Weak supervision learning method for associated semantic elements in web texts and images
Khrissi et al. Clustering method and sine cosine algorithm for image segmentation
WO2019015246A1 (en) Image feature acquisition
CN105809672B (en) A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
Qiu et al. Deep learning-based algorithm for vehicle detection in intelligent transportation systems
CN113255895B (en) Structure diagram alignment method and multi-diagram joint data mining method based on diagram neural network representation learning
US8429163B1 (en) Content similarity pyramid
CN113822325A (en) Method, device and equipment for supervised learning of image features and storage medium
CN115546525A (en) Multi-view clustering method and device, electronic equipment and storage medium
CN111178196B (en) Cell classification method, device and equipment
Rajendra Prasad et al. An efficient sampling-based visualization technique for big data clustering with crisp partitions
CN108564116A (en) A kind of ingredient intelligent analysis method of camera scene image
Pei et al. Texture classification based on image (natural and horizontal) visibility graph constructing methods
CN113255720A (en) Multi-view clustering method and system based on hierarchical graph pooling
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
Shi et al. Deep message passing on sets
Chaudhary et al. A review on various algorithms used in machine learning
CN113139556B (en) Manifold multi-view image clustering method and system based on self-adaptive composition
CN113205184B (en) Invariant learning method and device based on heterogeneous hybrid data
Babatunde et al. Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture
CN114638953A (en) Point cloud data segmentation method and device and computer readable storage medium
CN111461265B (en) Scene image labeling method based on coarse-fine granularity multi-image multi-label learning
CN111428741B (en) Network community discovery method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813