CN116304367B - Algorithm and device for obtaining communities based on graph self-encoder self-supervision training - Google Patents

Algorithm and device for obtaining communities based on graph self-encoder self-supervision training

Info

Publication number: CN116304367B
Application number: CN202310163573.5A
Authority: CN (China)
Other versions: CN116304367A
Prior art keywords: matrix, node, community, self, graph
Legal status: Active (granted)
Inventors: 王静红, 王慧, 王威, 于富强
Assignee: Hebei Normal University


Classifications

    • G06F 16/9536 Search customisation based on social or collaborative filtering (G PHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F 16/00 Information retrieval; G06F 16/95 Retrieval from the web; G06F 16/953 Querying, e.g. by the use of web search engines)
    • G06N 3/088 Non-supervised learning, e.g. competitive learning (G PHYSICS; G06 COMPUTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/08 Learning methods)
Abstract

The invention discloses an algorithm and a device for obtaining communities based on self-supervision training of a graph self-encoder, and relates to the technical field of neural learning methods. The algorithm combines self-supervision training with the self-expression principle to obtain a similarity matrix of paired nodes of a network graph and uses it to guide generation of a node embedding matrix. The device comprises a community obtaining module, which obtains the network graph, produces a first damage graph and a second damage graph through damage-function processing, obtains a pre-trained node embedding matrix through pre-training, contrastively learns the first and second damage graphs with a noise-contrast function based on the principle of normalized mutual information maximization, trains the self-supervision training model until the loss function is minimized to obtain the optimized node embedding matrix, obtains the similarity matrix through the self-expression principle and regularization and uses it to guide generation of the node embedding matrix, and finally obtains the community cluster matrix. By combining self-supervision training with the self-expression principle and using the similarity matrix to guide generation of the node embedding matrix, efficient and accurate community discovery is realized.

Description

Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
Technical Field
The invention relates to the technical field of neural learning methods, in particular to an algorithm and a device for obtaining communities based on self-supervision training of a graph self-encoder.
Background
A search of TACD_ALL with the query (paper AND community AND neural network AND node AND supervision AND matrix) returned the following closest prior-art schemes.
Application publication number CN114741519A is titled "A paper correlation analysis method based on graph convolutional neural network and knowledge base". The method extracts key information from a paper set, constructs a paper-set knowledge base, and, combined with a graph convolutional neural network, proposes an improved Inception-GCN model to complete paper category division; a NOCO model completes paper community discovery, thereby completing the correlation analysis of papers in the knowledge base. A new graph node classification model, the Inception-GCN model, is proposed: the Inception method originally used in CNN models is combined with the GCN model, so that the new model can effectively alleviate over-fitting and over-smoothing while enhancing feature learning ability. Experiments show that using this model to classify paper nodes achieves better results than the prior art.
The grant publication number CN114357312B is titled "Community discovery method and personalized recommendation method based on automatic modeling of graph neural networks". The method acquires graph neural network structure components and constructs a graph neural network search space; samples the search space to obtain an initial population of graph neural network structures; calculates the fitness of each graph neural network model and selects several structure groups as the parent generation; searches child graph neural network structures, calculates the fitness of each child structure, and updates the parent structure group; selects the optimal structure in the parent group for modeling and obtains a coefficient matrix of the graph data; and decomposes the coefficient matrix of the graph data to obtain a similarity matrix, which is clustered to realize community discovery. A personalized recommendation method including this community discovery method based on automatic graph neural network modeling is also disclosed. The method has high reliability and high accuracy, and is more scientific and reasonable.
Application publication number CN111950594A is titled "Unsupervised graph representation learning method and apparatus on large-scale attribute graphs based on sub-sampling". The method comprises the following steps: sub-sampling the attribute graph according to its structure information and node attribute information to generate several subgraphs; and training a graph self-encoder on each subgraph using the structure information, node attribute information and community information of the attribute graph to obtain low-dimensional vector representations of the nodes in the attribute graph. The self-encoder comprises an encoder and a decoder; the encoder adopts a graph convolutional neural network; the decoder comprises a graph-structure reconstruction decoder, a graph-content reconstruction decoder and a graph-community reconstruction decoder. The method supports learning low-dimensional vector representations of nodes in a large-scale attribute graph in an unsupervised manner; the vector representations preserve the topological structure information and node attribute information of the graph as far as possible and serve as inputs to different downstream data-mining tasks on the graph.
In combination with the above three patent documents and prior art schemes, the inventors analyzed the prior art schemes as follows.
(1) Prior art solution
Graphs, or networks, are ubiquitous in our daily lives. Network representation learning is the task of mapping different components of a network, such as nodes, edges, or the entire graph, to a vector space in order to facilitate downstream network tasks. In the real world, many complex networks form around close-knit community structure, such as social networks, citation networks, transportation networks and protein interaction networks. There are complex interactions between the nodes of a network, and these interactions together with node attributes cause the network to form different communities. From a topological perspective, connections between nodes inside a community are relatively dense, while connections to nodes outside are relatively sparse. Community discovery is one of the most important tasks in network analysis; mining the latent community structure of nodes is essential for understanding complex systems and for knowledge discovery, and has been widely applied in social, biological, computer engineering and other fields.
Most graph neural networks (GNNs) are expressed as message-passing networks, in which each node aggregates messages from neighboring nodes together with its own message to update its vector representation. However, compared with other tasks such as node classification and link prediction, community discovery algorithms based on graph neural networks have not been explored much.
In recent years, deep learning and convolutional neural networks (CNNs) have made remarkable breakthroughs in many fields, such as machine translation and reading comprehension in natural language processing (NLP), and object detection and image classification in computer vision (CV). Representation learning methods based on graph neural networks include supervised and unsupervised methods. Supervised methods require labeled data, but in reality most data are unlabeled, and labeling data can be costly. Unsupervised learning does not need a large number of labeled nodes for training; it learns representations while retaining the local characteristics of the samples to output discriminative features. The community discovery task is essentially an unsupervised learning task, but directly training GNNs for community discovery remains challenging in existing methods.
Most existing representation learning methods perform node embedding learning on homogeneous networks, treating the social network as homogeneous, with all edges between nodes belonging to a single type. DeepWalk was the first embedding algorithm to learn neighborhood features in homogeneous graphs, learning an encoding of each node's neighborhood from random walks; both DeepWalk and node2vec learn node embeddings by traversing nodes along sampled walks.
In recent years, some studies have used self-supervised learning to approach the performance of supervised learning. Velickovic et al. in 2019 proposed DGI, which extends the idea of information maximization to maximize the mutual information between different graph entities, including graph-level and node-level representations and corrupted versions of the graph.
Traditionally, methods train a network representation learning algorithm with a generic unsupervised loss and then apply a clustering algorithm as a post-processing step to find communities. Zhang et al. in 2019 proposed an adaptive graph convolution method that performs higher-order graph convolution to obtain smoothed node embeddings capturing the global cluster structure; the obtained embeddings are then used to detect communities with spectral clustering. He et al. in 2020 proposed a community-centric graph convolutional network method that obtains node community membership in the hidden layer of the encoder and introduces a community-centric dual decoder to reconstruct the network structure and node attributes in an unsupervised manner. Our work moves toward directly obtaining node communities within the framework of a graph neural network.
(2) Disadvantages of the prior art
Compared with other tasks such as node classification and link prediction, community discovery algorithms based on graph neural networks have not been studied and explored deeply. In real-world networks it is costly to directly obtain real community labels or pairwise constraints, and prior methods mostly run the node representation learning module and the clustering algorithm independently. Existing unsupervised representation learning methods struggle with complex network data: they obtain only low-level semantic features and cannot handle attributed network data. An efficient and accurate unsupervised representation learning method is therefore needed to perform community discovery and network analysis tasks on complex networks.
Problems and considerations in the prior art:
How to solve the technical problems of low efficiency and poor accuracy in community discovery.
Disclosure of Invention
The invention provides an algorithm and a device for obtaining communities based on self-supervision training of a graph self-encoder, which solve the technical problems of low efficiency and poor accuracy of community discovery.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an algorithm for obtaining communities based on self-supervision training of a graph self-encoder comprises the steps of combining self-supervision training with a self-expression principle to obtain a similarity matrix S of paired nodes of a network graph G, guiding generation of a node embedding matrix Z, and obtaining communities in an integrated mode.
The further technical proposal is that: the method specifically comprises the following steps: obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron.
The further technical proposal is that: in the step of obtaining the community cluster matrix C of the nodes, a probability distribution over which nodes should and should not be connected is obtained in an unsupervised mode so as to guide the formation of communities.
The further technical proposal is that: G = (V, E, X), where V = {1, 2, …, N} is the set of nodes and E ⊆ V × V is the set of edges; each node i has an attribute vector x_i ∈ R^F, and X = [x_1, x_2, …, x_i, …, x_N]^T ∈ R^{N×F} is the node attribute matrix of the network graph; the object is to learn a function f: V → [K], where [K] = {1, 2, …, K} is an index set of community clusters, and each node is mapped to one community by the network structure and node attributes of the graph.
The further technical proposal is that: the node embedding matrix is generated by a graph convolutional (GCN) encoder as in formula (1),

Z = ReLU(Â·ReLU(Â·X·W0)·W1), Â = D̃^{-1/2}·Ã·D̃^{-1/2}, Ã = A + I    (1)

In the formula (1), Z is the node embedding matrix, each row of Z is the vector representation of a node, and Z ∈ R^{|V|×F′}; X is the attribute matrix of the nodes, A is the adjacency matrix of the input network graph, Ã = A + I, I ∈ R^{|V|×|V|} is the identity matrix, D̃ is the degree matrix of Ã, ReLU(·) is the activation function, and W0 and W1 are the weight parameters of the graph convolution layers; Z1 is the node embedding matrix of the first damage graph G1, Z2 is the node embedding matrix of the second damage graph G2, and parameter sharing is maintained during GCN encoder training.
The further technical proposal is that: the similarity matrix is constructed as in formula (8),

S = L′·L′^T / ||L′·L′^T||_∞    (8)

In the formula (8), S is the similarity matrix, L is the feature matrix generated by singular value decomposition of the coefficient matrix, L′ is the normalized L matrix with negative elements set to 0, L^T is the transpose of the L matrix, and ||·||_∞ is the infinity norm.
The further technical proposal is that: the community cluster matrix C comprises N rows of community vectors C_i,

C_i = Softmax(MLP(Z_i)) ∈ R^K    (9)

In the formula (9), C_i is the community vector of the ith node in the community cluster matrix, Softmax(·) is an activation function, and the MLP is a three-layer neural network that maps each node vector Z_i to a K-dimensional vector, where K is the number of community clusters and is assumed to be known; the softmax layer converts the K-dimensional vector into a probability distribution, such that C_iK, the Kth element of C_i, represents the probability that the ith node belongs to the Kth cluster, so nodes with similar embeddings will be mapped to similar positions in the (K−1)-dimensional probability distribution.
The further technical proposal is that: node embedding is continuously updated to guide the generation of the community cluster matrix by training the MLP parameters; the optimization objective of community discovery is given as formula (10). In the formula (10), C_i is the community vector of the ith node in the community cluster matrix, C_j is the community vector of the jth node, and S_ij is the similarity of node i and node j; when S_ij is high, the node pair (i, j) is constrained to the same community, and when S_ij is low, node i and node j are in different communities.
The further technical proposal is that: joint optimization is performed over the self-supervision training parameters and the MLP parameters, and the total loss consists of a weighted sum of the two loss parts, node-embedding training and community discovery, as in formula (11),

min L_total = α·L_ss + L_comm    (11)

In the formula (11), min L_total is the minimized overall training loss, α is the weight factor in the optimization process, L_ss is the self-supervised contrastive loss of node-embedding training, and L_comm is the community discovery loss.
The device for obtaining communities based on self-supervision training of the graph self-encoder comprises a community obtaining module for obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; guiding generation of the node embedding matrix Z with the similarity matrix S; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron.
The further technical proposal is that: the community acquisition module is further used, when obtaining the community cluster matrix C of the nodes, for obtaining in an unsupervised manner a probability distribution over which nodes should and should not be connected, so as to guide the formation of communities.
An apparatus for obtaining communities based on graph self-encoder self-supervision training comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor realizes the corresponding steps when executing the computer program.
An apparatus for obtaining communities based on graph self-encoder self-supervised training includes a computer readable storage medium storing a computer program which when executed by a processor performs the respective steps described above.
The beneficial effects of adopting above-mentioned technical scheme to produce lie in:
firstly, an algorithm for obtaining communities based on self-supervision training of a graph self-encoder combines self-supervision training with the self-expression principle to obtain the similarity matrix S of paired nodes of the network graph G, guides the generation of the node embedding matrix Z, and obtains communities in an integrated mode. By combining self-supervision training with the self-expression principle and using the similarity matrix S to guide the generation of the node embedding matrix Z, this technical scheme realizes efficient and accurate community discovery.
Second, a device for obtaining communities based on self-supervision training of a graph self-encoder comprises a community obtaining module for obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; guiding generation of the node embedding matrix Z with the similarity matrix S; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron. By combining self-supervision training with the self-expression principle and using the similarity matrix S to guide the generation of the node embedding matrix Z, this technical scheme realizes efficient and accurate community discovery.
See the description of the detailed description section.
Drawings
FIG. 1 is a flow chart of the present application;
FIG. 2 is a data flow diagram of the present application;
FIG. 3 is a first data plot of algorithm comparison experimental results;
FIG. 4 is a second data plot of results of an algorithm comparison experiment;
FIG. 5 is a graph comparing the performance of five algorithms on Cora;
fig. 6 is a graph comparing the performance of five algorithms on Citeseer.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Example 1:
the application discloses an algorithm for obtaining communities based on self-supervision training of a graph self-encoder, which combines self-supervision training with the self-expression principle to obtain the similarity matrix S of paired nodes of a network graph G, guides the generation of the node embedding matrix Z, and obtains communities in an integrated mode. The method specifically comprises the following steps:
obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; guiding generation of the node embedding matrix Z with the similarity matrix S; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron.
The step of obtaining the community cluster matrix C of nodes includes obtaining, in an unsupervised manner, a probability distribution over which nodes should and should not be connected, so as to guide the formation of communities.
G = (V, E, X), where V = {1, 2, …, N} is the set of nodes and E ⊆ V × V is the set of edges. Each node i has an attribute vector x_i ∈ R^F, and X = [x_1, x_2, …, x_i, …, x_N]^T ∈ R^{N×F} is the node attribute matrix of the network graph. The object is to learn a function f: V → [K], where [K] = {1, 2, …, K} is an index set of community clusters; each node is mapped to one community by the network structure and node attributes of the graph.
In the formula (1), Z is the node embedding matrix, each row of Z is the vector representation of a node, and Z ∈ R^{|V|×F′}; X is the attribute matrix of the nodes, A is the adjacency matrix of the input network graph, Ã = A + I, I ∈ R^{|V|×|V|} is the identity matrix, D̃ is the degree matrix of Ã, ReLU(·) is the activation function, and W0 and W1 are the weight parameters of the graph convolution layers; Z1 is the node embedding matrix of the first damage graph G1, Z2 is the node embedding matrix of the second damage graph G2, and parameter sharing is maintained during GCN encoder training.
In the formula (8), S is the similarity matrix, L is the feature matrix generated by singular value decomposition of the coefficient matrix, L′ is the normalized L matrix with negative elements set to 0, L^T is the transpose of the L matrix, and ||·||_∞ is the infinity norm.
The community cluster matrix C comprises N rows of community vectors C_i,

C_i = Softmax(MLP(Z_i)) ∈ R^K    (9)

In the formula (9), C_i is the community vector of the ith node in the community cluster matrix, Softmax(·) is an activation function, and the MLP is a three-layer neural network that maps each node vector Z_i to a K-dimensional vector, where K is the number of community clusters and is assumed to be known; the softmax layer converts the K-dimensional vector into a probability distribution, such that C_iK, the Kth element of C_i, represents the probability that the ith node belongs to the Kth cluster, so nodes with similar embeddings will be mapped to similar positions in the (K−1)-dimensional probability distribution.
Continuously updating node embedding to guide the generation of a community cluster matrix by training MLP parameters; the optimization objective of the community is found as formula (10).
In the formula (10), C_i is the community vector of the ith node in the community cluster matrix, C_j is the community vector of the jth node, and S_ij is the similarity of node i and node j; when S_ij is high, the node pair (i, j) is constrained to the same community, and when S_ij is low, node i and node j are in different communities.
Joint optimization is carried out over the self-supervision training parameters and the MLP parameters; the total loss consists of a weighted sum of the two loss parts, node-embedding training and community discovery, as shown in formula (11).
min L_total = α·L_ss + L_comm    (11)

In the formula (11), min L_total is the minimized overall training loss, α is the weight factor in the optimization process, L_ss is the self-supervised contrastive loss of node-embedding training, and L_comm is the community discovery loss.
Example 2:
the invention discloses a device for obtaining communities based on self-supervision training of a graph self-encoder, which comprises the following modules:
the community obtaining module is used for obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; guiding generation of the node embedding matrix Z with the similarity matrix S; and, based on the similarity matrix S and the fully connected multi-layer perceptron, obtaining in an unsupervised manner a probability distribution over which nodes should and should not be connected, so as to guide the formation of communities and obtain the community cluster matrix C of the nodes.
Example 3:
the invention discloses a device for obtaining communities based on self-supervision training of a graph self-encoder, which is electronic equipment, wherein the electronic equipment comprises the device of the embodiment 2.
Example 4:
the invention discloses a device for obtaining communities based on self-supervision training of a graph self-encoder, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the memory and the processor form an electronic terminal, and the processor realizes the steps of the embodiment 1 when executing the computer program.
Example 5:
the present invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of embodiment 1.
Compared with the above embodiment, the program modules may be hardware modules made by using the existing logic operation technology, so as to implement the corresponding logic operation steps, communication steps and control steps, and further implement the corresponding steps, where the logic operation unit is not described in detail in the prior art.
The research and development process comprises the following steps:
the invention is characterized by: an improvement of community discovery methods based on unsupervised representation learning.
1 the most basic technical problems to be solved
An unsupervised loss function is designed based on the graph convolution neural network, the graph convolution neural network is trained in a self-supervision mode, and an algorithm for extracting communities is realized in an integrated mode.
2 core technical scheme
We use G = (V, E, X) to represent an input network graph, where V = {1, 2, …, N} is the set of nodes and E ⊆ V × V is the set of edges. Each node i has an attribute vector x_i ∈ R^F, and X = [x_1, x_2, …, x_i, …, x_N]^T ∈ R^{N×F} is the node attribute matrix of the network graph. The goal of our algorithm is to learn a function f: V → [K], where [K] = {1, 2, …, K} is an index set of community clusters; each node is mapped to one community by the network structure and node attributes of the graph.
The networks studied include citation networks, social networks and the like. Citation networks are used in the experiments, but the community algorithm is not limited to citation networks: it also applies to social networks, and in general to any network with nodes, edges and node attributes. Four citation networks were used in the experiments.
The meaning of the symbols is given in table 1.
Table 1: meaning table of symbols
As shown in fig. 1, which is a flowchart of a community discovery algorithm based on self-supervision training of a graph self-encoder, the partitioning includes the following steps:
Step 1: generate damage graphs G1 and G2 from the input graph G, and define positive and negative sample pairs over the nodes V.
As shown in FIG. 2, a diagram of the overall framework of a community discovery algorithm based on self-supervised training of a graph self-encoder is provided.
We generate two damage graphs G1 and G2 from the input graph G using a damage function. A damage graph is generated by randomly deleting a small fraction of edges from the input graph G while keeping the vertices unchanged. For any node i ∈ V, the corresponding nodes in G1 and G2 are denoted G1(i) and G2(i), and we define (G1(i), G2(i)) as a positive sample pair. For a randomly selected set of nodes V_i− = {j ∈ V | j ≠ i}, we define (G1(i), G1(j)) and (G1(i), G2(j)) as negative sample pairs.
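For illustration, the corruption step can be sketched in PyTorch as follows. This is a minimal sketch, not the patent's code: the graph is assumed to be stored as an edge-index tensor, and the function name corrupt_graph and the drop ratio p_drop are illustrative choices, since the text does not fix the fraction of deleted edges.

```python
import torch

def corrupt_graph(edge_index: torch.Tensor, p_drop: float = 0.2) -> torch.Tensor:
    """Randomly delete a small fraction of edges; vertices stay unchanged.

    edge_index: 2 x |E| tensor of (source, target) node indices.
    """
    keep_mask = torch.rand(edge_index.size(1)) >= p_drop  # True for surviving edges
    return edge_index[:, keep_mask]

# Two independent corruptions of the same input graph G:
#   edge_index_1 = corrupt_graph(edge_index)   # G1
#   edge_index_2 = corrupt_graph(edge_index)   # G2
# For every node i, (G1(i), G2(i)) is a positive pair, while (G1(i), G1(j))
# and (G1(i), G2(j)) with j != i are negative pairs.
```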
Step 2: the pre-training mode generates a node embedding matrix Z of the input graph G.
We encode the damage graphs G1 and G2 with a graph convolutional (GCN) encoder to generate the corresponding node embedding representations Z1 and Z2. The encoder generation process is shown in equation (1).

Z = ReLU(Â·ReLU(Â·X·W0)·W1), Â = D̃^{-1/2}·Ã·D̃^{-1/2}, Ã = A + I    (1)

In the formula (1), Z is the generated node embedding matrix, each row of Z is the vector representation of a node, and Z ∈ R^{|V|×F′}. X is the attribute matrix of the nodes, A is the adjacency matrix of the input network graph, Ã = A + I, I ∈ R^{|V|×|V|} is the identity matrix, D̃ is the degree matrix of Ã, ReLU(·) is the activation function, and W0 and W1 are the weight parameters of the graph convolution layers. Z1 and Z2 are the node embedding matrices of graphs G1 and G2, and parameter sharing is maintained during GCN encoder training.
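A minimal PyTorch sketch of the encoder of equation (1) is given below, assuming the reconstructed two-layer form with symmetric normalization; the dense adjacency handling and the class name GCNEncoder are illustrative choices, not from the patent.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """Two-layer GCN of equation (1): Z = ReLU(A_hat ReLU(A_hat X W0) W1)."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hid_dim, bias=False)   # W0
        self.w1 = nn.Linear(hid_dim, out_dim, bias=False)  # W1

    @staticmethod
    def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
        """A_hat = D~^(-1/2) (A + I) D~^(-1/2), dense version for small graphs."""
        a_tilde = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * a_tilde * d_inv_sqrt.unsqueeze(0)

    def forward(self, x: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        h = torch.relu(a_hat @ self.w0(x))
        return torch.relu(a_hat @ self.w1(h))

# The same encoder (shared parameters W0, W1) embeds both damage graphs:
#   z1 = encoder(x, GCNEncoder.normalize_adj(a1))   # Z1 from G1
#   z2 = encoder(x, GCNEncoder.normalize_adj(a2))   # Z2 from G2
```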
Step 3: minimize the contrastive loss between the two graphs G1 and G2 and finally generate the node embedding matrix Z.
In the method, the network structure and the node attribute information of the graph are considered simultaneously; the model is trained in a self-supervised manner according to the principle of mutual information maximization between the two damaged versions of the input graph, and the node embedding matrix Z is finally generated. Our training goal is to minimize the loss function between G1 and G2, as shown in equation (2).

L_ss = −(1/|V|)·Σ_{i∈V} log[ e^{cos(Z_1i, Z_2i)/τ} / ( e^{cos(Z_1i, Z_2i)/τ} + Σ_{j≠i} e^{cos(Z_1i, Z_1j)/τ} + Σ_{j≠i} e^{cos(Z_1i, Z_2j)/τ} ) ]    (2)

In the formula (2), Z_1i and Z_2i are the representations of node i in G1(i) and G2(i) respectively; we calculate the embedding loss of the positive and negative pairs with the cosine similarity cos(·), and τ is a temperature parameter. The loss function includes two parts, positive sample pairs and negative sample pairs.
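The following sketch shows one way to implement such a noise-contrastive loss with cosine similarity and temperature τ; the exact normalization is our assumption (an NT-Xent-style form consistent with the positive and negative pairs defined in step 1), not a verbatim transcription of the patent's formula.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Positive pairs (G1(i), G2(i)); negative pairs (G1(i), G1(j)), (G1(i), G2(j))."""
    z1n, z2n = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    between = torch.exp(z1n @ z2n.t() / tau)   # exp(cos(Z_1i, Z_2j)/tau) terms
    within = torch.exp(z1n @ z1n.t() / tau)    # exp(cos(Z_1i, Z_1j)/tau) terms
    pos = between.diag()                       # positive pairs, j == i
    denom = between.sum(dim=1) + within.sum(dim=1) - within.diag()  # drop the i == j self term
    return -torch.log(pos / denom).mean()
```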
Step 4: learn the pairwise node similarity matrix S from the node embedding matrix Z generated in steps 1-3.
The objective of this step is to derive a node similarity matrix S from the node embedding matrix Z. Using the self-expression principle of nodes, we represent node i as a linear combination of the other nodes, as in equation (3).

Z_i = Σ_{j≠i} p_ij·Z_j    (3)

In the formula (3), Z_i is the embedding vector of node i, Z_j is the embedding vector of node j, and p_ij is the similarity coefficient between node i and node j, with p_ii = 0 by definition.
We regularize the reconstruction of the node embedding matrix Z with the F-norm and optimize the objective function as in equation (4),

min_P ||Z − P·Z||_F^2 + λ1·||P||_F^2    (4)

In the formula (4), Z is the node embedding matrix, P is the similarity coefficient matrix, and λ1 is an optimization weight parameter.
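When the constraint p_ii = 0 is relaxed, equation (4) is a ridge regression in P and admits the closed form P = ZZ^T(ZZ^T + λ1·I)^{-1}; the patent instead trains this loss in batches (see below), so the following closed-form sketch is only a small-graph illustration of what the optimization converges toward, with the diagonal zeroed afterwards as an approximation.

```python
import torch

def self_expression_coefficients(z: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Closed-form minimizer of ||Z - P Z||_F^2 + lam * ||P||_F^2 (p_ii = 0 relaxed)."""
    gram = z @ z.t()                                      # Z Z^T
    eye = torch.eye(gram.size(0), device=z.device)
    p = gram @ torch.linalg.solve(gram + lam * eye, eye)  # Z Z^T (Z Z^T + lam I)^(-1)
    p.fill_diagonal_(0.0)                                 # enforce p_ii = 0 post hoc
    return p
```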
Next we construct the similarity matrix S of the nodes, training on the data with matrix decomposition and batch processing techniques. First, a coefficient matrix P̄ is calculated from the similarity coefficients P between nodes, as in equation (5),

P̄ = (|P| + |P^T|)/2    (5)

Because the data dimension of the dataset is large, we randomly sample the data with batch processing, and for ease of computation and storage the coefficient matrix P̄ is decomposed by an SVD of rank r, as in equations (6) and (7),

P̄ = U·Σ·V^T    (6)
L = U·Σ^{1/2}    (7)

In the formula (6), r = 4K + 1, U is the matrix of left singular vectors, Σ is the diagonal matrix of singular values, and V^T is the matrix of right singular vectors.
Regularizing each row of L, and setting the negative value in L as 0 to obtain L'.
Finally, the similarity matrix S is constructed as in equation (8), with S_ij ∈ [0, 1],

S = L′·L′^T / ||L′·L′^T||_∞    (8)

In equation (8), S is the similarity matrix.
We randomly sample and select M nodes, where M ≤ N, and train the loss of equation (4) in batches. The node similarity matrix S generated in this step guides the node embedding Z obtained in steps 1-3 to generate the communities C.
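The post-processing of equations (5)-(8) can be sketched as follows; the symmetrization in equation (5), the form L = U·Σ^{1/2} in equation (7) and the use of the largest entry as the normalizer are our reading of the text, consistent with standard subspace-clustering post-processing, not a verbatim transcription.

```python
import torch
import torch.nn.functional as F

def similarity_from_coefficients(p: torch.Tensor, k: int) -> torch.Tensor:
    """Build the pairwise similarity matrix S of equation (8) from coefficients P."""
    p_bar = 0.5 * (p.abs() + p.abs().t())       # eq (5): symmetrized coefficients (assumed form)
    r = 4 * k + 1                               # rank r = 4K + 1, as stated in the text
    u, s, _ = torch.svd_lowrank(p_bar, q=r)     # eq (6): rank-r SVD
    l = u * s.sqrt()                            # eq (7): L = U Sigma^(1/2) (assumed form)
    l = F.normalize(l, dim=1)                   # regularize each row of L
    l = l.clamp(min=0.0)                        # set negative entries to 0 -> L'
    sim = l @ l.t()                             # L' L'^T, entries >= 0
    return sim / sim.max()                      # scale so that S_ij lies in [0, 1]
```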
Step 5: a community cluster matrix C of nodes is discovered using a fully connected multi-layer perceptron-based approach.
In this step we use a fully connected multi-layer perceptron (MLP) with trainable parameters W_MLP, which maps each node embedding to its corresponding community membership vector, as shown in equation (9).

C_i = Softmax(MLP(Z_i)) ∈ R^K    (9)

In the formula (9), C_i is the community vector of the ith node in the community cluster matrix, Softmax(·) is an activation function, and the MLP is a three-layer neural network that maps each node vector Z_i to a K-dimensional vector, where K is the number of community clusters; we assume that K is known. The softmax layer converts the K-dimensional vector into a probability distribution, such that C_iK, the Kth element of C_i, represents the probability that the ith node belongs to the Kth cluster, so nodes with similar embeddings will be mapped to similar positions in the (K−1)-dimensional probability distribution.
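A minimal sketch of the community head of equation (9) follows; the hidden width and the class name CommunityHead are illustrative, while the three-layer structure and the softmax output follow the description above.

```python
import torch
import torch.nn as nn

class CommunityHead(nn.Module):
    """Three-layer MLP + softmax of equation (9): C_i = Softmax(MLP(Z_i))."""

    def __init__(self, emb_dim: int, hid_dim: int, k: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, k),              # K = number of community clusters
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.mlp(z), dim=1)  # each row is a probability distribution
```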
Using the information learned from the pairwise node similarity obtained in step 4, the MLP parameters in this step are trained to continuously update the node embeddings and guide the generation of the community cluster matrix. The optimization objective of community discovery is given as formula (10). In the formula (10), C_i is the community vector of the ith node in the community cluster matrix, C_j is the community vector of the jth node, and S_ij is the similarity of node i and node j; when S_ij is high, the node pair (i, j) should be constrained approximately to the same community, and when S_ij is low, node i and node j are in different communities. Thus, we generate, in an unsupervised manner, a probability distribution over which node pairs should and should not be connected, to guide the formation of communities.
To solve the optimization objective, we jointly optimize the self-supervision training parameters and the MLP parameters; the total loss consists of a weighted sum of the two loss parts, node-embedding training and community discovery, as in formula (11),

min L_total = α·L_ss + L_comm    (11)

In the formula (11), α is an optimization weight factor, and the loss consists of the two parts of formula (2) and formula (10). The node-pair similarity values are obtained by the batch learning technique of step 4. The whole algorithm proceeds iteratively, solving the similarity of each batch of nodes and then updating the parameters of the neural network by minimizing formula (11).
The overall algorithm flow: input the network graph G, the node count N, the number of community categories K, the batch size M and the batch count H, and output the community membership vector C_i of each node i. First, initialize the parameters of the self-supervised graph neural network and the clustering MLP; pre-train to obtain the node embedding matrix Z; randomly select node samples according to the batch size and batch count, optimize the objective function of formula (4), and batch-learn the pairwise node similarity matrix of the network. The algorithm then iterates: the self-supervised graph neural network generates the node embedding matrix Z, a community membership vector C_i is generated for each node i, and the parameters of the neural network and the MLP are continuously updated according to the loss function of formula (11).
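For illustration, the overall flow can be glued together as below, reusing the sketches from the previous steps (corrupt_graph, GCNEncoder, contrastive_loss, self_expression_coefficients, similarity_from_coefficients, CommunityHead, community_loss). All hyperparameter values, the helper edges_to_adj, and the choice not to back-propagate through S are our own illustrative assumptions.

```python
import torch

def edges_to_adj(edge_index: torch.Tensor, n: int) -> torch.Tensor:
    """Dense adjacency matrix from a 2 x |E| edge-index tensor."""
    adj = torch.zeros(n, n)
    adj[edge_index[0], edge_index[1]] = 1.0
    return adj

def train(x, adj, edge_index, k, epochs=200, alpha=1.0, lr=1e-3):
    n = x.size(0)
    encoder = GCNEncoder(x.size(1), 256, 64)
    head = CommunityHead(64, 64, k)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    a_hat = GCNEncoder.normalize_adj(adj)

    for _ in range(epochs):
        # Steps 1-3: two damage graphs, shared encoder, contrastive loss L_ss.
        a1 = GCNEncoder.normalize_adj(edges_to_adj(corrupt_graph(edge_index), n))
        a2 = GCNEncoder.normalize_adj(edges_to_adj(corrupt_graph(edge_index), n))
        l_ss = contrastive_loss(encoder(x, a1), encoder(x, a2))

        # Step 4: similarity matrix S from the current embeddings of the input graph.
        z = encoder(x, a_hat)
        with torch.no_grad():                    # S guides the clustering head only
            s = similarity_from_coefficients(self_expression_coefficients(z), k)

        # Step 5 and formula (11): community loss and joint update.
        loss = alpha * l_ss + community_loss(head(z), s)
        opt.zero_grad(); loss.backward(); opt.step()

    return head(encoder(x, a_hat)).argmax(dim=1)  # community index of each node
```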
The distinguishing technical characteristics are as follows:
the application combines self-supervision training with the self-expression principle to generate a pairwise node similarity matrix, uses it to guide the generation of the node embedding matrix, and extracts communities in an integrated mode.
3 beneficial technical effects
After the application runs for a period of time internally, the feedback of field technicians is beneficial in that:
the SECD algorithm can achieve better performance on all data sets and all indicators. In terms of clustering accuracy ACC performance, the SECD algorithm is improved by 10.1% on the Cora data set, 2.9% on the Citeser data set, 11.3% on the Wiki data set and 5.8% on the Pubmed data set compared with the AGC algorithm. Compared with the AGC algorithm, the SECD is improved by 4.1% on the Cora data set, 3.0% on the Citeseer data set, 12.1% on the Wiki data set and 13.8% on the Pubmed data set in the mutual information NMI performance.
See tables 3 and 4 for specific data.
As shown in fig. 3 and fig. 4, the algorithm improves performance on the two indexes of accuracy (ACC) and normalized mutual information (NMI). The "diagonal" bar graph shows the performance of the proposed algorithm versus the other two most advanced algorithms.
4 inventive concept
Compared with other tasks such as node classification and link prediction, community discovery algorithms based on graph neural networks have not been studied and explored deeply. In real-world networks it is costly to directly obtain real community labels or pairwise constraints, and the prior art mostly runs the node representation learning module and the clustering algorithm independently. The invention aims to solve these problems: an unsupervised loss function is designed based on the graph convolutional neural network, the network is trained in a self-supervised manner, and an algorithm that extracts communities is realized in an integrated manner.
Application description:
the invention provides a community discovery algorithm based on self-supervision training of a graph self-encoder, which fuses a self-supervision graph neural network with a self-expression principle, and simultaneously considers the network structure and node attribute information of the graph to solve the community discovery problem. Training in an end-to-end manner is performed on a plurality of public network data sets, and compared with the performance of a plurality of algorithms, the algorithm of the invention achieves the best performance in community discovery tasks, and has the advantage of being applied to real data sets.
Application example 1: experiments were performed on the Cora dataset using the present algorithm. The Cora dataset is a citation network of machine-learned papers, containing 2708 papers, 5429 edges, for a total of 7 categories. Each paper in the dataset is described by a word vector of 0/1 value, representing the absence/presence of the corresponding word in the dictionary. The dictionary consists of 1433 unique words, each paper consists of 1433 features, each feature being represented only by 0/1.
Application example 2: experiments were performed on the Citeseer dataset using the present algorithm. The Citeseer dataset is a citation network of documents, containing a sparse bag-of-words feature vector for each document and a list of citation links between documents. The labels cover six areas: agents, artificial intelligence, databases, information retrieval, machine learning, and human-computer interaction.
Application example 3: experiments were performed on the Wiki dataset using the present algorithm. The Wiki dataset is a network of web pages, comprising 2405 web pages and 17981 edges; the nodes are web pages, and two pages are connected if one links to the other.
Application example 4: the present algorithm was applied to experiments on the Pubmed dataset. The Pubmed dataset is a citation network comprising 19717 scientific publications on diabetes from the Pubmed database, divided into three categories: Diabetes Mellitus Experimental, Diabetes Mellitus Type 1, and Diabetes Mellitus Type 2; the citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF-weighted word vector from a dictionary of 500 unique words.
See table 2 for details of the four datasets. Cora, Citeseer and Pubmed are citation datasets in which nodes correspond to papers, and two papers are connected by an edge if one cites the other. Wiki is a collection of web pages in which a node is a web page, and two pages are connected if one links to the other.
Table 2: experimental data set information
In the application embodiments of the invention, the algorithm is compared with two attribute graph embedding algorithms and six node clustering algorithms. These eight algorithms fall into three types: using only node features, using only the network structure, and using both node features and network structure. The SECD algorithm provided by the invention uses node features and network structure information simultaneously. The community discovery results are evaluated with two indexes, accuracy (ACC) and normalized mutual information (NMI), computed against the real community information in the datasets. See table 3 for the accuracy results of the SECD algorithm on the four datasets and table 4 for its NMI results.
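For reference, both evaluation indexes can be computed as below; the Hungarian matching used for ACC is the usual convention for clustering accuracy, since the patent does not spell out its ACC computation, and NMI comes directly from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """ACC: best one-to-one mapping between predicted clusters and true labels."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                                    # co-occurrence counts
    rows, cols = linear_sum_assignment(cost.max() - cost)  # maximize matched pairs
    return cost[rows, cols].sum() / y_true.size

# nmi = normalized_mutual_info_score(y_true, y_pred)
```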
For the attribute graph embedding methods, two algorithms are compared: GAE and VGAE, which combine the graph convolutional network with an automatic encoder and a variational automatic encoder, respectively, for representation learning in the downstream tasks of node classification and community discovery. For the node clustering methods, a comparison is made with six other algorithms. These six algorithms can be divided into three classes:
(1) Methods that use only features. Kmeans and spectral clustering are two conventional clustering algorithms; Spectral-F takes the cosine similarity of node features as input.
(2) Methods that use only the graph structure. Spectral-G is spectral clustering with the adjacency matrix as the input similarity matrix. DeepWalk learns node embeddings using skip-gram on random-walk paths generated on the graph.
(3) Methods that use both features and the graph structure. The AGC uses higher-order graph convolution to filter node features and selects the number of graph convolution layers for different datasets. The GUCD introduces local enhancements to potential communities, obtains node community membership in the hidden layer of the encoder, and introduces a community-centric dual decoder.
Table 3: SECD algorithm ACC on four datasets
Table 4: the SECD algorithm NMI on four datasets
The average performance was calculated by running ten times on each dataset. The SECD algorithm achieves better performance on all datasets and all indicators. In clustering accuracy (ACC), compared with the AGC algorithm, the SECD algorithm improves by 10.1% on the Cora dataset, 2.9% on the Citeseer dataset, 11.3% on the Wiki dataset and 5.8% on the Pubmed dataset. In normalized mutual information (NMI), compared with the AGC algorithm, SECD improves by 4.1% on the Cora dataset, 3.0% on the Citeseer dataset, 12.1% on the Wiki dataset and 13.8% on the Pubmed dataset.
As shown in fig. 5 and 6, the performance effects of the present algorithm are shown in comparison with the four most advanced algorithms in the Cora dataset and the Citeseer dataset.
At present, the technical scheme of the invention has been subjected to pilot-scale test, namely, smaller-scale test of products before large-scale mass production; after the pilot test is completed, the use investigation of the user is performed in a small range, and the investigation result shows that the user satisfaction is higher; now, the preparation of the formal production of the product for industrialization (including intellectual property risk early warning investigation) is started.

Claims (5)

1. An algorithm for obtaining communities based on graph self-encoder self-supervision training, characterized in that: self-supervision training is combined with the self-expression principle to obtain the similarity matrix S of paired nodes of a network graph G, the generation of the node embedding matrix Z is guided, and communities are obtained in an integrated mode; the method specifically comprises the following steps: obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron;
G = (V, E, X), where V = {1, 2, …, N} is the set of nodes and E ⊆ V × V is the set of edges; each node i has an attribute vector x_i ∈ R^F, and X = [x_1, x_2, …, x_i, …, x_N]^T ∈ R^{N×F} is the node attribute matrix of the network graph; the object is to learn a function f: V → [K], where [K] = {1, 2, …, K} is an index set of community clusters, and each node is mapped to one community by the network structure and node attributes of the graph;
G is a citation network graph; V is the set of nodes, which are papers, documents or web pages; E is the set of edges, which are the citation relationships among the papers, documents or web pages;
the node embedding matrix is generated by a graph convolutional (GCN) encoder as in formula (1),

Z = ReLU(Â·ReLU(Â·X·W0)·W1), Â = D̃^{-1/2}·Ã·D̃^{-1/2}, Ã = A + I    (1)

in the formula (1), Z is the node embedding matrix, each row of Z is the vector representation of a node, and Z ∈ R^{|V|×F′}; X is the attribute matrix of the nodes, A is the adjacency matrix of the input network graph, Ã = A + I, I ∈ R^{|V|×|V|} is the identity matrix, D̃ is the degree matrix of Ã, ReLU(·) is the activation function, and W0 and W1 are the weight parameters of the graph convolution layers; Z1 is the node embedding matrix of the first damage graph G1, Z2 is the node embedding matrix of the second damage graph G2, and parameter sharing is maintained during GCN encoder training;
according to the generated node embedding matrix Z, learning to generate a similarity matrix S of paired nodes;
using the self-expression principle of the nodes, node i is expressed as a linear combination of the other nodes, as in formula (3),

Z_i = Σ_{j≠i} p_ij·Z_j    (3)

in the formula (3), Z_i is the embedding vector of node i, Z_j is the embedding vector of node j, and p_ij is the similarity coefficient between node i and node j, with p_ii = 0 by definition;
the reconstruction of the node embedding matrix Z is regularized with the F-norm, and the objective function is optimized as in formula (4),

min_P ||Z − P·Z||_F^2 + λ1·||P||_F^2    (4)

in the formula (4), Z is the node embedding matrix, P is the similarity coefficient matrix, and λ1 is an optimization weight parameter;
next, the similarity matrix S of the nodes is constructed, training the data with matrix decomposition and batch processing; a coefficient matrix P̄ is calculated from the similarity coefficients P between nodes, as in formula (5),

P̄ = (|P| + |P^T|)/2    (5)

because the data dimension of the dataset is large, the data are randomly sampled with batch processing, and for ease of computation and storage the coefficient matrix P̄ is decomposed by an SVD of rank r, as in formula (6) and formula (7),

P̄ = U·Σ·V^T    (6)
L = U·Σ^{1/2}    (7)

in the formula (6), r = 4K + 1, U is the matrix of left singular vectors, Σ is the diagonal matrix of singular values, and V^T is the matrix of right singular vectors;
regularizing each row of L, and setting a negative value in L as 0 to obtain L';
finally, the similarity matrix S is constructed as in formula (8), with S_ij ∈ [0, 1],

S = L′·L′^T / ||L′·L′^T||_∞    (8)

in the formula (8), S is the similarity matrix, L is the feature matrix generated by singular value decomposition of the coefficient matrix, L′ is the normalized L matrix with negative elements set to 0, L^T is the transpose of the L matrix, and ||·||_∞ is the infinity norm;
randomly sampling and selecting M nodes, where M ≤ N, and training the loss of formula (4) in batches; the node similarity matrix S generated in this step guides the obtained node embedding Z to generate the communities C;
the community cluster matrix C comprises N rows of community vectors C_i,

C_i = Softmax(MLP(Z_i)) ∈ R^K    (9)

in the formula (9), C_i is the community vector of the ith node in the community cluster matrix, Softmax(·) is an activation function, and the MLP is a three-layer neural network that maps each node vector Z_i to a K-dimensional vector, where K is the number of community clusters and is assumed to be known; the softmax layer converts the K-dimensional vector into a probability distribution, such that C_iK, the Kth element of C_i, represents the probability that the ith node belongs to the Kth cluster, so nodes with similar embeddings will be mapped to similar positions in the (K−1)-dimensional probability distribution;
in the step of obtaining the community cluster matrix C of the nodes, a probability distribution over which nodes should and should not be connected is obtained in an unsupervised mode so as to guide the formation of communities;
node embedding is continuously updated to guide the generation of the community cluster matrix by training the MLP parameters; the optimization objective of community discovery is given as formula (10); in the formula (10), C_i is the community vector of the ith node in the community cluster matrix, C_j is the community vector of the jth node, and S_ij is the similarity of node i and node j; when S_ij is high, the node pair (i, j) is constrained to the same community, and when S_ij is low, node i and node j are in different communities;
joint optimization is performed over the self-supervision training parameters and the MLP parameters, and the total loss consists of a weighted sum of the two loss parts, node-embedding training and community discovery, as in formula (11),

min L_total = α·L_ss + L_comm    (11)

in the formula (11), min L_total is the minimized overall training loss, α is the weight factor in the optimization process, L_ss is the self-supervised contrastive loss of node-embedding training, and L_comm is the community discovery loss.
2. An apparatus for obtaining communities based on self-supervision training of a graph self-encoder, characterized in that: it comprises a community obtaining module for obtaining a network graph G; obtaining a first damage graph G1 and a second damage graph G2 through damage-function processing; obtaining the pre-trained node embedding matrix Z through pre-training; contrastively learning the first damage graph G1 and the second damage graph G2 with a noise-contrast function based on the principle of normalized mutual information maximization; training based on the self-supervision training model until the loss function is minimized, obtaining the optimized node embedding matrix Z; obtaining the similarity matrix S of paired nodes through the self-expression principle and regularization; guiding generation of the node embedding matrix Z with the similarity matrix S; and obtaining the community cluster matrix C of the nodes based on the similarity matrix S and the fully connected multi-layer perceptron;
G = (V, E, X), where V = {1, 2, …, N} is the set of nodes and E is the set of edges; each node i is assumed to carry a vector of attribute values x_i ∈ R^F, and X = [x_1, x_2, …, x_i, …, x_N]^T ∈ R^{N×F} is the node attribute matrix of the network graph; the objective is to learn a function f: V → [K], where [K] = {1, 2, …, K} is the index set of the community clusters, so that each node is mapped to one community from the network structure and node attributes of the graph;
G is a citation network graph, V is the set of nodes, which are papers, documents, or web pages, and E is the set of edges representing the citation relationships among the papers, documents, or web pages;
in formula (1), Z is the node embedding matrix, each row of which is the vector representation of one node, and Z ∈ R^{|V|×F′}; X is the node attribute matrix, A is the adjacency matrix of the input network graph, I ∈ R^{|V|×|V|} is the identity matrix, D is the degree matrix, ReLU(·) is the activation function, and W_0, W_1 are the weight parameters of the graph convolution layers; Z_1 is the node embedding matrix of the first damage graph G_1, Z_2 is the node embedding matrix of the second damage graph G_2, and parameter sharing is maintained during GCN encoder training;
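A minimal PyTorch sketch of the two-layer GCN encoder of formula (1), applied with shared weights W_0, W_1 to both damage graphs; the symmetric normalization D^{-1/2}(A + I)D^{-1/2} is the standard GCN choice and is assumed here, since this text does not reproduce the exact propagation rule.

```python
import torch
import torch.nn as nn

def gcn_norm(A: torch.Tensor) -> torch.Tensor:
    """Standard GCN propagation matrix D^-1/2 (A + I) D^-1/2; assumed
    here, as the exact normalization is defined earlier in the text."""
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)

class GCNEncoder(nn.Module):
    """Two-layer GCN producing the node embedding matrix Z (formula (1));
    the same W_0, W_1 serve both damage graphs (parameter sharing)."""
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.W0 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.W1 = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, X: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        A_norm = gcn_norm(A)
        H = torch.relu(A_norm @ self.W0(X))   # first graph convolution layer
        return A_norm @ self.W1(H)            # second layer yields Z

# Toy usage: one encoder, two damage graphs -> Z1, Z2 with shared weights.
N, F_in = 8, 5
X = torch.randn(N, F_in)
A1 = (torch.rand(N, N) > 0.7).float(); A1 = ((A1 + A1.T) > 0).float()
A2 = (torch.rand(N, N) > 0.7).float(); A2 = ((A2 + A2.T) > 0).float()
enc = GCNEncoder(F_in, 32, 16)
Z1, Z2 = enc(X, A1), enc(X, A2)
```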
a similarity matrix S of paired nodes is then learned from the generated node embedding matrix Z;
using the self-expression principle of nodes, node i is expressed as a linear combination of the other nodes, as in formula (3);
in formula (3), Z_i is the embedding vector of node i, Z_j is the embedding vector of node j, and p_ij is the similarity coefficient between node i and node j, with p_ii = 0 by definition;
the reconstruction error of the node embedding matrix Z is measured by the Frobenius norm, and the objective function is optimized as in formula (4),
in formula (4), Z is the node embedding matrix, P is the similarity coefficient matrix, and λ_1 is an optimization weight parameter;
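A hedged PyTorch sketch of the self-expression objective of formulas (3)-(4): each embedding Z_i is reconstructed as a linear combination Σ_j p_ij Z_j with p_ii = 0, and the Frobenius reconstruction error plus a λ_1-weighted penalty on P is minimized; the specific penalty used here (the squared Frobenius norm of P) is an assumption.

```python
import torch

def self_expression_loss(Z, P, lambda1=0.1):
    """Self-expression objective (formulas (3)-(4)): reconstruct each node
    embedding from the other nodes, Z ~ P Z, with p_ii = 0 enforced.
    The regularizer ||P||_F^2 and lambda1 = 0.1 are illustrative."""
    P_offdiag = P - torch.diag(torch.diag(P))   # enforce p_ii = 0
    recon = torch.norm(Z - P_offdiag @ Z, p="fro") ** 2
    return recon + lambda1 * torch.norm(P_offdiag, p="fro") ** 2

# Toy usage: learn P for fixed embeddings Z by gradient descent.
Z = torch.randn(8, 16)
P = torch.zeros(8, 8, requires_grad=True)
opt = torch.optim.Adam([P], lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = self_expression_loss(Z, P)
    loss.backward()
    opt.step()
```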
the following is the process of constructing the similarity matrix S of the nodes, using matrix decomposition and batch processing to train on the data; the coefficient matrix is calculated from the similarity coefficients P between nodes, as in formula (5),
because the data dimensionality of the dataset is large, the data are randomly sampled in batches, and, to ease computation and storage, a rank-r SVD decomposition of the coefficient matrix is performed, as in formulas (6) and (7),
in formula (6), r = 4K + 1, U is the matrix of left singular vectors, Σ is the diagonal matrix of singular values, and V^T is the matrix of right singular vectors;
each row of L is normalized and the negative values in L are set to 0, yielding L';
finally, the similarity matrix S is constructed as in formula (8), with S_ij ∈ [0, 1],
In formula (8), S is the similarity matrix, L is the feature matrix produced by singular value decomposition of the coefficient matrix, L' is the normalized L matrix with its negative elements set to 0, L^T is the transpose of the L matrix, and ||L||_∞ is the infinity norm of the L matrix;
M nodes are randomly sampled, where M ≤ N, and the loss of formula (4) is trained in batches; in this step, generating the node similarity matrix S enables the learned node embedding Z to guide the generation of the communities C;
the community cluster matrix C comprises N rows of community vectors C_i,
C_i = Softmax(MLP(Z_i)) ∈ R^K (9)
In formula (9), C_i is the community vector of the i-th node in the community cluster matrix, Softmax(·) is the activation function, and MLP is a three-layer neural network that maps each node vector Z_i to a K-dimensional vector, where K is the number of community clusters and is assumed to be known; the softmax layer converts this K-dimensional vector into a probability distribution, so that C_{ik}, the k-th element of C_i, represents the probability that the i-th node belongs to the k-th cluster; in this way, nodes with similar embeddings are mapped to similar locations in the (K-1)-dimensional probability distribution;
in the step of obtaining the community cluster matrix C of the nodes, the probability distribution over which nodes should and should not be connected is obtained in an unsupervised manner, so as to guide the formation of communities;
by training the MLP parameters, the node embeddings are continuously updated to guide the generation of the community cluster matrix; the optimization objective for community discovery is given as formula (10),
In formula (10), C_i is the community vector of the i-th node in the community cluster matrix, C_j is the community vector of the j-th node in the community cluster matrix, and S_ij is the entry of the similarity matrix for node i and node j; when S_ij is high, the node pair (i, j) is constrained to the same community, and when S_ij is low, node i and node j are assigned to different communities;
joint optimization is performed over the self-supervision training parameters and the MLP parameters, and the total loss consists of a weighted sum of two loss parts, the training of node embeddings and community discovery, as shown in formula (11),
min L_total = α·L_ss + L_comm (11)
in formula (11), L_total is the total training loss to be minimized, α is the weighting coefficient in the optimization process, L_ss is the self-supervision contrastive loss for training the node embeddings, and L_comm is the community discovery loss.
3. The apparatus for obtaining communities based on graph self-encoder self-supervision training of claim 2, characterized in that: the community obtaining module is further used for obtaining the community cluster matrix C of the nodes by deriving, in an unsupervised manner, the probability distribution over which nodes should and should not be connected, so as to guide the formation of communities.
4. An apparatus for obtaining communities based on graph self-encoder self-supervised training, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that: the processor, when executing the computer program, implements the corresponding steps of claim 1.
5. An apparatus for obtaining communities based on graph self-encoder self-supervision training, comprising a computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, carries out the corresponding steps of claim 1.
CN202310163573.5A 2023-02-24 2023-02-24 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training Active CN116304367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310163573.5A CN116304367B (en) 2023-02-24 2023-02-24 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training


Publications (2)

Publication Number Publication Date
CN116304367A CN116304367A (en) 2023-06-23
CN116304367B true CN116304367B (en) 2023-12-01

Family

ID=86786197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310163573.5A Active CN116304367B (en) 2023-02-24 2023-02-24 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training

Country Status (1)

Country Link
CN (1) CN116304367B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11853903B2 (en) * 2017-09-28 2023-12-26 Siemens Aktiengesellschaft SGCNN: structural graph convolutional neural network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950594A (en) * 2020-07-14 2020-11-17 北京大学 Unsupervised graph representation learning method and unsupervised graph representation learning device on large-scale attribute graph based on sub-graph sampling
CN111985623A (en) * 2020-08-28 2020-11-24 复旦大学 Attribute graph group discovery method based on maximized mutual information and graph neural network
CN112784118A (en) * 2021-01-07 2021-05-11 之江实验室 Community discovery method and device in graph sensitive to triangle structure
CN112966114A (en) * 2021-04-10 2021-06-15 北京工商大学 Document classification method and device based on symmetric graph convolutional neural network
CN113205175A (en) * 2021-04-12 2021-08-03 武汉大学 Multi-layer attribute network representation learning method based on mutual information maximization
CN113268993A (en) * 2021-05-31 2021-08-17 之江实验室 Mutual information-based attribute heterogeneous information network unsupervised network representation learning method
CN113255895A (en) * 2021-06-07 2021-08-13 之江实验室 Graph neural network representation learning-based structure graph alignment method and multi-graph joint data mining method
CN113409159A (en) * 2021-06-24 2021-09-17 中国人民解放军陆军工程大学 Deep community discovery method fusing node attributes
WO2023010502A1 (en) * 2021-08-06 2023-02-09 Robert Bosch Gmbh Method and apparatus for anomaly detection on graph
CN114037014A (en) * 2021-11-08 2022-02-11 西北工业大学 Reference network clustering method based on graph self-encoder
CN114819325A (en) * 2022-04-20 2022-07-29 浙江师范大学 Score prediction method, system, device and storage medium based on graph neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Community detection based on unsupervised attributed network embedding; Zhou Xinchuang; Su Lingtao; Li Xiangju; Zhao Zhongying; Li Chao; Expert Systems With Applications; Vol. 213 (2022); Section 3 *
A Survey of Graph Convolutional Neural Networks; Xu Bingbing; Cen Keting; Huang Junjie; Shen Huawei; Cheng Xueqi; Chinese Journal of Computers; Vol. 43 (No. 05); full text *
Graph Embedding Methods and Applications: A Research Survey; Qi Zhiwei; Wang Jiahui; Yue Kun; Qiao Shaojie; Li Jin; Acta Electronica Sinica (No. 04); full text *
An Active Semi-Supervised Community Detection Method Based on Link Models; Chai Bianfang; Wang Jianling; Xu Jiwei; Li Wenbin; Journal of Computer Applications (No. 11); full text *
A Survey of Community Detection Methods Based on Non-negative Matrix Factorization Models; Li Yafang; Jia Caiyan; Yu Jian; Journal of Frontiers of Computer Science and Technology; Vol. 10 (No. 01); full text *

Also Published As

Publication number Publication date
CN116304367A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
Bresson et al. Residual gated graph convnets
CN111950594B (en) Unsupervised graph representation learning method and device on large-scale attribute graph based on sub-sampling
Zhang et al. Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification
Bose et al. Latent variable modelling with hyperbolic normalizing flows
CN111709518A (en) Method for enhancing network representation learning based on community perception and relationship attention
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
Ju et al. A comprehensive survey on deep graph representation learning
CN109753589A (en) A kind of figure method for visualizing based on figure convolutional network
CN111291556A (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
Nguyen et al. Quaternion graph neural networks
CN110598022B (en) Image retrieval system and method based on robust deep hash network
Wang et al. Graph neural networks: Self-supervised learning
CN112417289A (en) Information intelligent recommendation method based on deep clustering
CN112256870A (en) Attribute network representation learning method based on self-adaptive random walk
Lin et al. Deep unsupervised hashing with latent semantic components
Li et al. Graphadapter: Tuning vision-language models with dual knowledge graph
Lee et al. Improved recurrent generative adversarial networks with regularization techniques and a controllable framework
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
Fang et al. Contrastive multi-modal knowledge graph representation learning
CN108388918B (en) Data feature selection method with structure retention characteristics
Dolgikh Generative conceptual representations and semantic communications
CN112286996A (en) Node embedding method based on network link and node attribute information
CN116304367B (en) Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
CN117093924A (en) Rotary machine variable working condition fault diagnosis method based on domain adaptation characteristics
CN115392474B (en) Local perception graph representation learning method based on iterative optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant