CN112784118A - Community discovery method and device in graph sensitive to triangle structure - Google Patents

Publication number
CN112784118A
Authority
CN
China
Prior art keywords
graph
node
information
encoder
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110018835.XA
Other languages
Chinese (zh)
Inventor
张吉
王佳麟
高军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Zhejiang Lab
Original Assignee
Peking University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Zhejiang Lab
Priority to CN202110018835.XA
Publication of CN112784118A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention relates to a triangle-structure-sensitive method and device for community discovery in graphs. The method comprises the following steps: using the graph encoder of a graph autoencoder to fuse the structural information and node content information of a graph through a graph neural network model, so as to learn hidden-layer vector representations of the nodes in the graph; using the graph decoder of the graph autoencoder to reconstruct, from the hidden-layer vector representations of the nodes, the connecting-edge relation between pairs of nodes and the triangle structures in the graph; and performing graph clustering using the structural information and node content information of the reconstructed graph, thereby discovering communities. The invention is an unsupervised, graph-autoencoder-based community discovery scheme that is sensitive to triangle structures; it can carry out community discovery tasks in graphs efficiently and adaptively, can be applied on different platforms, and is highly scalable and flexible.

Description

Community discovery method and device in graph sensitive to triangle structure
Technical Field
The invention belongs to the technical field of general information. Many scenarios and applications in real life can be described with graphs, such as social network graphs, paper citation graphs, and user-commodity graphs on e-commerce platforms. The triangle structures in a graph are significant for the composition and discovery of communities. Building on graph neural network technology and combining it with the triangle structures in the graph, the method learns node representations from data in a self-supervised manner and clusters them to discover the community structure of the graph. It can be widely applied to graphs from different online network platforms, such as social networking and e-commerce platforms.
Background
Graph structures are widely used to describe complex real-world scenarios, such as social relationship networks, the World Wide Web, urban traffic networks, and user-commodity relationship networks in e-commerce. Community structure is a common feature of all types of graphs: the whole graph is composed of many communities, and the communities reflect how closely nodes are connected. Community discovery algorithms help us understand the node clusters, independent groups, and network structure of a graph, which in turn helps us infer similar behaviors and preferences within peer groups, estimate elasticity, and search for nesting relations, and can also provide a basis for data mining tasks. Examples include querying, in an e-commerce system, for cheating groups that collaborate with a given target cheating user, and querying, in a social network, for the interest communities shared by one or more target users.
The community discovery task on a graph generally obtains communities by clustering the nodes of the graph. Nodes within a community are closely connected while connections between communities are sparse, so a community is usually a dense subgraph. Because triangles are the basic elements of dense subgraphs, exploiting the triangle structures in subgraphs is very important for community discovery. Traditional clustering algorithms, such as K-L bisection, graph bisection, and spectral clustering, mainly use the connection information in the graph to find communities and make little use of the content information of the nodes. Some deep-learning-based graph clustering algorithms try to fuse the structural information and node content information of the graph in one model and learn node vector representations for clustering; however, these models attend only to simple structural information and make little use of higher-order structures in the graph (such as triangles), so they cannot mine community information as well.
To address the community discovery problem in graphs, the method provides a model that considers the graph structure and the node content simultaneously while also incorporating the triangle structures in the graph; it has important academic value and broad application prospects.
Disclosure of Invention
The invention provides an unsupervised, graph-autoencoder-based community discovery method and device that are sensitive to triangle structures. The method fuses the structural information and content information of the graph using a graph neural network in the encoder, and learns higher-order structural information by reconstructing triangle structures in the decoder. In this way, the method can carry out the community discovery task in a graph efficiently and adaptively and can be applied on different platforms.
The technical scheme adopted by the invention is as follows:
an unsupervised, graph-autoencoder-based community discovery method sensitive to triangle structures, comprising the following steps:
using the graph encoder of a graph autoencoder to fuse the structural information and node content information of a graph through a graph neural network model, so as to learn hidden-layer vector representations of the nodes in the graph;
using the graph decoder of the graph autoencoder to reconstruct, from the hidden-layer vector representations of the nodes, the connecting-edge relation between pairs of nodes and the triangle structures in the graph;
and performing graph clustering using the structural information and node content information of the reconstructed graph, thereby discovering communities.
The method is based on an auto-encoder structure and consists of a graph encoder and a graph decoder.
The graph encoder uses a graph neural network model to fuse the structural information and node content information of the graph, so as to learn hidden-layer vector representations of the nodes. The input of the graph encoder is the adjacency matrix of the graph and the node feature matrix; the structural information of the graph and the content information of the nodes are fused through a graph neural network model such as a graph convolutional network or a graph attention network. Note that if the original graph has only structural information and no node content information, the degree of each node can be used as its content information. Through a multi-layer graph neural network, hidden-layer vector representations of the nodes are obtained, which the subsequent decoder uses to decode and reconstruct the information already present in the graph (such as its structure and content).
The graph decoder reconstructs the structures in the graph from the hidden-layer vector representations of the nodes. In its decoder, a conventional unsupervised graph neural network (graph autoencoder) usually attends only to simple low-order structural information, such as whether a connecting edge exists between two nodes. For community detection, however, such information is insufficient: as described above, a community is usually a dense subgraph, and an important component of dense subgraphs is the triangle structure. The present invention therefore attends not only to the connection relation between pairs of nodes but also to the reconstruction of the triangle structures in the graph. Specifically, to reconstruct the connecting-edge information between two nodes, given two nodes A and B joined by an edge in the original graph, the invention computes the likelihood of an edge between A and B through a one-layer inner-product network, thereby reconstructing the existing connecting-edge information in the graph. To reconstruct triangle structures, given a connecting edge between A and B, the neighbor sets of A and B are searched; if C is a neighbor of A or B, triangle information is learned according to whether C is connected to both A and B (that is, whether A, B, and C form a triangle). At the same time, negative sampling is performed: a node D that is connected to neither A nor B is sampled, and the triangle information is reconstructed and learned from the relations among A, B, C, and D.
The community detection method performs graph clustering, for example with the K-means algorithm, on the node hidden-layer vector representations learned by the graph autoencoder, so as to discover communities.
Furthermore, the method makes the algorithm scalable. Practical production applications often process massive data, so the graphs involved can be very large (for example, graphs with tens of millions of nodes and hundreds of millions of edges). To ensure scalability (the feasibility of running on large graphs), the invention also provides a way to run the algorithm on large graphs. First, some theoretical justification: the learning process of a graph neural network aggregates the structural and attribute features of the neighbors around a central node and passes them on after a local transformation, so the model tends to be local. That is, for each node, learning is confined to a receptive field; in other words, distant information (for example, from nodes whose shortest-path distance exceeds 20) is of no use to the model. Likewise, for a central node, only the triangle structures in its vicinity are of much value. Therefore, to improve scalability, sub-graph sampling can be performed on the original large graph so that only the neighbor nodes around the central nodes are retained, and the graph autoencoder is then trained on the sampled subgraphs. This preserves the learning quality of the model while ensuring the running efficiency and scalability of the algorithm.
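The locality argument above can be illustrated with a minimal sketch that keeps only the nodes within k hops of a central node. The function name `k_hop_subgraph`, the dictionary-of-sets graph layout, and the breadth-first strategy are assumptions made for illustration; the patent does not prescribe this particular procedure.

```python
from collections import deque

def k_hop_subgraph(adj, center, k):
    """Return the nodes within k hops of `center` and the edges among them.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the receptive field
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    # Keep each undirected edge once, as an ordered pair (a, b) with a < b.
    edges = {(a, b) for a in nodes for b in adj[a] if b in nodes and a < b}
    return nodes, edges

# Path graph 0-1-2-3-4: a 2-hop field around node 0 keeps nodes 0, 1, 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
nodes, edges = k_hop_subgraph(adj, 0, 2)
```

With a K-layer graph neural network, a k of about K would cover the receptive field of the central node; the right value in practice depends on the chosen encoder.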
Based on the same inventive concept, the invention also provides a triangle-structure-sensitive device for community discovery in graphs, which adopts the above method and comprises:
the graph encoder module is used for fusing structure information and node content information in the graph through a graph neural network model so as to learn hidden layer vector representation of the nodes in the graph;
the decoder module is used for reconstructing a connection edge relation between two points in the graph and a triangle structure in the graph according to the hidden vector representation of the nodes in the graph;
and the clustering module is used for carrying out graph clustering by using the structure information and the node content information in the reconstructed graph so as to discover communities.
The invention is an unsupervised, graph-autoencoder-based community discovery method sensitive to triangle structures. It is scalable, can be applied to graph data of different scales to mine information, and generates node vectors by using information of different dimensions in the graph. The advantages of the invention are:
1) The invention is an unsupervised learning model that needs no labeled dataset, so it is highly scalable and applies to a wider range of scenarios (graphs).
2) The method uses a graph autoencoder to learn vector representations of graph nodes: in the encoder stage, a graph neural network fuses the structural information of the graph with the content information of the nodes, and in the decoder stage, the original edge information and triangle information of the graph are reconstructed, so as to learn node representations better suited to community discovery in the graph.
3) The framework is highly flexible: the encoder can be replaced by different graph neural network models, such as a graph convolutional network or a graph attention network; the decoder can use different loss functions for different application scenarios (for example, reconstruction of node attribute information can be added for attribute-rich graphs); and the sub-graph sampling step can use different sampling methods for efficient sampling.
4) The node vector representations learned by the model contain rich information about the graph. They can be used for community discovery and for other graph tasks (such as node classification and link prediction), and can serve different real-world applications, such as discovering communities with shared interests in a social network.
Drawings
FIG. 1 is the general framework and flow diagram of the method of the present invention, where $A$ is the adjacency matrix of the graph, $X$ is the node attribute vector matrix of the graph, $A_t$ is the adjacency matrix of the sampled subgraph, $X_t$ is the node attribute vector matrix of the subgraph, $\hat{A}_t$ denotes the connecting-edge relation reconstructed by the graph autoencoder, and the symbol in Figure BDA0002887975780000042 denotes the triangle connecting-edge relation reconstructed by the graph autoencoder.
Fig. 2 shows three relations of the triangular structure.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
This patent uses a graph neural network in an unsupervised setting to learn low-dimensional vector representations of the nodes in a graph, which are then used for community mining. To improve scalability on large graphs, sub-graph sampling is used to reduce the scale of the training data. The sampling method can vary with the graph scenario; for example, nodes can simply be sampled according to their degrees, or edges can be sampled, and subgraphs are then extracted from the large graph according to the sampled nodes and edges. Specific sampling methods can be found in the literature: Zeng H, Zhou H, Srivastava A, Kannan R, Prasanna V. GraphSAINT: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931. 2019 Jul 10.
The overall framework of the method is shown in FIG. 1. Given a graph $G = (V, E, X)$, $V$ is the set of nodes, $E$ is the set of edges, and $A$ is the adjacency matrix of $G$; for an attributed graph, $X$ is the node attribute vector matrix, and for a graph without attributes, $X$ can be initialized with the node degrees. The method first applies sub-graph sampling to $G$ to generate a number of subgraphs, and then trains the graph autoencoder on each subgraph. The graph autoencoder consists of two modules: a graph encoder followed by a graph decoder. The graph encoder can be a common graph neural network model, such as a graph convolutional network or a graph attention network. The graph decoder is divided into two parts: one reconstructs the edge relations in the graph, and the other reconstructs the triangle-structure relations in the graph.
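As a small illustration of the inputs described above, the following sketch builds a dense adjacency matrix from an edge list and, for a graph without attributes, initializes the feature matrix with node degrees. The helper name `degree_features` and the list-of-lists matrix layout are assumptions for illustration only.

```python
def degree_features(num_nodes, edges):
    """Build adjacency matrix A and a one-column X holding node degrees."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1  # undirected graph: keep A symmetric
    x = [[float(sum(row))] for row in a]
    return a, x

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
a, x = degree_features(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
```

A dense matrix is used only to keep the sketch short; a graph of realistic size would need a sparse representation.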
A detailed description of sub-graph sampling and of each module of the graph autoencoder is given below:
1. Sub-graph sampling
For sub-graph sampling, the invention adopts a simple and efficient way of sampling the edges of the graph. The literature has shown theoretically that this kind of sampling is effective, namely that it reduces the variance of the model across training iterations when the model is computed on the sampled subgraphs; see: Chen J, Ma T, Xiao C. FastGCN: Fast learning with graph convolutional networks via importance sampling. In International Conference on Learning Representations 2018 Feb 15.
The sampling strategy here depends on the node degrees. For a node u and a node v joined by a connecting edge, the probability that the edge is sampled is $p_{u,v} \propto 1/d_u + 1/d_v$, where $d_u$ and $d_v$ are the degrees of u and v and $\propto$ denotes proportionality.
Given the size of the sampled subgraph (for example, the number of edges to sample), edges are sampled from the whole graph according to this probability, and the subgraph is then extracted from the sampled edges.
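The degree-based edge sampling can be sketched as follows. The function name `sample_edge_subgraph`, the use of sampling with replacement followed by de-duplication, and the fixed seed are assumptions made for illustration; the source only specifies that the sampling probability is proportional to 1/d_u + 1/d_v.

```python
import random

def sample_edge_subgraph(edges, num_edges, seed=0):
    """Sample edges with probability proportional to 1/d_u + 1/d_v.

    edges: list of (u, v) pairs of an undirected graph, each edge listed once.
    Returns the sampled edge set, the node set it touches, and the raw weights.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    weights = [1.0 / degree[u] + 1.0 / degree[v] for u, v in edges]
    rng = random.Random(seed)
    # Assumption: draw with replacement, then drop duplicate edges.
    picked = set(rng.choices(edges, weights=weights, k=num_edges))
    nodes = {n for e in picked for n in e}
    return picked, nodes, weights

# Star around node 0 plus a tail 3-4; the edge (3, 4) gets the largest weight.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
picked, nodes, weights = sample_edge_subgraph(edges, num_edges=3)
```

Edges attached to low-degree nodes receive larger weights, so sparse regions of the graph are less likely to be missed than under uniform edge sampling.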
2. Graph autoencoder
After the subgraphs are obtained, the neural network of the graph autoencoder is trained on each subgraph. The graph autoencoder consists of an encoder and a decoder, and the decoder is further divided into two parts: a decoder that reconstructs the edge structure of the graph and a decoder that reconstructs the triangles of the graph, each with its own loss. Their concrete forms are given below. Suppose the subgraph is $G_t = \{A_t, X_t\}$, where $A_t$ is the adjacency matrix of the subgraph and $X_t$ is its node attribute vector matrix.
2.1) Graph encoder
The graph encoder in this patent uses a graph neural network to encode the structural information and node content information of the graph. Various graph neural networks can be used, such as a graph convolutional network or a graph attention network. The graph convolutional network form is given here.
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}_t\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$
where $l$ denotes the $l$-th layer of the network and $H^{(l)}$ is the node vector representation at the $l$-th layer, with $H^{(0)} = X_t$. Here $\tilde{A}_t = A_t + I$, where $I$ is the identity matrix of the same size as $A_t$; $\tilde{D}$ is the degree matrix of $\tilde{A}_t$; $W^{(l)}$ is the trainable parameter of the $l$-th layer; and $\sigma$ is the activation function of the network, usually set to ReLU. After $L$ layers, the hidden-layer node vector representation $Z_t = H^{(L)}$ is obtained.
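A single encoder layer can be sketched in plain Python as below. The sketch assumes the standard graph convolution rule with self-loops and symmetric degree normalization, which is consistent with the symbols described above (identity matrix, degree matrix, trainable weight, ReLU activation) but is not confirmed by the source equation image; the weights `w` are hypothetical values rather than learned parameters.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    a_norm = [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, value) for value in row] for row in out]

# Toy subgraph: a triangle 0-1-2, with node degree as the only feature.
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
x = [[2.0], [2.0], [2.0]]
w = [[0.5, -1.0]]  # hypothetical weights; normally learned by training
z = gcn_layer(adj, x, w)
```

Stacking L such layers, each with its own weight matrix, would give the hidden-layer representation used by the decoder.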
2.2) Graph decoder
The graph decoder of this patent reconstructs the edge information and the triangle structures of the graph. Reconstructing the connecting edges captures the basic local structure of the graph, while reconstructing the triangles better captures the higher-order community information. By learning from both jointly, the algorithm learns richer information about the graph.
a) Reconstruction of edge information in the graph
The reconstruction of edge information in the graph relies on the following formulas:
Figure BDA0002887975780000061
Figure BDA0002887975780000062
where, for a given edge with endpoints u and v, $z_u$ denotes the hidden-layer node vector of node u and $z_v$ denotes the hidden-layer node vector of node v.
From the hidden-layer node vector representation $Z_t$ learned by the model, an inner-product operation reconstructs all connecting edges in the subgraph, giving the reconstructed adjacency matrix $\hat{A}_t$, which represents the reconstructed connecting-edge relation; the elements at the corresponding positions of the matrix are given by $L_{u,v}$. The loss function is then defined from the difference between the reconstructed subgraph adjacency matrix $\hat{A}_t$ and the real subgraph adjacency matrix $A_t$. Specifically, for each edge (u, v) present in the graph, the computed loss is $L_{u,v}$.
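The inner-product reconstruction of edges can be sketched as follows. The source states only that edge likelihood is computed by a one-layer inner-product network; the sigmoid squashing and the negative log-likelihood loss used here are the conventional graph autoencoder choices and are assumptions, since the original equation images are not available. The names `edge_probability` and `edge_reconstruction_loss` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_probability(z_u, z_v):
    """Likelihood of an edge between u and v from the inner product of their vectors."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))

def edge_reconstruction_loss(z, edges):
    """Mean negative log-likelihood over observed edges (assumed loss form)."""
    return -sum(math.log(edge_probability(z[u], z[v])) for u, v in edges) / len(edges)

# Nodes 0 and 1 have aligned vectors; node 2 points the opposite way.
z = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [-1.0, 0.0]}
p_close = edge_probability(z[0], z[1])
p_far = edge_probability(z[0], z[2])
loss = edge_reconstruction_loss(z, [(0, 1)])
```

Nodes with aligned hidden vectors get an edge probability above 0.5 and opposed vectors a probability below 0.5, so minimizing the loss pulls connected nodes together in the embedding space.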
b) Information reconstruction of triangles in a graph
The triangle structure is an important building block of dense subgraphs (communities) and is very helpful for mining higher-order information in the graph. To make better use of triangle information, the decoder module explicitly reconstructs the triangle structures in the graph. A triangle consists of three nodes; given an edge, the relations among three nodes can be summarized as the three cases shown in FIG. 2 (the case in which none of the three nodes are connected is excluded, since it is already covered by the reconstruction of edge information). In case 1, the three nodes are all connected to one another and are most tightly connected; in case 2, there are two connecting edges and the three nodes are fairly tightly connected; in case 3, there is one connecting edge and the three nodes are most loosely connected. To let the model learn these three different relations, the following loss functions are designed.
First, a mutual-information score between two vector representations is defined to reflect their correlation:
$D(e_1, e_2) = \sigma(e_1^\top W_d\, e_2)$
where $\sigma$ is the logistic sigmoid activation function and $W_d \in \mathbb{R}^{F \times F'}$ is a trainable parameter of the network, with $F$ the dimension of $e_1$ and $F'$ the dimension of $e_2$.
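The bilinear score D can be written directly from its definition. In this sketch the first argument is the concatenation of the two endpoint vectors of an edge, which is how the source later describes the edge representation; the concrete vectors and the weight matrix `w_d` are hypothetical values chosen for illustration.

```python
import math

def bilinear_score(e1, e2, w_d):
    """D(e1, e2) = sigmoid(e1^T W_d e2); w_d has shape len(e1) x len(e2)."""
    total = sum(
        e1[i] * w_d[i][j] * e2[j]
        for i in range(len(e1))
        for j in range(len(e2))
    )
    return 1.0 / (1.0 + math.exp(-total))

z_u, z_v, z_i = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
z_uv = z_u + z_v  # list concatenation, so F = 4 and F' = 2
w_d = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.5]]  # hypothetical F x F' weights
score = bilinear_score(z_uv, z_i, w_d)
```

Because the score lies between 0 and 1, it can be read as how strongly a candidate third node is associated with the edge.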
First, given an edge (u, v) in the subgraph, the set $R$ of all neighbor nodes of u and v is found, $R \subseteq N_u \cup N_v$, where $N_u$ denotes the set of neighbor nodes of u. The relation between each node in $R$ and (u, v) then belongs to either case 1 or case 2. To learn this difference, the loss function is designed as follows:
Figure BDA0002887975780000066
where $z_{u,v}$ denotes the concatenation of the hidden-layer vectors $z_u$ and $z_v$, $z_i$ denotes the hidden-layer vector of node i, i ranges over the nodes in $R$ that satisfy case 1, and j ranges over the nodes in $R$ that satisfy case 2.
For the difference between case 2 and case 3, given the edge (u, v), the loss function is designed as follows:
Figure BDA0002887975780000071
The loss function for reconstructing triangles is as follows:
$L_t = \alpha L_1 + L_2$
where $L_1$ and $L_2$ are the two loss functions above and $\alpha$ adjusts the relative weight of the two.
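The three relations used by the triangle losses can be enumerated with a short sketch. Given an edge (u, v), it splits the neighbors into case 1 (connected to both endpoints) and case 2 (connected to exactly one), and draws case 3 nodes by negative sampling from nodes connected to neither. The function name `triangle_cases`, the uniform negative sampling, and the fixed seed are assumptions for illustration; the loss formulas themselves are not reproduced because the original equation images are unavailable.

```python
import random

def triangle_cases(adj, u, v, num_negative=1, seed=0):
    """Split nodes by their relation to the edge (u, v).

    case1: connected to both u and v (closes a triangle).
    case2: connected to exactly one of u and v.
    case3: sampled nodes connected to neither (negative samples).
    """
    nu, nv = adj[u] - {v}, adj[v] - {u}
    case1 = nu & nv
    case2 = (nu | nv) - case1
    others = sorted(set(adj) - nu - nv - {u, v})
    rng = random.Random(seed)
    case3 = set(rng.sample(others, min(num_negative, len(others))))
    return case1, case2, case3

# Edge (0, 1): node 2 closes a triangle, node 3 touches only node 0,
# and nodes 4 and 5 are connected to neither endpoint.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {5}, 5: {4}}
case1, case2, case3 = triangle_cases(adj, 0, 1)
```

A training loop would score each case with D and push case 1 scores above case 2, and case 2 scores above case 3, weighting the two terms with α as in the combined loss.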
The training and prediction stages of the algorithm are as follows:
In the training stage, a number of subgraphs are sampled. On each subgraph, the loss of the neural network is computed from the edge reconstruction loss and the triangle-structure reconstruction loss, and the network is trained by back-propagation and gradient descent to obtain the trained parameters. In the prediction stage, the adjacency matrix and node vector matrix of the whole graph are input, and the encoder of the trained graph autoencoder computes the node hidden-layer vector matrix, giving the final learned low-dimensional vector representations of the nodes. Community information in the graph is then mined by K-means clustering of these representations. For the K-means clustering method, see: Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002 Aug 7; 24(7): 881-92.
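The final clustering step can be sketched with a plain implementation of Lloyd's K-means iteration. This is a simplified stand-in, not the efficient variant from the cited reference; the function name `kmeans`, the random initialization from data points, and the toy embeddings are assumptions for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 2-D node embeddings forming two well-separated groups.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
labels = kmeans(embeddings, k=2)
```

Each resulting label identifies the community assigned to the corresponding node; the number of clusters k has to be supplied by the user.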
To test the effectiveness of the method, experiments were performed on three public graph datasets: Cora, Citeseer, and Pubmed. All are paper citation datasets in which nodes are papers and edges are citation relations. Cora has 2708 nodes and 5429 edges, Citeseer has 3327 nodes and 4732 edges, and Pubmed has 19717 nodes and 44338 edges. The evaluation metrics include normalized mutual information (NMI) and community classification accuracy, which are commonly used for community detection algorithms. Analysis of the experimental results shows that the triangle-structure-sensitive graph autoencoder outperforms a graph autoencoder that reconstructs only the edge information of the graph, by 3 percentage points on average.
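For reference, normalized mutual information between a ground-truth labeling and a predicted clustering can be computed as in the sketch below. Several normalizations of NMI are in use; this sketch divides by the arithmetic mean of the two entropies, which is an assumption, since the source does not say which variant was used in the experiments.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in joint.items()
    )

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    denom = (entropy(count_true) + entropy(count_pred)) / 2
    # Both labelings constant: treat them as identical partitions.
    return mutual / denom if denom > 0 else 1.0

# Same partition under renamed labels versus a statistically independent one.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to how the clusters are numbered, which is why it suits unsupervised community detection, where predicted cluster identifiers carry no meaning.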
Based on the same inventive concept, another embodiment of the present invention provides a community discovery apparatus in a graph sensitive to a triangle structure, which employs the above method, including:
the graph encoder module is used for fusing structure information and node content information in the graph through a graph neural network model so as to learn hidden layer vector representation of the nodes in the graph;
the decoder module is used for reconstructing a connection edge relation between two points in the graph and a triangle structure in the graph according to the hidden vector representation of the nodes in the graph;
and the clustering module is used for carrying out graph clustering by using the structure information and the node content information in the reconstructed graph so as to discover communities.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory storing a computer program configured to be executed by the processor, and a processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
With the advent of the information age, the volume of data on the internet has grown enormously and the relations among data have become increasingly complex, so the graph data structure is well suited to describing and expressing the characteristics of data and the relations among them. In graph data mining, analyzing community information helps people understand the information in a graph from a macroscopic view and has very wide application; because manually labeling massive data is extremely costly, community mining in an unsupervised setting is very important. In particular, the triangle structure is an important component of communities and should be fully exploited when learning community information in a graph. The method performs graph representation learning with graph neural networks, a deep learning technique that has been widely studied and used in recent years; it improves the scalability of the algorithm through a sub-graph sampling strategy, obtains node vector representations by fusing the structural information of the graph with the attribute information of the nodes in the encoder, and captures higher-order information in the graph by reconstructing triangle structures in the decoder, for use by downstream community mining tasks on the graph.
The foregoing disclosure of the specific embodiments of the present invention and the accompanying drawings is directed to an understanding of the present invention and its implementation, and it will be appreciated by those skilled in the art that various alternatives, modifications, and variations may be made without departing from the spirit and scope of the invention. The present invention should not be limited to the disclosure of the embodiments and drawings in the specification, and the scope of the present invention is defined by the scope of the claims.

Claims (10)

1. A community discovery method in a graph sensitive to a triangle structure is characterized by comprising the following steps:
utilizing a graph encoder in a graph self-encoder to fuse structural information and node content information in a graph through a graph neural network model so as to learn hidden layer vector representation of nodes in the graph;
reconstructing a continuous edge relation between two points in the graph and a triangular structure in the graph according to the hidden vector representation of the nodes in the graph by using a graph encoder in a graph self-encoder;
and carrying out graph clustering by using the structure information and the node content information in the reconstructed graph, thereby discovering communities.
2. The method of claim 1, wherein sub-graph sampling is used to reduce the training data size, and then a graph self-encoder is learned on the sampled sub-graphs.
3. The method of claim 2, wherein the sub-graph sampling comprises:
for a node u and a node v, if there is a connecting edge between them, the probability that this edge is sampled is p_{u,v} ∝ 1/d_u + 1/d_v, where d_u and d_v are the degrees of u and v;
given the scale of the sampled sub-graph, sampling the edges of the whole graph according to this probability, and extracting a sub-graph according to the sampled edges, thereby determining the sampled sub-graph.
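The sampling rule of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation. The claim only fixes the probability p_{u,v} ∝ 1/d_u + 1/d_v and the sub-graph scale, so the following are assumptions: the function name `sample_subgraph`, sampling without replacement (via Efraimidis-Spirakis weighted keys), and extracting the sub-graph induced by the endpoints of the sampled edges.

```python
import random
from collections import defaultdict


def sample_subgraph(edges, num_edges, seed=0):
    """Sample edges with probability proportional to 1/d_u + 1/d_v.

    edges: list of (u, v) tuples of an undirected graph
    Returns (sampled_edges, nodes, induced_edges).
    """
    rng = random.Random(seed)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    weights = [1.0 / degree[u] + 1.0 / degree[v] for u, v in edges]
    # Weighted sampling without replacement: each edge draws the key
    # U ** (1 / w) with U uniform in [0, 1); the largest keys are kept.
    keyed = [(rng.random() ** (1.0 / w), e) for w, e in zip(weights, edges)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    sampled = [e for _, e in keyed[:num_edges]]
    # Sub-graph extraction (assumed): keep every original edge whose two
    # endpoints both appear in some sampled edge.
    node_set = {n for e in sampled for n in e}
    induced = [(u, v) for u, v in edges if u in node_set and v in node_set]
    return sampled, sorted(node_set), induced


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
sampled, nodes, induced = sample_subgraph(edges, num_edges=2, seed=1)
```

Edges attached to low-degree nodes receive larger weights, so sparse regions of the graph are less likely to be lost when the sub-graph is much smaller than the full graph.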
4. The method of claim 1, wherein the connecting-edge relationship between two points in the graph is reconstructed by: given two points A and B that are joined by an edge in the original graph, computing the likelihood of an edge between A and B through a one-layer inner product network, thereby reconstructing the existing edge information in the graph.
5. The method of claim 4, wherein the loss function for reconstructing the connecting-edge relationship between two points in the graph is calculated as follows:
an inner product operation is performed on the learned hidden-layer node vector representation Z_t to reconstruct all connecting edges in the sub-graph, giving the reconstructed sub-graph adjacency matrix (symbol given only as image FDA0002887975770000011):
[Equation image FDA0002887975770000012, not reproduced in the text]
for an edge (u, v) present in the graph, the loss function L_{u,v} is defined from the quantity shown in image FDA0002887975770000013 and the real sub-graph adjacency matrix A_t:
[Equation image FDA0002887975770000014, not reproduced in the text]
wherein z_u denotes the hidden-layer node vector of point u, and z_v denotes the hidden-layer node vector of point v.
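Because the equations of claim 5 survive only as image references, the sketch below fills them with the form commonly used in graph autoencoders, and this is an assumption: edge scores are the sigmoid of the inner product of hidden vectors, and the loss for an existing edge is its negative log-likelihood. The names `reconstruct_adjacency` and `edge_loss` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct_adjacency(Z):
    """Score every node pair from the hidden vectors Z (one row per node)."""
    n = len(Z)
    return [[sigmoid(dot(Z[i], Z[j])) for j in range(n)] for i in range(n)]


def edge_loss(Z, u, v):
    """Assumed loss for an existing edge (u, v): -log sigmoid(z_u . z_v)."""
    score = sigmoid(dot(Z[u], Z[v]))
    return -math.log(max(score, 1e-12))  # clamp to avoid log(0)


# Nodes 0 and 1 have aligned embeddings; node 2 points the opposite way.
Z = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
A_hat = reconstruct_adjacency(Z)
```

Under this assumed form, pairs with aligned hidden vectors receive scores above 0.5 and a small loss, so minimising the loss over observed edges pulls connected nodes together in the embedding space.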
6. The method of claim 1, wherein the triangle structure in the graph is reconstructed by: given a connecting edge between A and B, searching the neighbor sets of A and B; for a node C that is a neighbor of A or B, learning triangle information according to whether C is connected to both A and B; meanwhile performing negative sampling to sample a node D that is connected to neither A nor B; and reconstructing and learning the triangle information according to the relations among A, B, C and D.
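The candidate selection described in claim 6 can be sketched as below. It is an illustration under assumptions: the graph is held as a dictionary of neighbor sets in which every node appears as a key, and the function name `triangle_samples` and its return layout are hypothetical.

```python
import random


def triangle_samples(adj, a, b, num_negatives=1, seed=0):
    """Split candidate third nodes for the edge (a, b).

    adj: dict mapping every node -> set of its neighbours
    Returns (closing, open_, negatives):
      closing   - neighbours C linked to both a and b (a triangle exists)
      open_     - neighbours C linked to only one of a, b
      negatives - sampled nodes D linked to neither a nor b
    """
    rng = random.Random(seed)
    neighbours = (adj[a] | adj[b]) - {a, b}
    closing = sorted(c for c in neighbours if c in adj[a] and c in adj[b])
    open_ = sorted(c for c in neighbours if c not in closing)
    unconnected = sorted(n for n in adj
                         if n not in neighbours and n not in (a, b))
    k = min(num_negatives, len(unconnected))
    negatives = rng.sample(unconnected, k)
    return closing, open_, negatives


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {5}, 5: {4}}
closing, open_, negatives = triangle_samples(adj, 0, 1, num_negatives=2)
```

For the edge (0, 1) in this toy graph, node 2 closes a triangle, node 3 is attached to only one endpoint, and nodes 4 and 5 are the only possible negative samples.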
7. The method of claim 6, wherein the loss function for reconstructing the triangle structure in the graph is calculated as follows:
given an edge, the relationship among three points falls into three cases: in case 1 every pair of the three points is joined by a connecting edge, in case 2 there are two connecting edges, and in case 3 there is one connecting edge;
a function D over two vector representations is defined to reflect their mutual information, that is, the correlation of the two vectors: D(e_1, e_2) = σ(e_1^T W_d e_2), where σ is the logistic sigmoid activation function, W_d ∈ R^{F×F′} is a trainable parameter of the network, F is the dimension of vector e_1, and F′ is the dimension of vector e_2;
given an edge (u, v) in the sub-graph, all neighbor nodes R of u and v are found, R ∈ N_u ∪ N_v, where N_u denotes the neighbor node set of u and N_v that of v; the relationship between a node in R and (u, v) belongs to case 1 or case 2; to distinguish case 1 from case 2, the loss function is:
[Equation image FDA0002887975770000021, not reproduced in the text]
wherein z_{u,v} denotes the concatenated representation of the hidden-layer vectors z_u and z_v, z_i denotes the hidden-layer vector representation of node i, i is a node in R that satisfies case 1, and j is a node in R that satisfies case 2;
to distinguish case 2 from case 3, given the edge (u, v), the loss function is:
[Equation image FDA0002887975770000022, not reproduced in the text]
the loss function for reconstructing triangles is then:
L_t = αL_1 + L_2
wherein α adjusts the relative weight of L_1 and L_2.
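The two formulas of claim 7 that survive in text form, the bilinear score D(e_1, e_2) = σ(e_1^T W_d e_2) and the combination L_t = αL_1 + L_2, can be sketched as below. The losses L_1 and L_2 exist only as images in the source, so they are treated here as given numbers rather than reconstructed. The function names, the example weights, and the reading of z_{u,v} as a plain concatenation of z_u and z_v are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator(e1, W_d, e2):
    """D(e1, e2) = sigmoid(e1^T W_d e2), with W_d of shape F x F'."""
    projected = [sum(w * x for w, x in zip(row, e2)) for row in W_d]
    return sigmoid(sum(a * p for a, p in zip(e1, projected)))


def triangle_loss(l1, l2, alpha):
    """L_t = alpha * L_1 + L_2, with L_1 and L_2 supplied by the caller."""
    return alpha * l1 + l2


z_u, z_v, z_i = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
z_uv = z_u + z_v  # list concatenation: F = 4, F' = 2
W_d = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # example weights
score = discriminator(z_uv, W_d, z_i)
total = triangle_loss(0.5, 0.25, alpha=2.0)
```

Because W_d is rectangular, the edge representation and the node representation may have different dimensions, which is what allows a concatenated edge vector to be scored against a single node vector.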
8. A community discovery device in a graph sensitive to a triangle structure by using the method of any one of claims 1 to 7, comprising:
the graph encoder module is used for fusing structure information and node content information in the graph through a graph neural network model so as to learn hidden layer vector representation of the nodes in the graph;
the decoder module is used for reconstructing, from the hidden-layer vector representations of the nodes in the graph, the connecting-edge relationship between two points in the graph and the triangle structures in the graph;
and the clustering module is used for carrying out graph clustering by using the structure information and the node content information in the reconstructed graph so as to discover communities.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202110018835.XA 2021-01-07 2021-01-07 Community discovery method and device in graph sensitive to triangle structure Pending CN112784118A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018835.XA CN112784118A (en) 2021-01-07 2021-01-07 Community discovery method and device in graph sensitive to triangle structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110018835.XA CN112784118A (en) 2021-01-07 2021-01-07 Community discovery method and device in graph sensitive to triangle structure

Publications (1)

Publication Number Publication Date
CN112784118A true CN112784118A (en) 2021-05-11

Family

ID=75756040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018835.XA Pending CN112784118A (en) 2021-01-07 2021-01-07 Community discovery method and device in graph sensitive to triangle structure

Country Status (1)

Country Link
CN (1) CN112784118A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486934A (en) * 2021-06-22 2021-10-08 河北工业大学 Attribute graph deep clustering method of hierarchical graph convolution network based on attention mechanism
CN114841296A (en) * 2022-07-04 2022-08-02 北京六方云信息技术有限公司 Device clustering method, terminal device and storage medium
CN116304367A (en) * 2023-02-24 2023-06-23 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
CN116304367B (en) * 2023-02-24 2023-12-01 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training

Similar Documents

Publication Publication Date Title
Zhang et al. Star-gcn: Stacked and reconstructed graph convolutional networks for recommender systems
Fan et al. Graph neural networks for social recommendation
CN111950594B (en) Unsupervised graph representation learning method and device on large-scale attribute graph based on sub-sampling
Liu et al. Contextualized graph attention network for recommendation with item knowledge graph
CN110263280B (en) Multi-view-based dynamic link prediction depth model and application
Qu et al. An end-to-end neighborhood-based interaction model for knowledge-enhanced recommendation
Xie et al. BaGFN: broad attentive graph fusion network for high-order feature interactions
Yang et al. From properties to links: Deep network embedding on incomplete graphs
CN112784118A (en) Community discovery method and device in graph sensitive to triangle structure
Zhu et al. A survey on graph structure learning: Progress and opportunities
Liu et al. Hierarchical multi-view context modelling for 3D object classification and retrieval
Guo et al. Trust-aware recommendation based on heterogeneous multi-relational graphs fusion
Arya et al. Exploiting relational information in social networks using geometric deep learning on hypergraphs
CN113255895B (en) Structure diagram alignment method and multi-diagram joint data mining method based on diagram neural network representation learning
Liu et al. Multilayer graph contrastive clustering network
Liu et al. Fast attributed multiplex heterogeneous network embedding
Wang et al. Sparse graph based self-supervised hashing for scalable image retrieval
CN109740106A (en) Large-scale network betweenness approximation method based on graph convolution neural network, storage device and storage medium
CN115221413B (en) Sequence recommendation method and system based on interactive graph attention network
Lei et al. Social diffusion analysis with common-interest model for image annotation
Wang et al. Factor graph model based user profile matching across social networks
Wang et al. Efficient multi-modal hypergraph learning for social image classification with complex label correlations
Chen et al. Heterogeneous graph convolutional network with local influence
Ma et al. DBRec: Dual-bridging recommendation via discovering latent groups
Zhang et al. HG-Meta: Graph meta-learning over heterogeneous graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination