CN115994560A - Graph neural network method based on multi-scale graph contrastive learning - Google Patents

Graph neural network method based on multi-scale graph contrastive learning

Info

Publication number
CN115994560A
Authority
CN
China
Prior art keywords
graph, global, local, neural network, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310135024.7A
Other languages
Chinese (zh)
Inventor
王波
刘彦北
李志胜
徐振宇
国英龙
王伟
贾智洋
杨铭锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siteng Heli Tianjin Technology Co ltd
Original Assignee
Siteng Heli Tianjin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siteng Heli Tianjin Technology Co ltd filed Critical Siteng Heli Tianjin Technology Co ltd
Priority to CN202310135024.7A
Publication of CN115994560A
Legal status: Pending (Critical, Current)

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a graph neural network method based on multi-scale graph contrastive learning, which comprises the following steps: collecting original graph data and labeling it; performing data enhancement and controlling the scale of the generated subgraphs, thereby generating global views and local views of different scales; learning the global and local potential representations of the original graph data with a graph neural network; executing different contrastive learning strategies for the global and the local potential representations, respectively; and optimizing the distances between different views in the representation space by minimizing an objective function, thereby improving the quality of the representations learned by the network. The graph neural network method based on multi-scale graph contrastive learning can obtain good graph-level representations by learning only from self-supervision signals extracted from the data themselves, without relying on label information, and can be widely applied to technical fields such as chemical molecular property prediction, biological protein function prediction and social network analysis.

Description

Graph neural network method based on multi-scale graph contrastive learning
Technical Field
The invention relates to neural network technology, and in particular to a graph neural network method based on multi-scale graph contrastive learning.
Background
Graph-structured data has been shown to effectively represent many kinds of data, such as social networks, financial networks and chemical molecular graphs, and has broad application value. As graph data becomes increasingly common in real-world scenarios, learning representations of graph data is becoming ever more important.
Graph representation learning, i.e., extracting high-dimensional structure and attribute information from a graph and encoding it into a low-dimensional representation vector, has been widely used in many graph data analysis and processing tasks. In the information age, however, the growth in data volume has made labeled data increasingly scarce, and collecting data labels requires considerable human effort and time. As a result, more and more researchers are turning to unsupervised or self-supervised graph representation learning.
Conventional methods such as matrix factorization and random walks cannot scale to larger graphs and cannot make good use of node attribute information, so learning graph representations without relying on label information has become critical.
At present, graph representation learning based on contrastive learning is a feasible approach, and many related studies have achieved promising results. However, most studies do not take the multi-scale information of graph data into account; that is, they perform contrastive learning at a single scale and ignore more global information or finer local information.
Disclosure of Invention
To solve these problems, the invention provides a graph neural network method based on multi-scale graph contrastive learning. It overcomes the lack of multi-scale information in existing methods: by controlling the number of random-walk nodes it introduces the concepts of global views and local views, and it formulates different contrastive learning strategies so that different strategies are executed between different views. In this way rich multi-scale information in the graph data can be mined, and the quality of the learned graph representations is effectively improved.
In order to achieve the above purpose, the present invention provides a graph neural network method based on multi-scale graph contrastive learning, comprising the following steps:
S1, collecting original graph data, and labeling the collected original graph data;
S2, performing data enhancement on the collected original graph data by random walk, and controlling the scale of the generated subgraphs, thereby generating global views and local views of different scales;
S3, learning the global information potential representation and the local information potential representation of the original graph data through a graph neural network;
S4, executing different contrastive learning strategies for the global information potential representation and the local information potential representation learned by the graph neural network, respectively;
S5, optimizing the distances between different views in the representation space by minimizing an objective function, thereby improving the quality of the representations learned by the network.
Preferably, in step S1, the original graph-structure data collected from the network is annotated with corresponding label files according to the data type.
Preferably, in step S1, a graph dataset D = {G_n : n ∈ N} consisting of N graph data is defined. Each graph in the dataset is G = {V, E}, where V = {v_1, v_2, ..., v_|V|} is the set of nodes in the graph, E is the set of edges in the graph, and v_i is the i-th node; if nodes v_i and v_j are related, then e_ij = (v_i, v_j) ∈ E is an edge of the graph;
thus the graph data has an adjacency matrix A of dimension |V| × |V|, with A_ij = 1 if e_ij ∈ E;
in addition, each node of the graph carries a feature vector x_i ∈ R^d, and all node features in the graph form a feature matrix X ∈ R^{|V|×d}; y_k ∈ y is the label corresponding to the graph data.
Preferably, in step S2, given a graph G ∈ {G_n : n ∈ N}, an enhanced graph Ĝ ~ q(Ĝ | G) is defined, where q(· | G) is the graph enhancement method used. The enhancement method is mainly subgraph sampling: by controlling the number of nodes, the scale of the generated subgraph is controlled, so as to obtain a global view and a local view carrying different scale information, namely Ĝ^g and Ĝ^l.
Preferably, in step S3, the potential representations of the views are learned with a five-layer graph isomorphism network (GIN) whose model parameters are shared between the global view and the local view.
Preferably, in step S4, the noise contrastive estimation function is selected for contrast between global views and between global and local views, while a regressor-based metric is selected between local views; the regressor is implemented as a multi-layer perceptron with batch normalization and ReLU activation functions.
Preferably, in step S4, the graph neural network follows a message-passing mechanism: through message passing, each node acquires the attribute and structure information of its neighboring nodes and updates its own node representation, and after k layers of iteration a node captures the information of its k-hop neighborhood, namely:

a_v^(k) = AGGREGATE^(k)({ h_u^(k-1) : u ∈ N(v) })
h_v^(k) = COMBINE^(k)( h_v^(k-1), a_v^(k) )

where AGGREGATE(·) aggregates the information of the neighborhood nodes and COMBINE(·) updates the node's own information; the node-level potential representations of the data-enhanced global and local views are obtained through the graph isomorphism network.
The potential representation of the entire graph is then obtained by pooling, i.e.:

f(G) = READOUT({ h_v^(k) : v ∈ V })

where READOUT(·) is a graph pooling operation, here a sum-pooling mechanism.
Finally, the graph-level representation is obtained through a nonlinear transformation, namely:

z = g(f(G))

where g(·) is a nonlinear transformation implemented as a two-layer perceptron with a ReLU activation function; this yields the global and local representations of the graph data, namely z^g and z^l.
Preferably, in step S5, the neural network is optimized by assigning different weight coefficients to the three different loss terms, so that it learns good graph-level representations that can then be used for different downstream tasks.
Preferably, the step S5 specifically comprises the following steps:
S51, considering the global representation and the local representation at the same time and applying different contrastive learning strategies to them, so that information of different scales is taken into account to improve performance; the noise contrastive estimation loss ℓ_s is defined as:

ℓ_s(z, z_+) = −log [ exp(sim(z, z_+)/τ) / Σ_{z_−} exp(sim(z, z_−)/τ) ]

where τ is the temperature coefficient, and z_+ and z_− are positive and negative samples, respectively;
S52, maximizing the similarity between global representations of the same original graph and minimizing the similarity between global representations of different original graphs; the loss function ℓ_gg is defined as:

ℓ_gg = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g_1}, z_n^{g_2})

where N is the number of samples in the batch;
S53, in order to establish the association between local and global information, defining the loss function:

ℓ_gl = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g}, z_n^{l})

S54, providing the similarity between local views by a metric with learnable parameters, namely a five-layer perceptron f_θ with ReLU activation functions; local views from the same graph are expected to have higher similarity than local views from different graphs, and an objective ψ(θ_d) is therefore defined that rewards high regressor scores for local-view pairs from the same graph and low scores for local-view pairs from different graphs;
S55, training the regressor by maximizing ψ(θ_d) and regarding its output, passed through a Sigmoid activation function, as the similarity measure; σ(f_θ(·, ·)) is regarded as an estimator ℓ_d of local-view similarity, and the loss function between local views is accordingly defined as:

ℓ_ll = (1/N) Σ_{n=1}^{N} ℓ_d(z_n^{l_1}, z_n^{l_2})

S56, defining the overall loss function as:

L = λ_1 ℓ_gg + λ_2 ℓ_gl + λ_3 ℓ_ll

where λ_1, λ_2 and λ_3 are the weight coefficients of the three loss terms;
S57, minimizing the total loss L with the Adam gradient-descent algorithm to update the encoder parameters, and using the pre-trained encoder for different downstream tasks.
Compared with the prior art, the invention has the following beneficial effects:
1. Multi-scale information is taken into account in graph contrastive learning and different contrastive learning strategies are executed for information of different scales, so the learned representations are of higher quality and better suited to a series of downstream tasks.
2. The method can be used in scenarios such as graph data analysis and graph representation learning, and can help people analyze and use graph data better.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a schematic diagram of a frame of the present invention;
FIG. 2 is a schematic of an algorithm of the present invention;
FIG. 3 is a regressor design of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and it should be noted that, while the present embodiment provides a detailed implementation and a specific operation process on the premise of the present technical solution, the protection scope of the present invention is not limited to the present embodiment.
FIG. 1 is a schematic diagram of a frame of the present invention; FIG. 2 is a schematic diagram of an algorithm of the present invention; FIG. 3 is a design diagram of the regressor of the present invention. As shown in FIGS. 1-3, a graph neural network method based on multi-scale graph contrastive learning comprises the following steps:
S1, collecting original graph data, and labeling the collected original graph data;
Preferably, in step S1, the original graph-structure data collected from the network is annotated with corresponding label files according to the data type.
Preferably, in step S1, a graph dataset D = {G_n : n ∈ N} consisting of N graph data is defined. Each graph in the dataset is G = {V, E}, where V = {v_1, v_2, ..., v_|V|} is the set of nodes in the graph, E is the set of edges in the graph, and v_i is the i-th node; if nodes v_i and v_j are related, then e_ij = (v_i, v_j) ∈ E is an edge of the graph;
thus the graph data has an adjacency matrix A of dimension |V| × |V|, with A_ij = 1 if e_ij ∈ E;
in addition, each node of the graph carries a feature vector x_i ∈ R^d, and all node features in the graph form a feature matrix X ∈ R^{|V|×d}; y_k ∈ y is the label corresponding to the graph data.
S2, graph data enhancement is important to the contrastive learning employed herein; without data enhancement, the model can perform worse than an untrained model. Data enhancement aims to transform the data to a certain degree while affecting the semantic information of the original data as little as possible, so as to create novel data.
In this embodiment, the collected original graph data is enhanced by random walk and the scale of the generated subgraph is controlled, thereby generating global views and local views of different scales;
Preferably, in step S2, given a graph G ∈ {G_n : n ∈ N}, an enhanced graph Ĝ ~ q(Ĝ | G) is defined, where q(· | G) is the graph enhancement method used. The enhancement method is mainly subgraph sampling: by controlling the number of nodes, the scale of the generated subgraph is controlled, so as to obtain a global view and a local view carrying different scale information, namely Ĝ^g and Ĝ^l.
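As an illustration of this sub-sampling step, the sketch below grows a node set by random walk and returns the induced subgraph. The NetworkX data structure, the function names and the node-budget ratios chosen for the global and local views are assumptions made for illustration, not values fixed by this embodiment.

```python
import random
import networkx as nx

def random_walk_subgraph(G: nx.Graph, num_nodes: int, seed=None) -> nx.Graph:
    """Grow a node set by random walk until num_nodes distinct nodes are
    visited, then return the induced subgraph (illustrative sub-sampling)."""
    rng = random.Random(seed)
    budget = min(num_nodes, G.number_of_nodes())
    current = rng.choice(list(G.nodes))
    visited = {current}
    while len(visited) < budget:
        neighbors = list(G.neighbors(current))
        if not neighbors:                      # isolated node: restart the walk
            current = rng.choice(list(G.nodes))
        else:
            current = rng.choice(neighbors)
        visited.add(current)
    return G.subgraph(visited).copy()

def make_views(G: nx.Graph, global_ratio: float = 0.8, local_ratio: float = 0.2):
    """Control the subgraph scale through the node budget to obtain one
    global view and one local view (the ratios are assumed values)."""
    n = G.number_of_nodes()
    global_view = random_walk_subgraph(G, max(2, int(global_ratio * n)))
    local_view = random_walk_subgraph(G, max(2, int(local_ratio * n)))
    return global_view, local_view
```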
S3, since the graph neural network is a graph data analysis method with strong expressive capability, the global information potential representation and the local information potential representation of the original graph data are learned through the graph neural network;
Preferably, in step S3, the potential representations of the views are learned with a five-layer graph isomorphism network (GIN) whose model parameters are shared between the global view and the local view.
S4, executing different contrastive learning strategies for the global information potential representation and the local information potential representation learned by the graph neural network, respectively;
It should be noted that, in the contrastive learning stage, since global views contain most of the information of the graph data and have high semantic similarity, it is desirable to reduce the distance between them in the representation space; since a global view is large and largely contains the content of the local views, it is also desirable to reduce their distance in the representation space; for local views, however, semantic similarity is low because they usually describe different content, so the distances between local views in the representation space need to be pulled apart instead, i.e., their dissimilarity is encouraged.
Preferably, in step S4, the noise contrastive estimation function is selected for contrast between global views and between global and local views, while a regressor-based metric is selected between local views; the regressor is implemented as a multi-layer perceptron with batch normalization and ReLU activation functions.
Preferably, in step S4, the graph neural network follows a message-passing mechanism: through message passing, each node acquires the attribute and structure information of its neighboring nodes and updates its own node representation, and after k layers of iteration a node captures the information of its k-hop neighborhood, namely:

a_v^(k) = AGGREGATE^(k)({ h_u^(k-1) : u ∈ N(v) })
h_v^(k) = COMBINE^(k)( h_v^(k-1), a_v^(k) )

where AGGREGATE(·) aggregates the information of the neighborhood nodes and COMBINE(·) updates the node's own information; the node-level potential representations of the data-enhanced global and local views are obtained through the graph isomorphism network.
The potential representation of the entire graph is then obtained by pooling, i.e.:

f(G) = READOUT({ h_v^(k) : v ∈ V })

where READOUT(·) is a graph pooling operation, here a sum-pooling mechanism.
Finally, the graph-level representation is obtained through a nonlinear transformation, namely:

z = g(f(G))

where g(·) is a nonlinear transformation implemented as a two-layer perceptron with a ReLU activation function; this yields the global and local representations of the graph data, namely z^g and z^l.
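A minimal sketch of such an encoder, written with PyTorch Geometric, is given below. The five GIN layers, the sum-pooling readout and the two-layer ReLU projection head follow the description above, while the class name, hidden dimension and remaining hyper-parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool

class GINEncoder(nn.Module):
    """Five GIN layers, sum-pooling readout f(G), and a two-layer ReLU
    projection head g(.) producing the graph-level representation z."""
    def __init__(self, in_dim: int, hidden_dim: int = 128, num_layers: int = 5):
        super().__init__()
        self.convs = nn.ModuleList()
        dims = [in_dim] + [hidden_dim] * num_layers
        for i in range(num_layers):
            mlp = nn.Sequential(nn.Linear(dims[i], hidden_dim),
                                nn.ReLU(),
                                nn.Linear(hidden_dim, hidden_dim))
            self.convs.append(GINConv(mlp))
        self.proj = nn.Sequential(nn.Linear(hidden_dim, hidden_dim),
                                  nn.ReLU(),
                                  nn.Linear(hidden_dim, hidden_dim))

    def forward(self, x, edge_index, batch):
        h = x
        for conv in self.convs:
            h = torch.relu(conv(h, edge_index))   # AGGREGATE + COMBINE per layer
        h_graph = global_add_pool(h, batch)       # READOUT: sum pooling
        z = self.proj(h_graph)                    # z = g(f(G))
        return h_graph, z
```

Because the same module processes both the global and the local views, the model parameters are shared between the two scales, as required in step S3.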
S5, optimizing the distances between different views in the representation space by minimizing an objective function, thereby improving the quality of the representations learned by the network.
Preferably, in step S5, the neural network is optimized by assigning different weight coefficients to the three different loss terms, so that it learns good graph-level representations that can then be used for different downstream tasks.
Preferably, the step S5 specifically comprises the following steps:
S51, considering the global representation and the local representation at the same time and applying different contrastive learning strategies to them, so that information of different scales is taken into account to improve performance; the noise contrastive estimation loss ℓ_s is defined as:

ℓ_s(z, z_+) = −log [ exp(sim(z, z_+)/τ) / Σ_{z_−} exp(sim(z, z_−)/τ) ]

where τ is the temperature coefficient, and z_+ and z_− are positive and negative samples, respectively;
S52, since a global view generally contains most of the content of the graph, global views of the same graph carry very similar semantic information; therefore the similarity between global representations of the same original graph is maximized and the similarity between global representations of different original graphs is minimized, and the loss function ℓ_gg is defined as:

ℓ_gg = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g_1}, z_n^{g_2})

where N is the number of samples in the batch;
S53, since a global view has a larger size and largely contains the content of the local views, part of the semantic content is shared between them; therefore, in order to establish the association between local and global information, the loss function is defined as:

ℓ_gl = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g}, z_n^{l})

S54, since local views from the same original graph usually describe different content, the semantic similarity between them is low, so their potential representations are no longer pulled together as before; instead, their dissimilarity is encouraged. However, considering that some semantic similarity still exists between them, their distance is not directly pushed apart by the noise contrastive estimation loss either; instead, the similarity between local views is given by a metric with learnable parameters, namely a five-layer perceptron f_θ with ReLU activation functions. Local views from the same graph are expected to have higher similarity than local views from different graphs, and an objective ψ(θ_d) is therefore defined that rewards high regressor scores for local-view pairs from the same graph and low scores for local-view pairs from different graphs;
S55, training the regressor by maximizing ψ(θ_d) and regarding its output, passed through a Sigmoid activation function, as the similarity measure; σ(f_θ(·, ·)) is regarded as an estimator ℓ_d of local-view similarity, and the loss function between local views is accordingly defined as:

ℓ_ll = (1/N) Σ_{n=1}^{N} ℓ_d(z_n^{l_1}, z_n^{l_2})

S56, defining the overall loss function as:

L = λ_1 ℓ_gg + λ_2 ℓ_gl + λ_3 ℓ_ll

where λ_1, λ_2 and λ_3 are the weight coefficients of the three loss terms;
S57, minimizing the total loss L with the Adam gradient-descent algorithm to update the encoder parameters, and using the pre-trained encoder for different downstream tasks. A sketch of how these loss terms can be combined in code is given below.
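The sketch below illustrates one way the three loss terms could be combined, assuming a batch that yields two global views and two local views per graph, cosine similarity inside the noise contrastive estimation loss, and weight coefficients lam = (λ1, λ2, λ3). The regressor depth, the shift-by-one construction of cross-graph pairs and all numeric values are assumptions of this sketch rather than the exact objective of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """InfoNCE-style noise contrastive loss: row n of z1 and row n of z2 form
    the positive pair, all other rows in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                       # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

class LocalRegressor(nn.Module):
    """Learnable similarity between two local-view representations; the Sigmoid
    output plays the role of the estimator l_d (depth/width are assumptions)."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, za, zb):
        return torch.sigmoid(self.net(torch.cat([za, zb], dim=1))).squeeze(-1)

def regressor_objective(reg: LocalRegressor, zl1, zl2) -> torch.Tensor:
    """psi(theta_d): reward high scores for same-graph local pairs and low
    scores for cross-graph pairs (negatives built by shifting the batch)."""
    pos = reg(zl1, zl2)
    neg = reg(zl1, torch.roll(zl2, shifts=1, dims=0))
    return torch.log(pos + 1e-8).mean() + torch.log(1.0 - neg + 1e-8).mean()

def total_loss(zg1, zg2, zl1, zl2, reg: LocalRegressor, lam=(1.0, 1.0, 0.1)):
    """Weighted sum of the three loss terms; minimizing l_ll encourages
    dissimilarity between local views of the same graph."""
    l_gg = nce_loss(zg1, zg2)                                # global-global
    l_gl = 0.5 * (nce_loss(zg1, zl1) + nce_loss(zg2, zl2))   # global-local
    l_ll = reg(zl1, zl2).mean()                              # local-local via l_d
    return lam[0] * l_gg + lam[1] * l_gl + lam[2] * l_ll
```

In a training loop, one Adam step would maximize regressor_objective with respect to the regressor parameters (steps S54-S55), while another Adam step minimizes total_loss with respect to the encoder parameters (step S57).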
Therefore, the graph neural network method based on multi-scale graph contrastive learning can obtain good graph-level representations by learning only from self-supervision signals extracted from the data themselves, without relying on label information, and can be widely applied to technical fields such as chemical molecular property prediction, biological protein function prediction and social network analysis.
Experimental example
Table 1 is a statistics table of the number of graphs and the average numbers of nodes and edges for each dataset

Dataset    Category         Graphs   Avg. nodes   Avg. edges
MUTAG      Molecules        188      17.93        19.79
NCI1       Molecules        4110     29.87        32.30
PROTEINS   Molecules        1113     39.06        72.82
DD         Molecules        1178     284.32       715.66
IMDB-B     Social Network   1000     19.77        95.63
COLLAB     Social Network   5000     74.49        2457.78
RDT-B      Social Network   2000     429.63       497.75
RDT-M5K    Social Network   5000     508.52       594.87
As can be seen from Table 1, the experiments use datasets from the TUDataset collection, including four chemical molecule datasets and four social network datasets. NCI1 is a compound dataset provided by the National Cancer Institute (NCI), comprising 4110 graph samples with a node feature dimension of 37. DD is a dataset containing 1178 protein structures; each protein is represented by a graph and each node represents an amino acid. The MUTAG dataset consists of 188 compounds divided into two classes according to their mutagenic effect on bacteria. In PROTEINS, the nodes are secondary structure elements, and there is an edge between two nodes if they are adjacent in the amino-acid sequence or in 3D space. Each graph in the REDDIT-BINARY dataset corresponds to an online discussion thread, where nodes correspond to users and an edge exists between two nodes when one user replies to the other. REDDIT-M5K extends REDDIT-BINARY to five different sub-communities. COLLAB is a scientific collaboration dataset with data from three fields, namely high energy physics, condensed matter physics and astrophysics. IMDB-BINARY is a movie collaboration dataset; in each graph the nodes represent actors, and there is an edge between two actors if they appear in the same movie. Since there is no official split for the above datasets, a random split is adopted and 10-fold cross-validation is performed.
The method of the invention is compared with the following methods:
WL: combines the Weisfeiler-Lehman algorithm with graph kernels to give the WL graph kernel method, which decomposes graphs into subtrees and measures subtree similarity to obtain graph similarity;
DGK: decomposes graphs into substructures and measures graph similarity through these substructures;
node2vec: a variant of the DeepWalk algorithm that takes depth-first and breadth-first search into account;
sub2vec: learns subgraph feature representations by performing random walks truncated to subgraphs;
graph2vec: extends doc2vec from natural language processing to graph data and performs unsupervised representation learning by generating rooted subgraphs and negative samples;
GAE: trains the encoder by encoding the graph into a latent feature representation and then reconstructing the original graph;
DGI: introduces the idea of mutual information maximization into graph data, maximizing the mutual information between nodes and the graph;
ContextPred: trains the encoder model by maximizing the similarity between the representation of a center node and the representations of its context nodes;
InfoGraph: similar to DGI, it is also based on mutual information maximization, but it focuses only on graph-level representation learning;
GraphCL: proposes four different data enhancement methods and uses contrastive learning to optimize the consistency of the enhanced views;
JOAO: adaptively and dynamically selects the data enhancement scheme on the basis of GraphCL;
SimGRACE: avoids the disruption of semantic information by data enhancement by directly perturbing the encoder.
In this experimental example, two tasks are performed to verify the effectiveness of the proposed method: an unsupervised classification task and a semi-supervised classification task. For unsupervised classification, the encoder is pre-trained with the proposed multi-scale contrastive learning, and the representations output by the trained encoder are fed into a downstream linear SVM classifier. For semi-supervised classification, the representations obtained in the unsupervised stage are passed to a downstream multi-layer perceptron for classification, and a proportion of the label information is provided to fine-tune the model. Accuracy (Acc) is used as the evaluation metric for both tasks.
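For the unsupervised protocol, the downstream evaluation can be sketched as follows, assuming the frozen graph-level embeddings and labels are already available as NumPy arrays; the regularization constant and the function name are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate_unsupervised(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Feed frozen graph-level embeddings to a linear SVM and report the mean
    accuracy over 10-fold cross-validation."""
    clf = SVC(kernel="linear", C=1.0)
    scores = cross_val_score(clf, embeddings, labels, cv=10, scoring="accuracy")
    return float(scores.mean())
```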
Table 2 is a table of unsupervised classification experiment results
[Table 2: unsupervised classification accuracy of the proposed method and the baseline methods on the eight datasets; provided as an image in the original publication]
Table 2 shows the performance of the method of the present invention on the eight downstream task datasets. As can be seen from Table 2, the classification performance of the proposed method (MSSGCL) is the best and is superior to all the other baseline models; compared with GraphCL, which encourages similarity between small-scale views, the average Acc improvement achieved by the present invention reaches more than 2%.
Table 3 is a table of semi-supervised classification experiment results
[Table 3: semi-supervised classification accuracy at label rates of 1% and 10%; provided as an image in the original publication]
Table 3 shows the performance of the method of the invention on the eight datasets under the semi-supervised setting, reporting subtasks with label rates of 1% and 10%. As can be seen from Table 3, at the 1% label rate the proposed method is higher than all baseline models, with an improvement even over the previous best model SimGRACE. At the 10% label rate, the method is superior to the previous baseline models, achieving the best performance on 6 of the 7 datasets, and the average Acc improvement over GraphCL is 2%.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A graph neural network method based on multi-scale graph contrastive learning, characterized in that the method comprises the following steps:
S1, collecting original graph data, and labeling the collected original graph data;
S2, performing data enhancement on the collected original graph data by random walk, and controlling the scale of the generated subgraphs, thereby generating global views and local views of different scales;
S3, learning the global information potential representation and the local information potential representation of the original graph data through a graph neural network;
S4, executing different contrastive learning strategies for the global information potential representation and the local information potential representation learned by the graph neural network, respectively;
S5, optimizing the distances between different views in the representation space by minimizing an objective function, thereby improving the quality of the representations learned by the network.
2. The graph neural network method based on multi-scale graph contrastive learning according to claim 1, characterized in that: in step S1, the original graph-structure data collected from the network is annotated with corresponding label files according to the data type.
3. The graph neural network method based on multi-scale graph contrastive learning according to claim 2, characterized in that: in step S1, a graph dataset D = {G_n : n ∈ N} consisting of N graph data is defined; each graph in the dataset is G = {V, E}, where V = {v_1, v_2, ..., v_|V|} is the set of nodes in the graph, E is the set of edges in the graph, and v_i is the i-th node; if nodes v_i and v_j are related, then e_ij = (v_i, v_j) ∈ E is an edge of the graph;
thus the graph data has an adjacency matrix A of dimension |V| × |V|, with A_ij = 1 if e_ij ∈ E;
in addition, each node of the graph carries a feature vector x_i ∈ R^d, and all node features in the graph form a feature matrix X ∈ R^{|V|×d}; y_k ∈ y is the label corresponding to the graph data.
4. The graph neural network method based on multi-scale graph contrastive learning according to claim 1, characterized in that: in step S2, given a graph G ∈ {G_n : n ∈ N}, an enhanced graph Ĝ ~ q(Ĝ | G) is defined, where q(· | G) is the graph enhancement method used; the enhancement method is mainly subgraph sampling, and by controlling the number of nodes the scale of the generated subgraph is controlled, so as to obtain a global view and a local view carrying different scale information, namely Ĝ^g and Ĝ^l.
5. The graph neural network method based on multi-scale graph contrastive learning according to claim 1, characterized in that: in step S3, the potential representations of the views are learned with a five-layer graph isomorphism network (GIN) whose model parameters are shared between the global view and the local view.
6. The graph neural network method based on multi-scale graph contrastive learning according to claim 1, characterized in that: in step S4, the noise contrastive estimation function is selected for contrast between global views and between global and local views, while a regressor-based metric is selected between local views; the regressor is implemented as a multi-layer perceptron with batch normalization and ReLU activation functions.
7. The graph neural network method based on multi-scale graph contrastive learning according to claim 6, characterized in that: in step S4, the graph neural network follows a message-passing mechanism: through message passing, each node acquires the attribute and structure information of its neighboring nodes and updates its own node representation, and after k layers of iteration a node captures the information of its k-hop neighborhood, namely:

a_v^(k) = AGGREGATE^(k)({ h_u^(k-1) : u ∈ N(v) })
h_v^(k) = COMBINE^(k)( h_v^(k-1), a_v^(k) )

where AGGREGATE(·) aggregates the information of the neighborhood nodes and COMBINE(·) updates the node's own information; the node-level potential representations of the data-enhanced global and local views are obtained through the graph isomorphism network;
the potential representation of the entire graph is then obtained by pooling, i.e.:

f(G) = READOUT({ h_v^(k) : v ∈ V })

where READOUT(·) is a graph pooling operation, here a sum-pooling mechanism;
finally, the graph-level representation is obtained through a nonlinear transformation, namely:

z = g(f(G))

where g(·) is a nonlinear transformation implemented as a two-layer perceptron with a ReLU activation function; this yields the global and local representations of the graph data, namely z^g and z^l.
8. The graph neural network method based on multi-scale graph contrastive learning according to claim 1, characterized in that: in step S5, the neural network is optimized by assigning different weight coefficients to the three different loss terms, so that it learns good graph-level representations that can then be used for different downstream tasks.
9. The graph neural network method based on multi-scale graph contrastive learning according to claim 8, characterized in that the step S5 specifically comprises the following steps:
S51, considering the global representation and the local representation at the same time and applying different contrastive learning strategies to them, so that information of different scales is taken into account to improve performance; the noise contrastive estimation loss ℓ_s is defined as:

ℓ_s(z, z_+) = −log [ exp(sim(z, z_+)/τ) / Σ_{z_−} exp(sim(z, z_−)/τ) ]

where τ is the temperature coefficient, and z_+ and z_− are positive and negative samples, respectively;
S52, maximizing the similarity between global representations of the same original graph and minimizing the similarity between global representations of different original graphs; the loss function ℓ_gg is defined as:

ℓ_gg = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g_1}, z_n^{g_2})

where N is the number of samples in the batch;
S53, in order to establish the association between local and global information, defining the loss function:

ℓ_gl = (1/N) Σ_{n=1}^{N} ℓ_s(z_n^{g}, z_n^{l})

S54, providing the similarity between local views by a metric with learnable parameters, namely a five-layer perceptron f_θ with ReLU activation functions; local views from the same graph are expected to have higher similarity than local views from different graphs, and an objective ψ(θ_d) is therefore defined that rewards high regressor scores for local-view pairs from the same graph and low scores for local-view pairs from different graphs;
S55, training the regressor by maximizing ψ(θ_d) and regarding its output, passed through a Sigmoid activation function, as the similarity measure; σ(f_θ(·, ·)) is regarded as an estimator ℓ_d of local-view similarity, and the loss function between local views is accordingly defined as:

ℓ_ll = (1/N) Σ_{n=1}^{N} ℓ_d(z_n^{l_1}, z_n^{l_2})

S56, defining the overall loss function as:

L = λ_1 ℓ_gg + λ_2 ℓ_gl + λ_3 ℓ_ll

where λ_1, λ_2 and λ_3 are the weight coefficients of the three loss terms;
S57, minimizing the total loss L with the Adam gradient-descent algorithm to update the encoder parameters, and using the pre-trained encoder for different downstream tasks.
CN202310135024.7A 2023-02-20 2023-02-20 Graph neural network method based on multi-scale graph contrastive learning Pending CN115994560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310135024.7A CN115994560A (en) Graph neural network method based on multi-scale graph contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310135024.7A CN115994560A (en) Graph neural network method based on multi-scale graph contrastive learning

Publications (1)

Publication Number Publication Date
CN115994560A true CN115994560A (en) 2023-04-21

Family

ID=85995142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310135024.7A Pending CN115994560A (en) 2023-02-20 2023-02-20 Graph neural network method based on multi-scale graph comparison learning

Country Status (1)

Country Link
CN (1) CN115994560A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648623A (en) * 2023-11-24 2024-03-05 成都理工大学 Network classification algorithm based on pooling comparison learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination