CN112085161A - Graph neural network method based on random information transmission - Google Patents

Graph neural network method based on random information transmission

Info

Publication number
CN112085161A
CN112085161A
Authority
CN
China
Prior art keywords
graph
neural network
node
loss
output
Prior art date
Legal status
Granted
Application number
CN202010842540.XA
Other languages
Chinese (zh)
Other versions
CN112085161B (en)
Inventor
崔鹏
牛辰昊
张子威
朱文武
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010842540.XA
Publication of CN112085161A
Application granted
Publication of CN112085161B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a graph neural network method based on random information transmission, belonging to the fields of graph neural networks and deep learning. First, a graph neural network comprising a node-feature graph neural network GNN and a random-node-feature graph neural network GNN' is constructed, and an output function, a decoding function and a loss function are selected accordingly. For each graph to be processed, a test set, a validation set and a training set are divided according to the task, and the graph neural network is trained with early stopping. Finally, the trained graph neural network is used to predict on the test set of each graph. The method can be applied to machine learning tasks on a wide range of graph data and has high application value.

Description

Graph neural network method based on random information transmission
Technical Field
The invention belongs to the fields of graph neural networks and deep learning, and in particular provides a graph neural network method based on random information transmission.
Background
The graph neural network is an artificial neural network on graph data and is widely used for machine learning tasks on various graph data, such as protein interaction prediction.
However, because current graph neural networks are constrained to be permutation equivariant, they cannot distinguish automorphic nodes in a graph and therefore cannot preserve node proximity. As a result, on graph tasks that require node proximity, current graph neural networks often perform poorly because proximity information cannot be exploited. For example, in the edge prediction task, the connectivity (or "distance") between a pair of nodes is an important criterion, but a graph neural network that cannot preserve node proximity cannot measure the "distance" between two nodes in the graph and thus cannot make sufficiently accurate predictions.
Enabling the graph neural network to preserve node proximity is therefore an important direction for strengthening its expressive power. One existing method gives the graph neural network the ability to preserve node proximity through the shortest paths between each node and a set of anchor nodes (several nodes selected at random in a preprocessing stage), but this also costs the network its permutation equivariance, and it performs poorly on tasks that require permutation equivariance. For example, on the node classification task this method cannot exclude irrelevant node-proximity information, leading to more severe overfitting.
Due to the complexity of real data, whether a machine learning task on real-world graph data requires permutation equivariance and/or node proximity cannot be determined in advance. A graph neural network with both capabilities would therefore solve such problems better, but existing work can preserve only one of the two.
For example, a graph neural network can be used to predict protein interactions, i.e., whether any two proteins react, given feature information for the proteins and some known interactions among them. Because detecting whether two proteins react requires time-consuming and labor-intensive biological and chemical experiments, using a graph neural network to predict the protein pairs most likely to react can greatly improve experimental efficiency. Concretely, the input to the network is a "protein interaction graph" in which each node represents a protein, each node may carry a feature vector (representing characteristic information such as molecular weight, number of atoms and biological origin), and each edge indicates that the two proteins it connects are known to react. In this application, the output of the graph neural network is a prediction of whether an edge should exist between any two nodes (i.e., whether the two proteins react).
One existing graph neural network for predicting protein interactions is the graph convolutional network, which obtains a representation vector for each node through graph convolution operations between nodes and uses these to predict whether two proteins react. The disadvantage of this method is that it cannot preserve node proximity, which leads to underfitting and insufficiently accurate predictions.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art by providing a graph neural network method based on random information transmission. The method can be applied to machine learning tasks on a wide range of graph data and has high application value.
The invention provides a graph neural network method based on random information transmission, characterized by comprising the following steps:
1) construct a graph neural network; the specific steps are as follows:
1-1) construct a graph neural network comprising a node-feature graph neural network GNN and a random-node-feature graph neural network GNN';
1-2) select an output function f_output, whose output dimension is d_output;
1-3) select a decoding function f_decode according to the task:
for the node classification task, the function SoftMax(H_i) is used as f_decode, where i is any node in the graph and H_i is the vector representation of node i output by f_output;
for the edge prediction task, Sigmoid(<H_i, H_j>) is used as f_decode, where i and j are nodes in the graph, H_i and H_j are the vector representations of nodes i and j output by f_output, and <H_i, H_j> is the inner product of the vectors H_i and H_j;
1-4) select a loss function f_loss according to the task: for the node classification task, the negative log-likelihood function is used as f_loss; for the edge prediction task, the cross-entropy function is used as f_loss;
2) dividing a data set;
for each graph to be processed, divide a test set, a validation set and a training set, where each graph datum comprises a node set V, an edge set E and an original node feature matrix F; the number of nodes of the graph is denoted N, and the edges of the graph are represented as an adjacency matrix, denoted A; the specific division depends on the task, as follows:
2-1) node classification task:
for each graph to be processed and for each class of nodes in the graph, randomly select 20 nodes to join the graph's training set, randomly select another 30 nodes to join the graph's validation set, and add the remaining nodes to the graph's test set; after the nodes of all classes have been processed, the 20 nodes selected from each class form the training set of the graph, the 30 nodes selected from each class form the validation set of the graph, and all remaining nodes form the test set of the graph;
2-2) edge prediction task:
for each graph to be processed, select 80% of the edges in the graph as positive examples of the graph's training set, 10% as positive examples of the graph's validation set, and 10% as positive examples of the graph's test set; among the edges that do not exist in the graph, i.e. the negative edges, use as many negative edges as there are validation-set positives as negative examples of the validation set, and as many negative edges as there are test-set positives as negative examples of the test set;
3) train the graph neural network built in step 1) with the data sets obtained in step 2) to obtain a trained graph neural network; the specific steps are as follows:
3-1) for each graph to be processed, randomly sample a random node feature vector of length d for each node i in the node set V of the graph as row E_noise_i of the graph's random node feature matrix E_noise; after all nodes in V have been sampled, this yields the graph's random node feature matrix E_noise with N rows and d columns;
3-2) take the random node feature matrix E_noise and the adjacency matrix A of the graph as the input of the random-node-feature graph neural network GNN' to obtain the output of GNN' for the graph, denoted H_noise;
3-3) take the original node feature matrix F and the adjacency matrix A of the graph as the input of the node-feature graph neural network GNN to obtain the output of GNN for the graph, denoted H_ori;
3-4) for each node i in the node set V, apply the output function f_output to [H_noise_i, H_ori_i] to obtain the vector representation of node i, H_i = f_output([H_noise_i, H_ori_i]); the H_i of all nodes form the node representation matrix H; here H_noise_i and H_ori_i denote the i-th rows of H_noise and H_ori, respectively, and [H_noise_i, H_ori_i] denotes the concatenation of the vectors H_noise_i and H_ori_i;
3-5) apply f_decode to the node representation matrix H to obtain the graph neural network's predicted values for the graph;
3-6) select from the predicted values obtained in step 3-5) those corresponding to the training set, take them together with the training-set labels as the input of the loss function f_loss, and compute the loss value loss for the graph;
3-7) repeat steps 3-1) to 3-6) to compute the loss value loss for every graph to be processed, then average all loss values and take the average as loss_total;
3-8) back-propagate loss_total to obtain the parameter gradients of the graph neural network, and optimize the parameters with the Adam optimizer;
3-9) after every P repetitions of steps 3-1) to 3-8), run one validation pass, as follows:
run steps 3-1) to 3-7), except that in step 3-6) the predicted values corresponding to the validation set are selected from those obtained in step 3-5) and are used, together with the validation-set labels, as the input of the loss function f_loss to compute the loss value; after this run, record the resulting loss_total as one valid_loss value together with the corresponding graph neural network parameters;
3-10) repeat steps 3-1) to 3-9) Q times, recording the valid_loss value obtained after each run; select and store the graph neural network parameters corresponding to the minimum among all valid_loss values, completing the training of the graph neural network;
4) use the graph neural network trained in step 3) to predict on the test set of each graph to be processed; the specific steps are as follows:
4-1) repeat step 3-1) for each graph to be processed to obtain a new random node feature matrix E_noise for the graph;
4-2) repeat steps 3-2) to 3-5): input the matrix obtained in step 4-1) and the adjacency matrix A of the graph into the GNN' of the graph neural network trained in step 3), and input the original node feature matrix F and the adjacency matrix A of the graph into the GNN of the trained network, obtaining a new node representation matrix H for the graph; apply f_decode to H to obtain the trained graph neural network's predicted values for the graph;
4-3) select from the predicted values obtained in step 4-2) those corresponding to the graph's test set; the selected values are the prediction results for the test set;
4-4) process the prediction results of the test set to obtain the corresponding prediction labels, as follows:
for the node classification task, take the index p of the dimension with the maximum value in the vector f_decode(H_i) as the trained network's prediction of the class label of node i; for the edge prediction task, round f_decode(H_i, H_j): a result of 0 means the trained network predicts no edge between nodes i and j of the graph, and 1 means it predicts an edge between nodes i and j.
The characteristics and beneficial effects of the invention are:
Random node features are introduced into the input node features of the graph neural network before information transmission is carried out, so that the output node representations can preserve node proximity. Moreover, whether node-proximity information is used can be adjusted according to the task, so the method generalizes well to both scenarios: tasks requiring permutation equivariance and tasks requiring node proximity. The graph neural network and the output function can be chosen flexibly per task, and in ordinary cases a linear network and a linear function already achieve good results, so the overall network has low complexity.
The method can be applied to machine learning tasks on a wide range of graph data and has high application value. Applied to the field of protein interaction prediction, it can greatly improve experimental efficiency while improving prediction accuracy.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The invention provides a graph neural network method based on random information transmission, which is further described in detail below with reference to the accompanying drawing and specific embodiments.
The principle of the method is shown in Fig. 1, and the method comprises the following steps:
1) Construct a graph neural network; the specific steps are as follows:
1-1) Construct a graph neural network comprising a node-feature graph neural network GNN and a random-node-feature graph neural network GNN'. Candidate graph neural networks for GNN and GNN' include, but are not limited to, the linear graph neural network SGC and the graph convolutional network GCN. In this embodiment, experiments are performed for three cases: GNN and GNN' both SGC, both GCN, and one GCN with one SGC; in each case the number of layers is 2 and the output dimension is 32.
1-2) Select an output function f_output with output dimension d_output. Candidate functions include, but are not limited to, a Linear function and a multi-layer perceptron MLP; in this embodiment, experiments are performed with both Linear and MLP as f_output, and d_output = 32 is chosen (other values may be chosen according to the use case).
1-3) Select a decoding function f_decode according to the task. For the node classification task, the function SoftMax(H_i) is used as f_decode, where i is any node in the graph and H_i is the network's vector representation of node i, i.e. the output of f_output. For the edge prediction task, Sigmoid(<H_i, H_j>) is used as f_decode, where i and j are nodes in the graph, H_i and H_j are the network's vector representations of nodes i and j, and <H_i, H_j> is the inner product of the two vectors H_i and H_j.
1-4) Select a loss function f_loss according to the task. For the node classification task, the negative log-likelihood function is used; for the edge prediction task, the cross-entropy function is used. A minimal sketch of this construction is given below.
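By way of illustration only (not part of the claimed method), the following is a minimal PyTorch sketch of step 1) under this embodiment's settings (2-layer networks, 32-dimensional outputs). The simplified graph-convolution layer stands in for any concrete choice of GNN/GNN' such as SGC or GCN, and all class and variable names are our own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConvLayer(nn.Module):
    """Simplified graph convolution: H' = A_hat @ (H W), where A_hat is a
    (normalized) adjacency matrix of shape N x N."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        return a_hat @ self.lin(h)

class TwoLayerGNN(nn.Module):
    """Two-layer GNN with 32-dimensional output, as in this embodiment."""
    def __init__(self, in_dim, hid_dim=32, out_dim=32):
        super().__init__()
        self.l1 = GraphConvLayer(in_dim, hid_dim)
        self.l2 = GraphConvLayer(hid_dim, out_dim)

    def forward(self, h, a_hat):
        return self.l2(F.relu(self.l1(h, a_hat)), a_hat)

# GNN takes the original node features; GNN' takes the random node features.
gnn = TwoLayerGNN(in_dim=50)       # 50 = node-feature length in the PPI example below
gnn_rand = TwoLayerGNN(in_dim=32)  # 32 = d, the random-feature length

# f_output: here the Linear variant, mapping to d_output = 32.
f_output = nn.Linear(32 + 32, 32)

def f_decode_node(h):
    """Node classification decode: SoftMax over each node's representation."""
    return F.softmax(h, dim=1)

def f_decode_edge(h, i, j):
    """Edge prediction decode: Sigmoid(<H_i, H_j>)."""
    return torch.sigmoid((h[i] * h[j]).sum(dim=-1))

# Loss choices: note that nll_loss expects log-probabilities, i.e. the log
# of the SoftMax output above.
f_loss_node = F.nll_loss
f_loss_edge = F.binary_cross_entropy
```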
2) Divide the data set: for each graph to be processed, divide a test set, a validation set and a training set. Each graph datum is defined by three parts: a node set V, an edge set E and an original node feature matrix F (the matrix F is given as part of the data and requires no additional acquisition step). The number of nodes in the graph (i.e. the size of V) is denoted N, and the edges of the graph are represented as an adjacency matrix A. Depending on the task, the data set is divided in one of the following two ways:
2-1) node classification task:
For each class of nodes in the graph, randomly select 20 nodes to join the graph's training set, randomly select another 30 nodes to join the graph's validation set, and add the remaining nodes to the graph's test set. After the nodes of all classes have been processed, the 20 nodes selected from each class form the training set, the 30 nodes selected from each class form the validation set, and all remaining nodes form the test set.
If there is more than one graph to be processed, divide the training, validation and test sets of each graph according to this scheme. The proportion of nodes selected for each set can be adjusted to the actual situation; a sketch of this split follows.
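A minimal sketch of this per-class split, assuming integer class labels in a NumPy array (function and parameter names are ours):

```python
import numpy as np

def split_nodes(labels, n_train=20, n_valid=30, seed=0):
    """Per class: n_train nodes to the training set, n_valid to the
    validation set, and all remaining nodes to the test set."""
    rng = np.random.default_rng(seed)
    train, valid, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:n_train + n_valid])
        test.extend(idx[n_train + n_valid:])
    return np.array(train), np.array(valid), np.array(test)
```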
2-2) Edge prediction task:
For each graph to be processed, select 80% of the edges in the graph as positive examples of the graph's training set, 10% as positive examples of the graph's validation set, and 10% as positive examples of the graph's test set. Among the edges that do not exist in the graph (the negative edges), use as many negative edges as there are validation-set positives as the validation set's negative examples, and as many negative edges as there are test-set positives as the test set's negative examples. The training set requires no negative examples here.
If there is more than one graph to be processed, divide the training, validation and test sets of each graph according to this scheme. The proportion of edges selected for each set can be adjusted to the actual situation; a sketch of this split follows.
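A minimal sketch of the 80/10/10 edge split with negative sampling, assuming the edges are given as a 2 x M integer array of node-index pairs (helper names are ours):

```python
import numpy as np

def split_edges(edge_index, num_nodes, seed=0):
    """80% of edges -> training positives, 10% -> validation positives,
    10% -> test positives; equally many sampled non-edges serve as
    validation/test negatives (the training set needs no negatives here)."""
    rng = np.random.default_rng(seed)
    m = edge_index.shape[1]
    perm = rng.permutation(m)
    n_val = n_test = m // 10
    valid_pos = edge_index[:, perm[:n_val]]
    test_pos = edge_index[:, perm[n_val:n_val + n_test]]
    train_pos = edge_index[:, perm[n_val + n_test:]]

    existing = set(map(tuple, edge_index.T.tolist()))
    def sample_negatives(k):
        neg = []
        while len(neg) < k:
            i, j = rng.integers(num_nodes, size=2)
            if i != j and (i, j) not in existing:
                neg.append((i, j))
        return np.array(neg).T  # shape 2 x k

    return train_pos, (valid_pos, sample_negatives(n_val)), \
           (test_pos, sample_negatives(n_test))
```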
3) Train the graph neural network built in step 1) with the data sets obtained in step 2) to obtain a trained graph neural network; the specific steps are as follows:
3-1) For each graph to be processed, randomly sample a random node feature vector of length d for each node i in the node set V of the graph as row E_noise_i of the graph's random node feature matrix E_noise; after all nodes in V have been sampled, this yields the matrix E_noise with N rows and d columns. Here d can be adjusted to the number of nodes; d = 32 and d = 64 are used in this embodiment. The sampling method is: for each dimension of the vector E_noise_i, draw one random sample from the standard normal distribution.
3-2) Take the random node feature matrix E_noise and the adjacency matrix A of the graph as the input of the random-node-feature graph neural network GNN' to obtain the output of GNN' for the graph, denoted H_noise. H_noise is a matrix with N rows, each row being the vector representation of one node. The adjacency matrix is an N x N matrix whose element in row i and column j indicates whether there is an edge between nodes i and j: the element is 1 if there is an edge between nodes i and j, and 0 otherwise.
3-3) Take the original node feature matrix F and the adjacency matrix A of the graph as the input of the node-feature graph neural network GNN to obtain the output of GNN for the graph, denoted H_ori. H_ori is a matrix with N rows, each row being the vector representation of one node.
3-4) For each node i in the node set V, apply the output function f_output to [H_noise_i, H_ori_i] to obtain the vector representation of node i, H_i = f_output([H_noise_i, H_ori_i]); the H_i of all nodes form the node representation matrix H. Here H_noise_i and H_ori_i denote the i-th rows of H_noise and H_ori, respectively, and [H_noise_i, H_ori_i] denotes the concatenation of the vectors H_noise_i and H_ori_i. Steps 3-1) to 3-4) are sketched below.
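Steps 3-1) to 3-4) amount to one stochastic forward pass in which E_noise is freshly resampled; a minimal sketch, reusing the illustrative gnn, gnn_rand and f_output objects from the step-1 sketch:

```python
import torch

def forward_pass(gnn, gnn_rand, f_output, feat, a_hat, d=32):
    """One stochastic forward pass: resample E_noise from N(0,1) (step 3-1),
    run GNN' on it (3-2) and GNN on the original features (3-3), then fuse
    each node's concatenated rows through f_output (3-4)."""
    n = feat.shape[0]
    e_noise = torch.randn(n, d)                  # E_noise, resampled each pass
    h_noise = gnn_rand(e_noise, a_hat)           # H_noise, N x 32
    h_ori = gnn(feat, a_hat)                     # H_ori, N x 32
    return f_output(torch.cat([h_noise, h_ori], dim=1))  # H, N x d_output
```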
3-5) Following the definition of the decoding function f_decode in step 1-3), apply f_decode to the node representation matrix H to obtain the graph neural network's predicted values for the graph.
3-6) Select from the predicted values obtained in step 3-5) those corresponding to the training set, take them together with the training-set labels as the input of the loss function f_loss, and compute the loss value loss for the graph.
3-7) Repeat steps 3-1) to 3-6) to compute the loss value loss for every graph to be processed, then average all loss values and take the average as loss_total. When there is only one graph to be processed, its loss is used directly as loss_total.
3-8) Back-propagate loss_total to obtain the parameter gradients of the graph neural network, and optimize the parameters with the Adam optimizer.
3-9) After every P repetitions of steps 3-1) to 3-8) (P ranges from 1 to 10 and is much smaller than Q; within the limits of training time, P should be as small as possible, with a minimum of 1; P = 5 in this embodiment), run one validation pass. The validation method is: run steps 3-1) to 3-7), but with the training set in step 3-6) replaced by the validation set; after the run, record the resulting loss_total as one valid_loss value together with the corresponding graph neural network parameters.
3-10) Repeat steps 3-1) to 3-9) Q times (Q should be large enough that loss_total no longer decreases significantly after the Q repetitions; Q = 1000 in this embodiment), recording the valid_loss value obtained after each run. Select and store the graph neural network parameters corresponding to the minimum among all valid_loss values, completing the training. This way of selecting and storing parameters is the early-stopping method; a condensed sketch of the loop follows.
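Under the same illustrative assumptions (and reusing forward_pass and the model objects from the sketches above), a condensed sketch of the training loop of steps 3-5) to 3-10) for edge prediction; pair_loss and the dictionary keys feat, a_hat, train_pos, valid_pos and valid_neg are hypothetical names, not part of the patent:

```python
import torch
import torch.nn.functional as F

def pair_loss(h, pos, neg=None):
    """f_loss for edge prediction: binary cross entropy of Sigmoid(<H_i, H_j>)
    over positive pairs (label 1) and, if given, negative pairs (label 0);
    pos/neg are 2 x K index arrays."""
    scores = torch.sigmoid((h[pos[0]] * h[pos[1]]).sum(-1))
    labels = torch.ones_like(scores)
    if neg is not None:
        neg_scores = torch.sigmoid((h[neg[0]] * h[neg[1]]).sum(-1))
        scores = torch.cat([scores, neg_scores])
        labels = torch.cat([labels, torch.zeros_like(neg_scores)])
    return F.binary_cross_entropy(scores, labels)

def train(graphs, gnn, gnn_rand, f_output, P=5, Q=1000, lr=0.01):
    """Average per-graph losses into loss_total (3-7), back-propagate and
    update with Adam (3-8), validate every P updates (3-9), and keep the
    parameters with the smallest valid_loss over Q rounds (3-10)."""
    params = (list(gnn.parameters()) + list(gnn_rand.parameters())
              + list(f_output.parameters()))
    opt = torch.optim.Adam(params, lr=lr)
    best_valid, best_state = float("inf"), None

    for _ in range(Q):
        for _ in range(P):                       # P gradient steps, then validate
            losses = []
            for g in graphs:                     # steps 3-1) to 3-6) per graph
                h = forward_pass(gnn, gnn_rand, f_output, g["feat"], g["a_hat"])
                losses.append(pair_loss(h, g["train_pos"]))
            loss_total = torch.stack(losses).mean()
            opt.zero_grad()
            loss_total.backward()
            opt.step()

        with torch.no_grad():                    # one validation run
            vl = [pair_loss(forward_pass(gnn, gnn_rand, f_output,
                                         g["feat"], g["a_hat"]),
                            g["valid_pos"], g["valid_neg"]) for g in graphs]
            valid_loss = torch.stack(vl).mean().item()
        if valid_loss < best_valid:              # early stopping: keep the best
            best_valid = valid_loss
            best_state = [p.detach().clone() for p in params]
    return best_valid, best_state
```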
4) Use the graph neural network trained in step 3) to predict on the test set of each graph to be processed; the specific steps are as follows:
4-1) Repeat step 3-1) for each graph to be processed to obtain a new random node feature matrix E_noise for the graph;
4-2) Repeat steps 3-2) to 3-5): input the matrix obtained in step 4-1) and the adjacency matrix A of the graph into the GNN' of the network trained in step 3), and input the original node feature matrix F and the adjacency matrix A into the GNN of the trained network, obtaining the corresponding new node representation matrix H; apply f_decode to H to obtain the trained network's predicted values for the graph;
4-3) Select from the predicted values obtained in step 4-2) those corresponding to the graph's test set; the selected values are the prediction results for the test set.
4-4) Process the prediction results of the test set to obtain the corresponding prediction labels. Specifically, for the node classification task, take the index p of the dimension with the maximum value in the vector f_decode(H_i) as the trained network's prediction of the class label of node i. For the edge prediction task, round f_decode(H_i, H_j): a result of 0 means the trained network predicts no edge between nodes i and j of the graph, and 1 means it predicts an edge between them. A sketch of this label extraction follows.
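A short sketch of this label extraction, under the same assumptions as above (h is the representation matrix H as a tensor; pairs is a hypothetical 2 x K array of candidate node pairs):

```python
import torch
import torch.nn.functional as F

def predict_labels(h, pairs=None):
    """Turn the output representations H into prediction labels.
    Node classification: the index p of the maximum of SoftMax(H_i).
    Edge prediction: Sigmoid(<H_i, H_j>) rounded to 0 (no edge) or 1 (edge)."""
    if pairs is None:                            # node classification task
        return F.softmax(h, dim=1).argmax(dim=1)
    i, j = pairs                                 # candidate node pairs, 2 x K
    return torch.sigmoid((h[i] * h[j]).sum(-1)).round().long()
```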
The present invention is further described in detail below with reference to a specific example.
This embodiment is in the field of protein interaction prediction and operates on a public protein interaction data set (the PPI data set, short for protein-protein interaction network data set; download link: https://github.com/JiaxuanYou/P-GNN/tree/master/data) to predict the interactions it contains. This data set contains 24 protein interaction graphs. In each graph, every node represents a protein and carries a feature vector of length 50 (each dimension representing characteristic information such as the protein's molecular weight and number of atoms), and every edge indicates that the two proteins it connects react with each other. In this application, the output of the graph neural network is a prediction of whether an edge should exist between any two nodes (i.e. whether the two proteins react), with predictions represented by 0 (predicted not to react) and 1 (predicted to react).
This embodiment provides a graph neural network method based on random information transmission, comprising the following steps:
1) Construct a graph neural network; the specific steps are as follows:
1-1) Construct a graph neural network comprising a node-feature graph neural network GNN and a random-node-feature graph neural network GNN'. Candidate graph neural networks for GNN and GNN' include, but are not limited to, the linear graph neural network SGC and the graph convolutional network GCN. In this embodiment, experiments are performed for three cases: GNN and GNN' both SGC, both GCN, and one GCN with one SGC; in each case the number of layers is 2 and the output dimension is 32.
1-2) Select an output function f_output with output dimension d_output. Candidate functions include, but are not limited to, a Linear function and a multi-layer perceptron MLP; in this embodiment, experiments are performed with both Linear and MLP as f_output, and d_output = 32 is chosen.
1-3) Select a decoding function f_decode according to the task. Since this embodiment performs edge prediction, Sigmoid(<H_i, H_j>) is used as f_decode, where i and j are nodes in the graph, H_i and H_j are the network's vector representations of nodes i and j, and <H_i, H_j> is the inner product of the two vectors H_i and H_j.
1-4) Select a loss function f_loss according to the task; this embodiment uses the cross-entropy function.
2) Divide the data set:
For each graph in the data set, select 80% of the edges in the graph as positive examples of the graph's training set, 10% as positive examples of the graph's validation set, and 10% as positive examples of the graph's test set. Among the edges that do not exist in the graph (the negative edges), use as many negative edges as there are validation-set positives as the validation set's negative examples, and as many negative edges as there are test-set positives as the test set's negative examples. The training set requires no negative examples here.
The data set used in this embodiment contains 24 protein interaction graphs, of which 14 are used as the training set, 4 as the test set and 4 as the validation set. That is: for a graph in the training set, only its training-set part is used as input; for a graph in the validation set, only its training-set part is used as input, and only its validation-set part is used as output for validation; for a graph in the test set, only its training-set part is used as input, and only its test-set part is used as output for testing.
3) Train the graph neural network built in step 1) with the data sets obtained in step 2) to obtain a trained graph neural network; the specific steps are as follows:
3-1) For each graph to be processed, randomly sample a random node feature vector of length d for each node i in the node set V of the graph as row E_noise_i of the graph's random node feature matrix E_noise; after all nodes in V have been sampled, this yields the matrix E_noise with N rows and d columns. Here d can be adjusted to the number of nodes; d = 32 and d = 64 are used in this embodiment. The sampling method is: for each dimension of the vector E_noise_i, draw one random sample from the standard normal distribution.
3-2) Take the random node feature matrix E_noise and the adjacency matrix A of the graph as the input of the random-node-feature graph neural network GNN' to obtain the output of GNN' for the graph, denoted H_noise. H_noise is a matrix with N rows, each row being the vector representation of one node. The adjacency matrix is an N x N matrix whose element in row i and column j is 1 if there is an edge between nodes i and j, and 0 otherwise.
3-3) Take the original node feature matrix F and the adjacency matrix A of the graph as the input of the node-feature graph neural network GNN to obtain the output of GNN for the graph, denoted H_ori. H_ori is a matrix with N rows, each row being the vector representation of one node.
3-4) For each node i in the node set V, apply the output function f_output to [H_noise_i, H_ori_i] to obtain the vector representation of node i, H_i = f_output([H_noise_i, H_ori_i]); the H_i of all nodes form the node representation matrix H. Here H_noise_i and H_ori_i denote the i-th rows of H_noise and H_ori, respectively, and [H_noise_i, H_ori_i] denotes the concatenation of the vectors H_noise_i and H_ori_i.
3-5) Following the definition of the decoding function f_decode in step 1-3), apply f_decode to the node representation matrix H to obtain the graph neural network's predicted values for the graph.
3-6) Select from the predicted values obtained in step 3-5) those corresponding to the training set, take them together with the training-set labels as the input of the loss function f_loss, and compute the loss value loss for the graph.
3-7) Repeat steps 3-1) to 3-6) to compute the loss value loss for every graph to be processed, then average all loss values and take the average as loss_total. When there is only one graph to be processed, its loss is used directly as loss_total.
3-8) Back-propagate loss_total to obtain the parameter gradients of the graph neural network, and optimize the parameters with the Adam optimizer.
3-9) After every P repetitions of steps 3-1) to 3-8) (P = 5 in this embodiment), run one validation pass. The validation method is: run steps 3-1) to 3-7), but with the training set in step 3-6) replaced by the validation set; after the run, record the resulting loss_total as one valid_loss value together with the corresponding graph neural network parameters.
3-10) Repeat steps 3-1) to 3-9) Q times (Q = 1000 in this embodiment), recording the valid_loss value obtained after each run. Select and store the graph neural network parameters corresponding to the minimum among all valid_loss values, completing the training. This way of selecting and storing parameters is the early-stopping method.
4) Use the graph neural network trained in step 3) to predict on the test set of each graph to be processed; the specific steps are as follows:
4-1) Repeat step 3-1) for each graph to be processed to obtain a new random node feature matrix E_noise for the graph;
4-2) Repeat steps 3-2) to 3-5): input the matrix obtained in step 4-1) and the adjacency matrix A of the graph into the GNN' of the network trained in step 3), and input the original node feature matrix F and the adjacency matrix A into the GNN of the trained network, obtaining the corresponding new node representation matrix H; apply f_decode to H to obtain the trained network's predicted values for the graph;
4-3) Select from the predicted values obtained in step 4-2) those corresponding to the graph's test set; the selected values are the prediction results for the test set.
4-4) Process the prediction results of the test set to obtain the corresponding prediction labels. Round f_decode(H_i, H_j): a result of 0 means the trained network predicts no edge between nodes i and j of the graph (i.e. the proteins represented by nodes i and j are predicted not to react); a result of 1 means it predicts an edge between nodes i and j (i.e. the proteins are predicted to react).

Claims (1)

1. A graph neural network method based on random information transmission, characterized by comprising the following steps:
1) construct a graph neural network; the specific steps are as follows:
1-1) construct a graph neural network comprising a node-feature graph neural network GNN and a random-node-feature graph neural network GNN';
1-2) select an output function f_output, whose output dimension is d_output;
1-3) select a decoding function f_decode according to the task:
for the node classification task, the function SoftMax(H_i) is used as f_decode, where i is any node in the graph and H_i is the vector representation of node i output by f_output;
for the edge prediction task, Sigmoid(<H_i, H_j>) is used as f_decode, where i and j are nodes in the graph, H_i and H_j are the vector representations of nodes i and j output by f_output, and <H_i, H_j> is the inner product of the vectors H_i and H_j;
1-4) select a loss function f_loss according to the task: for the node classification task, the negative log-likelihood function is used as f_loss; for the edge prediction task, the cross-entropy function is used as f_loss;
2) divide the data set;
for each graph to be processed, divide a test set, a validation set and a training set, where each graph datum comprises a node set V, an edge set E and an original node feature matrix F; the number of nodes of the graph is denoted N, and the edges of the graph are represented as an adjacency matrix, denoted A; the specific division depends on the task, as follows:
2-1) node classification task:
for each graph to be processed and for each class of nodes in the graph, randomly select 20 nodes to join the graph's training set, randomly select another 30 nodes to join the graph's validation set, and add the remaining nodes to the graph's test set; after the nodes of all classes have been processed, the 20 nodes selected from each class form the training set of the graph, the 30 nodes selected from each class form the validation set of the graph, and all remaining nodes form the test set of the graph;
2-2) edge prediction task:
for each graph to be processed, select 80% of the edges in the graph as positive examples of the graph's training set, 10% as positive examples of the graph's validation set, and 10% as positive examples of the graph's test set; among the edges that do not exist in the graph, i.e. the negative edges, use as many negative edges as there are validation-set positives as negative examples of the validation set, and as many negative edges as there are test-set positives as negative examples of the test set;
3) train the graph neural network built in step 1) with the data sets obtained in step 2) to obtain a trained graph neural network; the specific steps are as follows:
3-1) for each graph to be processed, randomly sample a random node feature vector of length d for each node i in the node set V of the graph as row E_noise_i of the graph's random node feature matrix E_noise; after all nodes in V have been sampled, this yields the graph's random node feature matrix E_noise with N rows and d columns;
3-2) take the random node feature matrix E_noise and the adjacency matrix A of the graph as the input of the random-node-feature graph neural network GNN' to obtain the output of GNN' for the graph, denoted H_noise;
3-3) take the original node feature matrix F and the adjacency matrix A of the graph as the input of the node-feature graph neural network GNN to obtain the output of GNN for the graph, denoted H_ori;
3-4) for each node i in the node set V, apply the output function f_output to [H_noise_i, H_ori_i] to obtain the vector representation of node i, H_i = f_output([H_noise_i, H_ori_i]); the H_i of all nodes form the node representation matrix H; here H_noise_i and H_ori_i denote the i-th rows of H_noise and H_ori, respectively, and [H_noise_i, H_ori_i] denotes the concatenation of the vectors H_noise_i and H_ori_i;
3-5) apply f_decode to the node representation matrix H to obtain the graph neural network's predicted values for the graph;
3-6) select from the predicted values obtained in step 3-5) those corresponding to the training set, take them together with the training-set labels as the input of the loss function f_loss, and compute the loss value loss for the graph;
3-7) repeat steps 3-1) to 3-6) to compute the loss value loss for every graph to be processed, then average all loss values and take the average as loss_total;
3-8) back-propagate loss_total to obtain the parameter gradients of the graph neural network, and optimize the parameters with the Adam optimizer;
3-9) after every P repetitions of steps 3-1) to 3-8), run one validation pass, as follows:
run steps 3-1) to 3-7), except that in step 3-6) the predicted values corresponding to the validation set are selected from those obtained in step 3-5) and are used, together with the validation-set labels, as the input of the loss function f_loss to compute the loss value; after this run, record the resulting loss_total as one valid_loss value together with the corresponding graph neural network parameters;
3-10) repeat steps 3-1) to 3-9) Q times, recording the valid_loss value obtained after each run; select and store the graph neural network parameters corresponding to the minimum among all valid_loss values, completing the training of the graph neural network;
4) use the graph neural network trained in step 3) to predict on the test set of each graph to be processed; the specific steps are as follows:
4-1) repeat step 3-1) for each graph to be processed to obtain a new random node feature matrix E_noise for the graph;
4-2) repeat steps 3-2) to 3-5): input the matrix obtained in step 4-1) and the adjacency matrix A of the graph into the GNN' of the graph neural network trained in step 3), and input the original node feature matrix F and the adjacency matrix A of the graph into the GNN of the trained network, obtaining a new node representation matrix H for the graph; apply f_decode to H to obtain the trained graph neural network's predicted values for the graph;
4-3) select from the predicted values obtained in step 4-2) those corresponding to the graph's test set; the selected values are the prediction results for the test set;
4-4) process the prediction results of the test set to obtain the corresponding prediction labels, as follows:
for the node classification task, take the index p of the dimension with the maximum value in the vector f_decode(H_i) as the trained network's prediction of the class label of node i; for the edge prediction task, round f_decode(H_i, H_j): a result of 0 means the trained network predicts no edge between nodes i and j of the graph, and 1 means it predicts an edge between nodes i and j.
CN202010842540.XA 2020-08-20 2020-08-20 Graph neural network method based on random information transmission Active CN112085161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010842540.XA CN112085161B (en) 2020-08-20 2020-08-20 Graph neural network method based on random information transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010842540.XA CN112085161B (en) 2020-08-20 2020-08-20 Graph neural network method based on random information transmission

Publications (2)

Publication Number Publication Date
CN112085161A true CN112085161A (en) 2020-12-15
CN112085161B (en) 2022-12-13

Family

ID=73728388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010842540.XA Active CN112085161B (en) 2020-08-20 2020-08-20 Graph neural network method based on random information transmission

Country Status (1)

Country Link
CN (1) CN112085161B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906760A (en) * 2021-01-29 2021-06-04 中国石油天然气集团有限公司 Horizontal well fracturing segment segmentation method, system, equipment and storage medium
CN113158543A (en) * 2021-02-02 2021-07-23 浙江工商大学 Intelligent prediction method for software defined network performance
CN114067796A (en) * 2021-11-15 2022-02-18 四川长虹电器股份有限公司 Design method of user-defined voice command
CN115204372A (en) * 2022-07-20 2022-10-18 成都飞机工业(集团)有限责任公司 Precondition selection method and system based on item walking graph neural network
CN115242680A (en) * 2022-07-30 2022-10-25 北京理工大学 Node classification method of graph neural network based on multi-stage training in communication network
CN117151279A (en) * 2023-08-15 2023-12-01 哈尔滨工业大学 Isomorphic network link prediction method and system based on line graph neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622257A (en) * 2017-10-13 2018-01-23 深圳市未来媒体技术研究院 A kind of neural network training method and three-dimension gesture Attitude estimation method
WO2019005186A1 (en) * 2017-06-29 2019-01-03 General Electric Company Topology aware graph neural nets
CN111209398A (en) * 2019-12-30 2020-05-29 北京航空航天大学 Text classification method and system based on graph convolution neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019005186A1 (en) * 2017-06-29 2019-01-03 General Electric Company Topology aware graph neural nets
CN107622257A (en) * 2017-10-13 2018-01-23 深圳市未来媒体技术研究院 A kind of neural network training method and three-dimension gesture Attitude estimation method
CN111209398A (en) * 2019-12-30 2020-05-29 北京航空航天大学 Text classification method and system based on graph convolution neural network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906760A (en) * 2021-01-29 2021-06-04 中国石油天然气集团有限公司 Horizontal well fracturing segment segmentation method, system, equipment and storage medium
CN112906760B (en) * 2021-01-29 2024-05-03 中国石油天然气集团有限公司 Horizontal well fracturing segment segmentation method, system, equipment and storage medium
CN113158543A (en) * 2021-02-02 2021-07-23 浙江工商大学 Intelligent prediction method for software defined network performance
CN113158543B (en) * 2021-02-02 2023-10-24 浙江工商大学 Intelligent prediction method for software defined network performance
CN114067796A (en) * 2021-11-15 2022-02-18 四川长虹电器股份有限公司 Design method of user-defined voice command
CN115204372A (en) * 2022-07-20 2022-10-18 成都飞机工业(集团)有限责任公司 Precondition selection method and system based on item walking graph neural network
CN115204372B (en) * 2022-07-20 2023-10-10 成都飞机工业(集团)有限责任公司 Pre-selection method and system based on term walk graph neural network
CN115242680A (en) * 2022-07-30 2022-10-25 北京理工大学 Node classification method of graph neural network based on multi-stage training in communication network
CN117151279A (en) * 2023-08-15 2023-12-01 哈尔滨工业大学 Isomorphic network link prediction method and system based on line graph neural network

Also Published As

Publication number Publication date
CN112085161B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN112085161B (en) Graph neural network method based on random information transmission
WO2022121289A1 (en) Methods and systems for mining minority-class data samples for training neural network
KR102274389B1 (en) Method for building anomaly pattern detection model using sensor data, apparatus and method for detecting anomaly using the same
WO2022083624A1 (en) Model acquisition method, and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
US20220156508A1 (en) Method For Automatically Designing Efficient Hardware-Aware Neural Networks For Visual Recognition Using Knowledge Distillation
CN107977748B (en) Multivariable distorted time sequence prediction method
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
CN116306289B (en) Multi-source domain self-adaption-based electromechanical device cross-domain residual life prediction method
Chaudhuri et al. PROMETHEE‐based hybrid feature selection technique for high‐dimensional biomedical data: application to Parkinson's disease classification
CN115812210A (en) Method and apparatus for enhancing performance of machine learning classification tasks
CN117035013A (en) Method for predicting dynamic network link by adopting impulse neural network
US20200219008A1 (en) Discrete learning structure
CN113241117B (en) Residual map-based convolutional neural network RNA-protein binding site discrimination method
Mao et al. An XGBoost-assisted evolutionary algorithm for expensive multiobjective optimization problems
US11989656B2 (en) Search space exploration for deep learning
KR20200023695A (en) Learning system to reduce computation volume
CN116956993A (en) Method, device and storage medium for constructing graph integration model
CN117593877A (en) Short-time traffic flow prediction method based on integrated graph convolution neural network
Ali et al. Designing convolutional neural networks using surrogate assisted genetic algorithm for medical image classification
Aho et al. Rule ensembles for multi-target regression
Letteri et al. Dataset Optimization Strategies for MalwareTraffic Detection
Zafar et al. An Optimization Approach for Convolutional Neural Network Using Non-Dominated Sorted Genetic Algorithm-II.
CN115116549A (en) Cell data annotation method, device, equipment and medium
Abbas et al. Volterra system identification using adaptive genetic algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant