CN113159160B - Semi-supervised node classification method based on node attention - Google Patents

Semi-supervised node classification method based on node attention

Info

Publication number
CN113159160B
CN113159160B CN202110412835.8A
Authority
CN
China
Prior art keywords
node
nodes
network
characteristic
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110412835.8A
Other languages
Chinese (zh)
Other versions
CN113159160A (en
Inventor
俞俊 (Yu Jun)
甘银兰 (Gan Yinlan)
丁佳骏 (Ding Jiajun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110412835.8A priority Critical patent/CN113159160B/en
Publication of CN113159160A publication Critical patent/CN113159160A/en
Application granted granted Critical
Publication of CN113159160B publication Critical patent/CN113159160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semi-supervised node classification method based on node attention. The method comprises the following steps: step (1), data preprocessing; step (2), extracting node features through a graph convolutional network of 1-2 layers, which serve as the data for subsequent operations; step (3), node adaptive adjustment: first, the features of each node's first-order neighbors are average-aggregated, and the node's own feature information is concatenated with the aggregated features to obtain the required local representation; the extracted local representation is then fed into a single-layer fully connected network, and the output of the fully connected network together with the node features obtained in step (2) is input into a gating unit for feature fusion; and step (4), classification prediction and accuracy measurement. The invention adjusts each node adaptively, has a clear advantage in space complexity over the graph attention network (GAT), and delivers performance comparable to it.

Description

Semi-supervised node classification method based on node attention
Technical Field
The invention provides a semi-supervised node classification method based on node attention. It targets graph data that is large in scale and dense in edges, and uses the idea of the attention mechanism to adjust nodes adaptively, so as to obtain more discriminative node representations and improve the training efficiency and performance of the model.
Background
In recent years, network analysis has received increasing attention. By studying the relations between nodes in a network, nodes can be assigned labels that carry information such as interests, hobbies, and social influence. In practice, however, a network graph contains a large number of unlabeled nodes, and it is particularly important to classify them effectively using the existing labeled nodes and the network structure. Unlike traditional datasets, where each data point carries only its own feature vector, nodes in a network interact through edge relations, such as friendship relations in social networks or citation relations in paper networks. By analyzing the network structure, including the relations between nodes and edges, and applying semi-supervised learning to a small set of labeled nodes, unlabeled nodes in the network can be classified more accurately, saving extensive manual labeling effort and additional computation.
At present, semi-supervised node classification based on network structure follows three directions: relation learning, feature representation learning, and deep learning. Typical relation-learning algorithms, such as the relational neighbor (RN) classifier, are only suitable for relatively small networks, have high computational complexity, and require the network graph to have certain special properties. Feature representation learning learns a feature representation of each node from the network structure; the most widely studied approaches in recent years are based on random walks, such as DeepWalk and node2vec, which require little computation, classify well, and are widely applied. With the rapid development of deep learning, graph convolutional neural network models grounded in graph theory build parameterized filters in the spectral domain by analogy with the Fourier transform, yielding many excellent algorithms such as GCN, GraphSAGE, and FastGCN. Compared with traditional node classification algorithms, deep-learning-based ones are more effective, but costly in running time and memory. How to reduce the time and space complexity of graph neural network models is therefore a current research hotspot and difficulty.
The semi-supervised node classification task poses two technical difficulties. The first is model learning on large-scale data: given the huge space and time cost of existing deep learning algorithms, a lightweight learning strategy must be designed to reduce model complexity. The second is that a node's neighborhood contains a large amount of noise, and absorbing neighborhood information directly inevitably introduces erroneous information; an effective learning method must therefore adjust node feature representations adaptively, extracting useful information while avoiding noise.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a semi-supervised node classification method based on node attention.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
Step (1) data preprocessing
For the semi-supervised node classification datasets, following the GCN data-processing protocol, 20 nodes per class are selected as the training set, 500 nodes are randomly selected as the validation set, and 1000 nodes as the test set;
Step (2) feature extraction
All nodes (training, validation, and test sets alike) first pass through a single graph convolutional layer (GCL) to extract node feature expressions, which serve as the data for subsequent operations;
step (3) node self-adaptive adjustment
First, the features of each node's first-order neighbors are average-aggregated; the node's own feature information is then concatenated with the aggregated features to obtain the required local representation, with cross-node interaction yielding a richer local representation. The extracted local representation is fed into a single-layer fully connected network (FC), and the FC output together with the node features obtained in step (2) is input into a gating unit for feature fusion, readjusting the feature information of each node;
Step (4) Classification prediction
Finally, the classification probabilities are output through an output layer, and the accuracy is computed.
Further, the data preprocessing in the step (1):
1-1 Datasets (Cora, CiteSeer, PubMed): the Cora dataset contains 2708 sample points, each a scientific paper; all sample points are divided into 7 categories, each paper is represented by a 1433-dimensional word vector, and there are 5429 citation links. The CiteSeer dataset has 3327 sample points and 4732 citation links; all samples are divided into 6 classes, and each node has a 3703-dimensional feature. The PubMed dataset has 19717 sample points and 44338 citation links. Following the standard dataset split, the following operations are applied to all datasets: 20 nodes per class are selected as the training set, 500 nodes are randomly selected from the remaining data as the validation set, and 1000 nodes as the test set;
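The split described above can be sketched as follows. This is a minimal illustration, not the patent's actual loader: the `labels` array is a randomly generated stand-in for real Planetoid labels, and the seed and helper name are assumptions.

```python
import numpy as np

def planetoid_style_split(labels, num_classes, train_per_class=20,
                          num_val=500, num_test=1000, seed=0):
    """Split node indices as described above: 20 labeled nodes per class
    for training, then 500 random validation nodes and 1000 test nodes
    drawn from the remaining data."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in range(num_classes):
        class_nodes = np.flatnonzero(labels == c)
        train_idx.extend(class_nodes[:train_per_class])
    train_idx = np.array(train_idx)
    rest = np.setdiff1d(np.arange(len(labels)), train_idx)
    rest = rng.permutation(rest)
    val_idx = rest[:num_val]
    test_idx = rest[num_val:num_val + num_test]
    return train_idx, val_idx, test_idx

# Toy illustration with Cora-like sizes (7 classes, 2708 nodes)
labels = np.random.default_rng(1).integers(0, 7, size=2708)
tr, va, te = planetoid_style_split(labels, num_classes=7)
```

With Cora's 7 classes this yields the 140/500/1000 train/validation/test split used throughout the experiments.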
further, the feature extraction in the step (2):
2-1 Node information is extracted for each node through a single-layer graph convolutional network, which mainly comprises 2 parts:
2-2 Feature transformation: a new node feature expression is obtained through a learnable parameter.
2-3 Feature aggregation: the node feature expression obtained in step 2-2 is Laplacian-smoothed, i.e., the feature expressions of each node's neighbors and of the node itself are weighted and summed to form the node's new feature, which is then passed through an activation function to obtain the new node feature expression.
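A single graph convolution layer combining the feature transformation and feature aggregation steps above might be sketched in NumPy as follows. The symmetric renormalization with self-loops follows the standard GCN formulation; the ReLU activation and the toy graph are illustrative assumptions.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph convolution layer: feature transformation (X @ W)
    followed by feature aggregation with the renormalized adjacency
    D^{-1/2} (A + I) D^{-1/2} (Laplacian smoothing), then ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees incl. self-loop
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # weighted neighbor sum
    return np.maximum(0.0, A_norm @ X @ W)    # activation function

# Tiny example: 4 nodes on a path graph, 3-dim features, 2 hidden units
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 2))
H = gcn_layer(X, A, W)   # new node feature expression, shape (4, 2)
```

Each row of `H` is the smoothed, transformed feature of one node, ready for the node attention module that follows.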
Further, the node adaptation of (3) described above:
3-1 Node attention is first defined, comprising an aggregation neighborhood, cross-node interaction, and a gating mechanism.
3-2 Aggregation neighborhood: a local representation containing topology information is obtained by average-aggregating the feature expressions of each node's first-order neighbor nodes.
3-3 Cross-node interaction: the local representation obtained from the aggregation neighborhood is concatenated with the node's own feature information to obtain a new node representation, which is fed into a single-layer fully connected network for self-learning; an activation function then outputs an attention coefficient matrix of the same size as the node representation obtained in step 2-3.
3-4 Gating mechanism: the result obtained in step 3-3 is normalized so that its values lie in [0,1]; the normalized attention coefficient matrix is then multiplied elementwise with the node representation generated in step 2-3 to obtain the adaptively adjusted node representation.
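The three operations above, neighborhood aggregation, cross-node interaction, and gating, can be sketched as a single module. The tanh activation for the FC layer and the sigmoid used for normalization are illustrative assumptions; the patent only specifies that the normalized coefficients lie in [0, 1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_attention(H, A, W_fc):
    """Node attention: (1) average-aggregate each node's first-order
    neighbors; (2) concatenate with the node's own features and pass
    through a single-layer FC network (cross-node interaction);
    (3) normalize into [0, 1] and gate the original features."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    neigh_mean = (A @ H) / deg                  # aggregation neighborhood
    local = np.concatenate([H, neigh_mean], 1)  # cross-node interaction
    att = np.tanh(local @ W_fc)                 # FC + activation (assumed tanh)
    gate = sigmoid(att)                         # normalization into [0, 1]
    return gate * H                             # adaptively adjusted features

# Example: 4 nodes with 2-dim features from the previous layer
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.default_rng(2).normal(size=(4, 2))
W_fc = np.random.default_rng(3).normal(size=(4, 2))  # maps concat (2+2) -> 2
H_adj = node_attention(H, A, W_fc)  # same shape as H
```

Because the gate never exceeds 1, each output feature is a scaled-down copy of the input feature, which is what lets the module enhance important features and suppress noisy ones per node.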
Further, the classification prediction in the step (4) is as follows:
4-1 The adaptively adjusted node feature representation obtained in step (3) is passed through a graph convolution layer to obtain the classification probabilities of the nodes, and the accuracy is computed.
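The final prediction step, an output layer producing class probabilities followed by an accuracy measurement, can be sketched as below. The softmax output and argmax accuracy are the standard choices; the logits and labels here are illustrative values, not results from the patent.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: turns output-layer logits into class probabilities."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def accuracy(probs, labels, idx):
    """Fraction of nodes in `idx` whose argmax class matches the label."""
    pred = probs[idx].argmax(axis=1)
    return float((pred == labels[idx]).mean())

# Illustrative logits for 5 nodes over 3 classes
logits = np.array([[2.0, 0.1, 0.1],
                   [0.1, 2.0, 0.1],
                   [0.1, 0.1, 2.0],
                   [2.0, 0.1, 0.1],
                   [0.1, 2.0, 0.1]])
probs = softmax(logits)
labels = np.array([0, 1, 2, 0, 0])
acc = accuracy(probs, labels, np.arange(5))  # the last node is misclassified
```

In the experiments the accuracy would be evaluated on the 1000-node test set index rather than on all nodes.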
The invention has the following beneficial effects:
For the large-scale, dense graph data found in practical applications, node classification tasks (binary or multi-class) are studied within a graph convolutional neural network framework combined with an attention mechanism, fusing the graph's topological information with node feature expressions to produce more expressive node representations. The feature representations of a central node's direct neighbors are fused with the central node's own features; after cross-node interaction, a simple gating mechanism yields the attention coefficient matrix, and finally the adaptively adjusted node feature expression. Important features are enhanced and unimportant information is filtered out, improving classification accuracy. In addition, the algorithm's space complexity is linear in the number of nodes, reducing model complexity.
The invention inserts a node attention module into a conventional two-layer graph convolutional network. Compared with GCN, performance improves by 1.5%, 2.4%, and 0.8% on the Cora, CiteSeer, and PubMed datasets respectively; compared with GAT, performance improves by 0.5%, 0.2%, and 0.5% respectively, while the space complexity of the attention computation is reduced from O(E) to O(N).
Drawings
FIG. 1 is a schematic view of the overall framework of the present invention;
FIG. 2 is a detailed block diagram of a node attention layer;
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, a semi-supervised node classification method based on node attention specifically comprises the following steps:
Step (1) data preprocessing
For the graph node classification dataset, following the standard dataset processing method, 20 nodes per class are selected as the training set, 500 nodes are randomly selected from the remaining data as the validation set, and 1000 nodes as the test set;
Step (2) feature extraction
All nodes first pass through a shallow graph convolutional network to extract node features, which serve as data preparation for subsequent operations;
step (3) node self-adaptive adjustment
As shown in FIG. 2, the features of each node's first-order neighbors are first average-aggregated; each node's feature information is then concatenated with the aggregated features, and cross-node interaction yields richer local information. The extracted local information is fed into a single-layer fully connected network, and the result together with the node features obtained in step (2) is input into a gating unit for feature fusion, readjusting each node's feature information;
Step (4) Classification prediction
Finally, the classification probabilities are output through a graph convolution output layer, and the accuracy is computed.
Further, the data preprocessing in the step (1):
1-1 dataset (Cora, citeseer, pubmed) Cora dataset total 2708 sample points, each sample point being a scientific paper, all sample points being divided into 7 categories, each paper being represented by a 1433-dimensional word vector, there being 5429 citations; the Citeseer data set has 3327 sample points, has 4732 reference relations, all samples are divided into 6 major classes, and each node has 3703 dimension characteristics; the Pubmed dataset has 19717 sample points and 44338 reference relations. According to a standard data set dividing method, all data sets are subjected to the following operations: selecting 20 nodes from each class as a training set, randomly selecting 500 nodes from the rest data as a verification set, and selecting 1000 nodes as a test set;
further, the feature extraction in the step (2):
2-1 Node information is extracted for each node through a single-layer graph convolutional network, which mainly comprises 2 parts:
2-2 Feature transformation: the feature expression is self-learned through an optimizable parameter.
2-3 Feature aggregation: using the graph's topology, i.e. the adjacency matrix, node representations are propagated and absorbed across the neighborhood after feature transformation.
Further, the node adaptation of (3) described above:
3-1 We first define node attention, which consists of three parts: the aggregation neighborhood, cross-node interaction, and the gating mechanism.
3-2 Aggregation neighborhood: a compressed local representation is obtained by averaging the information of each node's directly adjacent nodes.
3-3 Cross-node interaction: the local representation obtained by aggregating the neighborhood is concatenated with the node's own feature information to obtain a new node expression, which is fed into a single-layer fully connected network for self-learning and outputs a feature map of the same size as the node expression obtained in 2-3.
3-4 Gating mechanism: the result obtained in 3-3 is normalized so that its values lie in [0,1]; this matrix is then multiplied elementwise with the node representation generated in 2-3, realizing the adaptive adjustment of the node.
Further, the classification prediction in the step (4) is as follows:
4-1 From the node representation obtained in the previous step, a classification probability matrix is obtained through a graph convolution layer, and the relevant evaluation metrics are computed.
Example: taking node 4 in FIG. 2 as an example, the node feature expressions obtained through step (2) are denoted H = [h1, h2, …, h7]; the feature of node 4 is denoted h4; the first-order neighbors of node 4 are h1, h2, h3, h5; the fully connected network FC is denoted f, the activation function σ, and the normalization function δ; the adaptively adjusted feature expression of node 4 is denoted h'4;
Aggregation neighborhood: h' = [h4 ∥ (h1 + h2 + h3 + h5)/4]
Cross-node interaction: h'' = σ(f(h'))
Gating mechanism: h'4 = h4 ⊙ δ(h'').
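The worked example for node 4 can be traced numerically as follows. The concrete feature values and FC weights are random stand-ins, and taking σ as tanh and δ as the sigmoid is an assumption; only the computation pattern follows the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(7, 4))           # h1..h7, 4-dim features (illustrative)
h4 = H[3]                             # node 4 (1-indexed in the text)
neighbors = H[[0, 1, 2, 4]]           # h1, h2, h3, h5

# Aggregation neighborhood: h' = [h4 || mean(h1, h2, h3, h5)]
h_prime = np.concatenate([h4, neighbors.mean(axis=0)])

# Cross-node interaction: h'' = sigma(f(h')), f a single FC layer
W = rng.normal(size=(8, 4))           # illustrative FC weights (8 -> 4)
h_pp = np.tanh(h_prime @ W)           # sigma assumed to be tanh

# Gating mechanism: h4' = h4 (elementwise *) delta(h'')
delta = 1.0 / (1.0 + np.exp(-h_pp))   # delta assumed to be the sigmoid
h4_adj = h4 * delta                   # adaptively adjusted feature of node 4
```

Since δ maps every coordinate into (0, 1), the adjusted feature h'4 is a per-dimension damped copy of h4, with the damping learned from node 4's local neighborhood.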

Claims (1)

1. A semi-supervised node classification method based on node attention is characterized by comprising the following steps:
Step (1) data preprocessing
For the semi-supervised node classification datasets, following the GCN data-processing protocol, 20 nodes per class are selected as the training set, 500 nodes are randomly selected as the validation set, and 1000 nodes as the test set;
Step (2) feature extraction
The feature expressions of all nodes are extracted through a single-layer graph convolutional network, and the obtained node feature expressions serve as the data for subsequent operations;
step (3) node self-adaptive adjustment
First, the features of each node's first-order neighbors are average-aggregated; the node's own feature information is then concatenated with the aggregated features to obtain the required local representation, with cross-node interaction yielding a richer local representation. The extracted local representation is fed into a single-layer fully connected network, and the network's output together with the node features obtained in step (2) is input into a gating unit for feature fusion, realizing the readjustment of each node's feature information;
Step (4) Classification prediction
Finally, the classification probabilities are output through an output layer, and the accuracy is computed;
data preprocessing in the step (1):
According to the standard data set dividing method, the following operations are carried out on all data sets: selecting 20 nodes from each class as a training set, randomly selecting 500 nodes from the rest data as a verification set, and selecting 1000 nodes as a test set;
The datasets comprise: the Cora dataset with 2708 sample points, each a scientific paper; all sample points are divided into 7 categories, each paper is represented by a 1433-dimensional word vector, and there are 5429 citation links; the CiteSeer dataset with 3327 sample points and 4732 citation links, all samples divided into 6 classes, each node having a 3703-dimensional feature; and the PubMed dataset with 19717 sample points and 44338 citation links;
the feature extraction in the step (2):
2-1 Node information is extracted for each node through a single-layer graph convolutional network;
The single-layer graph convolutional network mainly comprises 2 parts:
① Feature transformation: a new node feature expression is obtained through a learnable parameter;
② Feature aggregation: the obtained node feature expression is Laplacian-smoothed, i.e., the feature expressions of each node's neighbors and of the node itself are weighted and summed to form the node's new feature, which is passed through an activation function to obtain the new node feature expression;
the node self-adaptive adjustment in the step (3):
3-1 Node attention is first defined, comprising: an aggregation neighborhood, cross-node interaction, and a gating mechanism;
① Aggregation neighborhood: a local representation containing topology information is obtained by average-aggregating the feature expressions of each node's first-order neighbor nodes;
② Cross-node interaction: the local representation obtained by aggregating the neighborhood is concatenated with the node's own feature information to obtain a new node representation, which is fed into a single-layer fully connected network for self-learning; an activation function then outputs an attention coefficient matrix of the same size as the node representation obtained in step 2-3;
③ Gating mechanism: the obtained attention coefficient matrix is normalized so that its values lie in [0,1]; the normalized attention coefficient matrix is then multiplied elementwise with the node representation generated by feature aggregation to obtain the adaptively adjusted node representation;
classification prediction as described in step (4):
The adaptively adjusted node feature representation obtained in step (3) is passed through a graph convolution layer to obtain the classification probabilities of the nodes, and the accuracy is computed.
CN202110412835.8A 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention Active CN113159160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110412835.8A CN113159160B (en) 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110412835.8A CN113159160B (en) 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention

Publications (2)

Publication Number Publication Date
CN113159160A CN113159160A (en) 2021-07-23
CN113159160B true CN113159160B (en) 2024-06-25

Family

ID=76868506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110412835.8A Active CN113159160B (en) 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention

Country Status (1)

Country Link
CN (1) CN113159160B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492568A (en) * 2021-12-20 2022-05-13 西安理工大学 Node classification method based on Bert model
CN114708479B (en) * 2022-03-31 2023-08-29 杭州电子科技大学 Self-adaptive defense method based on graph structure and characteristics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492691A (en) * 2018-11-07 2019-03-19 南京信息工程大学 A kind of hypergraph convolutional network model and its semisupervised classification method
CN111582443A (en) * 2020-04-22 2020-08-25 成都信息工程大学 Recommendation method based on Mask mechanism and level attention mechanism

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085124B (en) * 2020-09-27 2022-08-09 西安交通大学 Complex network node classification method based on graph attention network
CN112085127A (en) * 2020-10-26 2020-12-15 安徽大学 Semi-supervised classification method for mixed high-low order neighbor information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492691A (en) * 2018-11-07 2019-03-19 南京信息工程大学 A kind of hypergraph convolutional network model and its semisupervised classification method
CN111582443A (en) * 2020-04-22 2020-08-25 成都信息工程大学 Recommendation method based on Mask mechanism and level attention mechanism

Also Published As

Publication number Publication date
CN113159160A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113159160B (en) Semi-supervised node classification method based on node attention
Yoshida Toward finding hidden communities based on user profile
CN108805213B (en) Power load curve double-layer spectral clustering method considering wavelet entropy dimensionality reduction
WO2023155508A1 (en) Graph convolutional neural network and knowledge base-based paper correlation analysis method
Shen et al. The analysis of intelligent real-time image recognition technology based on mobile edge computing and deep learning
CN110377605A (en) A kind of Sensitive Attributes identification of structural data and classification stage division
CN114298834A (en) Personal credit evaluation method and system based on self-organizing mapping network
CN115761275A (en) Unsupervised community discovery method and system based on graph neural network
CN116842194A (en) Electric power semantic knowledge graph system and method
CN115456093A (en) High-performance graph clustering method based on attention-graph neural network
CN113641821A (en) Value orientation identification method and system for opinion leaders in social network
Shukla et al. Role of hybrid optimization in improving performance of sentiment classification system
CN116416478B (en) Bioinformatics classification model based on graph structure data characteristics
CN113743079A (en) Text similarity calculation method and device based on co-occurrence entity interaction graph
CN112215490A (en) Power load cluster analysis method based on correlation coefficient improved K-means
CN114818681B (en) Entity identification method and system, computer readable storage medium and terminal
CN111539465A (en) Internet of things unstructured big data analysis algorithm based on machine learning
CN112463974A (en) Method and device for establishing knowledge graph
CN115240271A (en) Video behavior identification method and system based on space-time modeling
Li et al. A Novel Semi-supervised Adaboost Technique Based on Improved Tri-training
Yin et al. Gcn-based text classification research
Shi et al. Three-way spectral clustering
Liu et al. Classification of Medical Text Data Using Convolutional Neural Network-Support Vector Machine Method
Xie et al. An intrusion detection method based on hierarchical feature learning and its application
Gong et al. Research on mobile traffic data augmentation methods based on SA-ACGAN-GN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant