CN113159160A - Semi-supervised node classification method based on node attention - Google Patents

Semi-supervised node classification method based on node attention

Info

Publication number
CN113159160A
CN113159160A
Authority
CN
China
Prior art keywords
node
nodes
attention
characteristic
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110412835.8A
Other languages
Chinese (zh)
Inventor
俞俊
甘银兰
丁佳骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110412835.8A priority Critical patent/CN113159160A/en
Publication of CN113159160A publication Critical patent/CN113159160A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semi-supervised node classification method based on node attention, comprising the following steps: (1) data preprocessing; (2) feature extraction, in which node features are extracted through a 1-2 layer graph convolutional network to prepare data for subsequent operations; (3) node adaptive adjustment: first, the features of each node's first-order neighbors are averaged, and the node's own feature information is concatenated with the averaged features to obtain the required local features; the extracted local features are then fed into a single-layer fully-connected network, and the output of the fully-connected network, together with the node features obtained in step (2), is input into a gating unit for feature fusion; (4) classification prediction and accuracy measurement. The method adjusts each node adaptively, has a clear advantage in space complexity over graph attention networks, and achieves comparable performance.

Description

Semi-supervised node classification method based on node attention
Technical Field
The invention provides a semi-supervised node classification method based on node attention. It targets graph data that are large in scale and dense in edges, and uses the idea of the attention mechanism to adaptively adjust each node, so as to obtain more discriminative node representations and improve the training efficiency and performance of the model.
Background
In recent years, network analysis has received increasing attention. By studying the relationships between the nodes in a network, nodes can be labeled with information such as interests and social influence. In practice, however, a network graph contains a large number of unlabeled nodes, so effectively classifying them using the existing labeled nodes and the network structure is particularly important. Unlike a traditional data set, in which each sample has an independent feature vector, nodes in a network influence each other through edge relations, such as friendships in a social network or mutual citations in a paper network. By analyzing the network structure, including the relations between nodes and edges, the small fraction of labeled nodes can be used to accurately classify the unlabeled nodes in the network, saving the effort of manual labeling and the high cost of extra computation.
At present, work on the semi-supervised node classification problem over network structures falls into three directions: relational learning, feature-representation learning, and deep learning. Typical relational-learning algorithms, such as the RN classifier, are only suitable for relatively small networks, have relatively high computational complexity, and require the network graph itself to have certain special properties. Feature-representation learning learns node feature representations from the network structure; the most widely studied approaches in recent years are based on random walks, such as DeepWalk and node2vec, which have low computational cost, good classification performance, and wide applicability. With the rapid development of deep learning, graph convolutional neural network models grounded in graph theory build parameterized filters in the spectral domain by analogy with the Fourier transform, and many excellent algorithms have been proposed, such as GCN, GraphSAGE, and FastGCN. Compared with traditional node classification algorithms, deep-learning-based methods are more effective, but fare worse in running time and memory. Therefore, reducing the time and space complexity of neural network models is a current research hotspot and challenge.
The semi-supervised node classification task presents two technical difficulties. The first is model learning on large-scale data: given the huge space and time overhead of existing deep learning algorithms, a lightweight learning strategy must be designed to reduce model complexity. The second is that node neighborhoods contain a great deal of noise, and directly absorbing neighborhood information inevitably introduces errors; an effective learning method must therefore adaptively adjust each node's feature representation to capture useful information while avoiding the introduction of noise.
Disclosure of Invention
The invention aims to provide a semi-supervised node classification method based on node attention that addresses the shortcomings of the prior art.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step (1) data preprocessing
For a semi-supervised node classification data set, following the GCN data-processing scheme, 20 nodes per class are selected as the training set, 500 nodes are randomly selected as the validation set, and 1000 nodes as the test set;
step (2) feature extraction
Node feature expressions are extracted for all nodes (training, validation, and test sets) through a single-layer graph convolutional layer (GCL), and the resulting node feature expressions are prepared as data for subsequent operations;
step (3) node self-adaptive adjustment
First, the features of each node's first-order neighbors are averaged; the node's own feature information is then concatenated with the averaged features to obtain the required local features, and cross-node interaction yields richer local features. The extracted local features are then fed into a single-layer fully-connected network (FC), and the output of the FC, together with the node features obtained in step (2), is input into a gating unit for feature fusion, thereby readjusting each node's feature information;
step (4) classified prediction
Finally, the classification probabilities are output through an output layer, and the accuracy is calculated.
Further, the data preprocessing of step (1):
1-1 Citation data sets (Cora, Citeseer, Pubmed). The Cora data set contains 2708 sample points, each a scientific paper; all sample points are divided into 7 categories, each paper is represented by a 1433-dimensional word vector, and there are 5429 citation relations. The Citeseer data set contains 3327 sample points and 4732 citation relations; all samples are divided into 6 classes, and each node has 3703-dimensional features. The Pubmed data set has 19717 sample points and 44338 citation relations. Following the standard data set partitioning method, the same operations are performed on all data sets: 20 nodes per class are selected as the training set, 500 nodes are randomly selected from the remaining data as the validation set, and 1000 nodes are used as the test set;
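The split described above can be sketched in a few lines; `split_dataset` and the toy labels below are illustrative names rather than anything specified in the patent:

```python
import random
from collections import defaultdict

def split_dataset(labels, num_classes, per_class=20, num_val=500, num_test=1000, seed=0):
    """Standard (GCN-style) split: 20 labelled nodes per class for training,
    then 500 validation and 1000 test nodes drawn from the remainder."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, y in enumerate(labels):
        by_class[y].append(node)
    # 20 nodes per class for the training set
    train = [n for y in range(num_classes) for n in by_class[y][:per_class]]
    train_set = set(train)
    rest = [n for n in range(len(labels)) if n not in train_set]
    rng.shuffle(rest)  # random selection of validation and test nodes
    return train, rest[:num_val], rest[num_val:num_val + num_test]

# Toy run with 3 classes and 3000 synthetic labels
labels = [i % 3 for i in range(3000)]
train, val, test = split_dataset(labels, num_classes=3)
```

The three index lists are disjoint by construction, matching the Cora/Citeseer/Pubmed protocol described above.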
further, the feature extraction in the step (2):
2-1 Node information is extracted for each node through a single-layer graph convolutional network. The single-layer graph convolutional network mainly comprises 2 parts:
2-2 Feature transformation: a new node feature expression is obtained through a learnable parameter.
2-3 Feature aggregation: Laplacian smoothing is applied to the node feature expression obtained in step 2-2; that is, the feature expressions of each node's neighbors and of the node itself are weighted and summed as the node's new feature, to which an activation function is applied to obtain the new node feature expression.
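A minimal pure-Python sketch of such a layer, assuming mean aggregation over each node together with its neighbors and a ReLU activation (the patent only specifies a weighted sum followed by an activation, so these weights are an assumption):

```python
def gcn_layer(X, adj, W):
    """One graph-convolution layer: feature transformation (X·W), then
    neighbourhood aggregation with a self-loop and mean weighting
    (Laplacian smoothing), followed by a ReLU activation."""
    n, d_in, d_out = len(X), len(X[0]), len(W[0])
    # 2-2 feature transformation: H = X W
    H = [[sum(X[i][k] * W[k][j] for k in range(d_in)) for j in range(d_out)]
         for i in range(n)]
    # 2-3 feature aggregation: mean over {node} ∪ neighbours, then ReLU
    out = []
    for i in range(n):
        nbrs = [i] + adj[i]                      # add self-loop
        agg = [sum(H[v][j] for v in nbrs) / len(nbrs) for j in range(d_out)]
        out.append([max(0.0, a) for a in agg])   # ReLU activation
    return out

# Toy graph: 3 nodes in a path 0-1-2, 2-d features, identity weights
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1], [0, 2], [1]]
W = [[1.0, 0.0], [0.0, 1.0]]
H = gcn_layer(X, adj, W)
```

With identity weights, the layer simply smooths each node's features toward its neighborhood average, which is the effect step 2-3 describes.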
Further, the node adaptive adjustment of step (3):
3-1 Node attention is first defined, comprising three parts: an aggregation neighborhood, cross-node interaction, and a gating mechanism.
3-2 Aggregation neighborhood: a local representation containing topology information is obtained by averaging the feature expressions of each node's first-order neighbor nodes.
3-3 Cross-node interaction: the local features obtained from the aggregation neighborhood are concatenated with the node's own feature information to obtain a new node feature, which is fed into a single-layer fully-connected network for self-learning; through an activation function, an attention coefficient matrix of the same size as the node features obtained in step 2-3 is output.
3-4 Gating mechanism: the result obtained in step 3-3 is normalized so that its values lie in [0,1], and the normalized attention coefficient matrix is then multiplied element-wise with the node representation generated in step 2-3 to obtain the adaptively adjusted node representation.
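The three parts of node attention can be sketched as follows. The sigmoid as the normalization function and an FC layer mapping the 2d-dimensional concatenation back to d dimensions are assumptions: the patent fixes only that the output matches the node-feature size and that values are squashed into [0,1]:

```python
import math

def node_attention(H, adj, W_fc, b_fc):
    """Node-attention sketch: per node, (a) mean-aggregate first-order
    neighbour features, (b) concatenate with the node's own features and
    pass through a single fully-connected layer, (c) squash to (0,1) with
    a sigmoid gate and rescale the node's features element-wise."""
    d = len(H[0])
    out = []
    for i, h in enumerate(H):
        nbrs = adj[i]
        # (a) aggregation neighbourhood: average of first-order neighbours
        mean = [sum(H[v][j] for v in nbrs) / len(nbrs) for j in range(d)]
        # (b) cross-node interaction: concat (length 2d) -> single FC layer
        local = h + mean
        z = [sum(local[k] * W_fc[k][j] for k in range(2 * d)) + b_fc[j]
             for j in range(d)]
        # (c) gating mechanism: sigmoid gate, element-wise rescaling
        gate = [1.0 / (1.0 + math.exp(-v)) for v in z]
        out.append([h[j] * gate[j] for j in range(d)])
    return out

# Toy usage: two mutually adjacent nodes; zero FC weights give gate = 0.5
H = [[2.0, 4.0], [1.0, 1.0]]
adj = [[1], [0]]
W_fc = [[0.0, 0.0] for _ in range(4)]
b_fc = [0.0, 0.0]
H_adj = node_attention(H, adj, W_fc, b_fc)
```

Because the gate is computed per node from a fixed-size concatenation, the attention cost grows with the number of nodes rather than the number of edges, which is the O(N) versus O(E) advantage claimed later.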
Further, the classification prediction of step (4):
4-1 The adaptively adjusted node feature representations obtained in step (3) are passed through a graph convolution layer to obtain the classification probabilities of the nodes, and the accuracy is calculated.
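The prediction step reduces to a softmax over the output layer followed by an argmax comparison against the labels; a sketch with hypothetical logits (the graph convolution producing them is omitted here):

```python
import math

def softmax(row):
    # Numerically stable softmax over one node's class scores
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def accuracy(logits, labels, idx):
    """Argmax over class probabilities, scored on the evaluation indices
    only (semi-supervised setting: most nodes carry no label)."""
    correct = 0
    for i in idx:
        probs = softmax(logits[i])
        pred = probs.index(max(probs))
        correct += int(pred == labels[i])
    return correct / len(idx)

# Hypothetical 2-class logits for 3 nodes; node 2 is misclassified
logits = [[2.0, 0.1], [0.3, 1.5], [0.2, 0.1]]
acc = accuracy(logits, [0, 1, 1], [0, 1, 2])
```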
The invention has the following beneficial effects:
For the large-scale, densely connected graph data found in practical applications, the node classification task (binary or multi-class) is studied based on the graph convolutional network framework and the idea of the attention mechanism, combining the topology information of the graph with node feature expressions to produce more expressive node representations. The feature representations of a central node's direct neighbors are fused with the central node's own representation; after cross-node interaction, an attention coefficient matrix is obtained through a simple gating mechanism, and finally an adaptively adjusted node feature expression is obtained. Important features are thus strengthened and unimportant information filtered out, improving classification accuracy. In addition, the space complexity of the algorithm is linear in the number of nodes, reducing the complexity of the model.
The invention inserts a node attention module into a conventional two-layer graph convolutional network. Compared with GCN, its performance improves by 1.5%, 2.4%, and 0.8% on the Cora, Citeseer, and Pubmed data sets, respectively; compared with GAT, there are gains of 0.5%, 0.2%, and 0.5%, respectively, and the invention reduces the space complexity of the attention computation from O(E) to O(N).
Drawings
FIG. 1 is a general framework schematic of the present invention;
FIG. 2 is a detailed block diagram of a node attention layer;
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a semi-supervised node classification method based on node attention specifically comprises the following steps:
Step (1) data preprocessing
For a graph node classification data set, according to a standard data set processing method, 20 nodes are selected from each class to serve as a training set, 500 nodes are randomly selected from the rest data to serve as a verification set, and 1000 nodes are selected to serve as a test set;
step (2) feature extraction
Node features are first extracted for all nodes through a shallow graph convolutional network, as data preparation for subsequent operations;
step (3) node self-adaptive adjustment
As shown in fig. 2, the features of each node's first-order neighbors are first averaged; each node's own feature information is then concatenated with the aggregated features, and cross-node interaction yields richer local information. The extracted local information is then fed into a single-layer fully-connected network, and its output, together with the feature map obtained in step (2), is input into a gating unit for feature fusion, thereby readjusting each node's feature information;
step (4) classified prediction
Finally, the classification probabilities are output through a graph convolution output layer, and the accuracy is calculated.
Further, the data preprocessing of step (1):
1-1 Citation data sets (Cora, Citeseer, Pubmed). The Cora data set contains 2708 sample points, each a scientific paper; all sample points are divided into 7 categories, each paper is represented by a 1433-dimensional word vector, and there are 5429 citation relations. The Citeseer data set contains 3327 sample points and 4732 citation relations; all samples are divided into 6 classes, and each node has 3703-dimensional features. The Pubmed data set has 19717 sample points and 44338 citation relations. Following the standard data set partitioning method, the same operations are performed on all data sets: 20 nodes per class are selected as the training set, 500 nodes are randomly selected from the remaining data as the validation set, and 1000 nodes are used as the test set;
further, the feature extraction in the step (2):
2-1 Node information is extracted for each node through a single-layer graph convolutional network. The single-layer graph convolutional network mainly comprises 2 parts:
2-2 Feature transformation. The feature expression is self-learned through an optimizable parameter.
2-3 Feature aggregation. Using the topological structure of the graph and the adjacency matrix, the node representations after feature transformation are propagated and absorbed from the neighborhood.
Further, the node adaptive adjustment of step (3):
3-1 Node attention is first defined; it consists of three parts: an aggregation neighborhood, cross-node interaction, and a gating mechanism.
3-2 Aggregation neighborhood: the information of each node's directly adjacent nodes is averaged to obtain a compressed local representation.
3-3 Cross-node interaction: the local representation obtained from the aggregation neighborhood is concatenated with the node's own feature information to obtain a new node expression, which is fed into a single-layer fully-connected network for self-learning, outputting a feature map of the same size as the node expression obtained in step 2-3.
3-4 Gating mechanism: the result obtained in step 3-3 is normalized so that its values lie in [0,1], and the matrix is then multiplied element-wise with the node representation generated in step 2-3 to realize the adaptive adjustment of the node.
Further, the classification prediction of step (4):
4-1 From the node representations obtained in the previous step, a classification probability matrix is obtained through a graph convolution layer, and the relevant metrics are calculated.
Example: taking node 4 in fig. 2, the node feature expressions obtained in step (2) are written H = [h1, h2, …, h7], the representation of node 4 is h4, and the first-order neighbors of node 4 are h1, h2, h3, h5. The fully-connected network FC is written f, the activation function σ, the normalization function δ, and the adaptively adjusted feature expression of node 4 is h'4. Then:
3-2 Aggregation neighborhood: h̄ = (h1 + h2 + h3 + h5) / 4
3-3 Cross-node interaction: h'' = σ(f([h4 ∥ h̄]))
3-4 Gating mechanism: h'4 = h4 ⊙ δ(h'').
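The three formulas for node 4 can be checked numerically on toy 2-d features. The concrete FC weights, σ taken as ReLU, and δ as the logistic sigmoid are hypothetical choices for illustration; the patent leaves these unspecified:

```python
import math

# Toy 2-d features for nodes 1..7 (0-based indices 0..6); node 4 is index 3
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0],
     [0.0, 2.0], [1.0, 2.0], [2.0, 0.0]]
h4 = H[3]
neighbours = [0, 1, 2, 4]          # h1, h2, h3, h5

# 3-2 aggregation neighbourhood: mean over the first-order neighbours
mean = [sum(H[v][j] for v in neighbours) / len(neighbours) for j in range(2)]

# 3-3 cross-node interaction: h'' = σ(f([h4 ∥ mean])); f is a hypothetical
# FC layer taken here as the identity on the first two coordinates, σ = ReLU
h_prime = h4 + mean                 # concatenation, length 4
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
h_pp = [max(0.0, sum(h_prime[k] * W[k][j] for k in range(4))) for j in range(2)]

# 3-4 gating mechanism: h'4 = h4 ⊙ δ(h''), with δ the logistic sigmoid
h4_adj = [h4[j] / (1.0 + math.exp(-h_pp[j])) for j in range(2)]
```

The gate values lie in (0,1), so h'4 is a damped copy of h4 whose strength is decided jointly by the node and its neighborhood.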

Claims (6)

1. A semi-supervised node classification method based on node attention is characterized by comprising the following steps:
step (1) data preprocessing
For a semi-supervised node classification data set, following the GCN data-processing scheme, 20 nodes per class are selected as the training set, 500 nodes are randomly selected as the validation set, and 1000 nodes as the test set;
step (2) feature extraction
Node feature expressions are extracted for all nodes through a single-layer graph convolutional network, and the resulting node feature expressions are prepared as data for subsequent operations;
step (3) node self-adaptive adjustment
First, the features of each node's first-order neighbors are averaged; the node's own feature information is then concatenated with the averaged features to obtain the required local features, and cross-node interaction yields richer local features; the extracted local features are then fed into a single-layer fully-connected network, and the output of the fully-connected network, together with the node features obtained in step (2), is input into a gating unit for feature fusion, thereby readjusting each node's feature information;
step (4) classified prediction
Finally, the classification probabilities are output through an output layer, and the accuracy is calculated.
2. The method for node attention-based semi-supervised node classification according to claim 1, wherein the data preprocessing of the step (1):
According to the standard data set division method, the following operations are carried out on all data sets: 20 nodes are selected from each class as the training set, 500 nodes are randomly selected from the remaining data as the validation set, and 1000 nodes are selected as the test set.
3. The node attention-based semi-supervised node classification method according to claim 2, wherein the citation data sets comprise: the Cora data set contains 2708 sample points, each a scientific paper; all sample points are divided into 7 categories, each paper is represented by a 1433-dimensional word vector, and there are 5429 citation relations; the Citeseer data set contains 3327 sample points and 4732 citation relations, all samples are divided into 6 classes, and each node has 3703-dimensional features; the Pubmed data set has 19717 sample points and 44338 citation relations.
4. The node attention-based semi-supervised node classification method according to claim 2 or 3, wherein the feature extraction of step (2):
2-1, node information is extracted for each node through a single-layer graph convolutional network;
the single-layer graph convolutional network mainly comprises 2 parts:
① feature transformation: a new node feature expression is obtained through a learnable parameter;
② feature aggregation: Laplacian smoothing is applied to the obtained node feature expression; that is, the feature expressions of each node's neighbors and of the node itself are weighted and summed as the node's new feature, to which an activation function is applied to obtain the new node feature expression.
5. The node attention-based semi-supervised node classification method according to claim 4, wherein the node adaptive adjustment of step (3):
3-1 node attention is first defined as comprising: an aggregation neighborhood, cross-node interaction, and a gating mechanism;
① aggregation neighborhood: a local representation containing topology information is obtained by averaging the feature expressions of each node's first-order neighbor nodes;
② cross-node interaction: the local features obtained from the aggregation neighborhood are concatenated with the node's own feature information to obtain a new node feature, which is fed into a single-layer fully-connected network for self-learning; through an activation function, an attention coefficient matrix of the same size as the node features obtained in step 2-3 is output;
③ gating mechanism: the attention coefficient matrix is first normalized so that its values lie in [0,1], and the normalized matrix is then multiplied element-wise with the node representation generated by feature aggregation to obtain the adaptively adjusted node representation.
6. The method of claim 5, wherein the classification prediction of step (4):
the adaptively adjusted node feature representations obtained in step (3) are passed through a graph convolution layer to obtain the classification probabilities of the nodes, and the accuracy is calculated.
CN202110412835.8A 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention Pending CN113159160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110412835.8A CN113159160A (en) 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention


Publications (1)

Publication Number Publication Date
CN113159160A 2021-07-23

Family

ID=76868506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110412835.8A Pending CN113159160A (en) 2021-04-16 2021-04-16 Semi-supervised node classification method based on node attention

Country Status (1)

Country Link
CN (1) CN113159160A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492568A (en) * 2021-12-20 2022-05-13 西安理工大学 Node classification method based on Bert model
CN114708479A (en) * 2022-03-31 2022-07-05 杭州电子科技大学 Self-adaptive defense method based on graph structure and characteristics
CN114708479B (en) * 2022-03-31 2023-08-29 杭州电子科技大学 Self-adaptive defense method based on graph structure and characteristics


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination