CN112990285A

CN112990285A - Simplified attack method oriented to large-scale graph structure

Info

Publication number: CN112990285A
Application number: CN202110241960.7A
Authority: CN
Inventors: 谢洪途; 李金膛; 王国倩
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-06-18

Abstract

The invention provides a simplified attack method for large-scale graph structures. First, a simplified graph convolutional neural network is used as the surrogate model, which offers better scalability and flexibility; compared with the GCN used in previous research, this surrogate model trains and infers faster and requires less memory. Next, the k-order subgraph of the target node is selected as the input data, since the prediction for the target node obtained from the k-order subgraph is equivalent to that obtained from the whole graph. Finally, the gradients of the candidate edge set are computed by back-propagation from the surrogate model's inference result on the k-order subgraph, and the edge whose gradient has the largest absolute value is flipped; this gradient attack is iterated until a termination condition is reached. The attack method markedly increases the running speed of the attack algorithm, reduces memory usage, and achieves better attack performance, so it is suitable for attacks on large-scale graph networks.

Description

Simplified attack method oriented to large-scale graph structure

Technical Field

The invention relates to the field of graph machine learning, in particular to a simplified attack method oriented to a large-scale graph structure.

Background

A graph is a data structure found throughout everyday life: social networks, traffic networks, and the molecular structures of biological proteins can all be represented as graph-structured data. Owing to the expressive power of graphs, a great deal of research on graph-structured data has emerged in recent years, especially on Graph Neural Networks (GNNs), which combine graphs with deep learning. GNNs can learn the complex relationships between nodes and edges in graphs that arise in a wide range of problems, from biology and particle physics to social networks and recommendation systems, and they have therefore become a research hotspot for researchers at home and abroad.

Graph neural networks are mainly represented by the Graph Convolutional Network (GCN), in which feature information is repeatedly propagated and aggregated between a node and its neighbor nodes, so that the model can better learn node, edge, or graph-structure information and make predictions for different downstream tasks. However, recent studies have shown that graph neural networks are highly susceptible to perturbations added by a malicious attacker to change the prediction result. For example, by adding or deleting only a small number of edges in the graph, a malicious attacker can make the model misclassify one or more nodes. Research on adversarial attacks on graphs is therefore very important work in the graph field: it can be used to test the robustness of a model under worst-case attacks, and it has emerged in recent years as an important research branch of graph machine learning.

Existing work on graph adversarial attacks has two main branches: gradient-based methods and non-gradient-based methods. Gradient-based methods are simple and effective; they mainly include the GradArgmax model proposed by Hanjun Dai et al. and the Metattack model proposed by Daniel Zügner et al. These methods first train a surrogate model with good generalization, then use the surrogate model to compute gradients with respect to the input graph, and select the edges to add or delete according to the gradients of the candidate edge set in the graph. Daniel Zügner et al. argued that gradient methods are not suitable for discrete graph-structured data and proposed the Nettack model, which takes a simplified linear graph convolutional network as the surrogate model and uses a greedy algorithm to iteratively search for the optimal attack perturbation. Although these methods can effectively perturb graph data for an attack, their low efficiency in time and space means that they can only be applied to small-scale graph-structured data and cannot be transferred to larger datasets. There are two main reasons: (1) the commonly used surrogate model, the graph convolutional network, needs the neighbor information of all nodes for aggregation and updating during training; (2) most attack algorithms need the structural information of the entire graph to compute the optimal attack perturbation. For these reasons, existing attack methods cannot be applied to large-scale datasets, yet large-scale graph-structured data is quite common in real life, so a more efficient attack algorithm applicable to real scenarios is needed. How to attack large-scale graph-structured data is therefore a technical problem to be solved urgently.

Disclosure of Invention

The invention provides a simplified attack method for large-scale graph structures, which increases the running speed of the attack algorithm, reduces memory usage, and achieves better attack performance.

In order to achieve the technical effects, the technical scheme of the invention is as follows:

a simplified attack method facing a large-scale graph structure comprises the following steps:

S1: constructing a simplified graph convolutional neural network surrogate model;

S2: extracting a k-order subgraph of the target node using the surrogate model of step S1;

S3: performing fast gradient calculation using the subgraph obtained in step S2;

S4: iteratively selecting attack edges and updating the subgraph using the gradients.

Further, the specific process of step S1 is:

a two-layer simplified graph convolutional neural network is adopted, and the output of the surrogate model is expressed as:

$$Z = f_\theta(A, X) = \mathrm{softmax}(\hat{A}^{2} X W)$$

the loss function for training the surrogate model is the cross-entropy loss, namely:

$$\mathcal{L} = -\sum_{v \in V_L} \ln Z_{v, c_v}$$

where V_L is the set of training nodes; the simplified graph convolutional neural network is trained with an Adam optimizer until the surrogate model converges on the training set.

Further, in the step S1:

where Z is the predicted output of the simplified graph convolutional neural network surrogate model, f_θ is the mapping function learned by the surrogate model, $\hat{A}$ is the sparse adjacency matrix after Laplacian normalization, and W is the trainable weight matrix of the surrogate model; because the intermediate nonlinear activation function ReLU is removed and $\hat{A}^{k} X$ is computed in advance, the complex graph convolutional neural network is simplified into a linear fully connected network, so that the simplified graph convolutional neural network can be trained quickly and used as a surrogate model of the target classifier.

Further, the specific process of step S2 is:

because the simplified graph convolution neural network substitution model adopts two layers, a 2-order subgraph of an attack target node t is extracted:

a node subset V_s and an edge subset E_s are constructed, where the node subset consists of the neighbors of the target node within 2 hops and the edge subset consists of all edges within this 2-hop range; because this set contains only candidate edges that are already connected and none of the edges that are absent from the original graph, it supports only edge deletion and not edge addition; however, in a sparse graph the number of nodes that are not directly adjacent to a given node is usually very large, and adding all of them would make the subgraph grow rapidly; to avoid this, when new candidate nodes are added, only nodes whose class differs from that of the target node and equals the target node's second most probable class c'_t are considered, namely:

$$V_p = \{u \mid c_u = c'_t,\ u \in V\}$$

where V_p is the candidate set of attack nodes.

Further, the specific process of step S3 is:

the loss function of the simplified graph convolutional neural network surrogate model on the subgraph is used to compute the gradients of the candidate attack edge set, which gives the importance of each edge to the model, so that the most important edge is selected and changed;

since the k-order subgraph is extracted in step S2 and a sparse matrix is used as input, high time and space complexity is avoided; an additional misclassification loss is introduced to compute the gradients better, and the loss function for the target node t is designed as:

where the superscript (sub) denotes a result obtained by the equivalent subgraph computation, c_t is the correct class label of the target node, c'_t is the wrong class label of the target node, and Z_t is the probability output predicted by the surrogate model for the target node t.

Further, in step S3, since the surrogate model has been trained to convergence, it tends to maximize the classification probability of the correct class and minimize the classification probability of the wrong classes, so that the first term of the loss function

tends toward 0 and the second term

tends toward 1; the loss then provides almost no gradient signal, resulting in vanishing gradients. To solve this problem, the output of the surrogate model is corrected by adding a correction factor e to the final softmax activation layer of the surrogate model:

with this correction factor, the over-confidence of the model in its outputs is reduced, which eliminates the vanishing-gradient phenomenon.

Further, the specific process of step S4 is:

assuming that the edge selected in step S3 is (u, v), whether to add or delete the edge is determined by whether the edge exists in the original graph and by the sign of its gradient, that is:

meanwhile, to keep the k-order subgraph complete and preserve the equivalence, if during the iterative subgraph update an added edge brings a new node into the k-order neighborhood of the target node, the new node must be included in the subgraph; conversely, if a deleted edge leaves an original node outside the k-order neighborhood of the target node, that node must be removed.

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

The method uses a simplified graph convolutional neural network as the surrogate model, which offers better scalability and flexibility; compared with the GCN used in previous research, this surrogate model trains and infers faster and requires less memory. The k-order subgraph of the target node is then selected as the input data, since the prediction for the target node obtained from the k-order subgraph is equivalent to that obtained from the whole graph. Finally, the gradients of the candidate edge set are computed by back-propagation from the surrogate model's inference result on the k-order subgraph, and the edge whose gradient has the largest absolute value is flipped; this gradient attack is iterated until a termination condition is reached. The attack method markedly increases the running speed of the attack algorithm, reduces memory usage, and achieves better attack performance, so it is suitable for attacks on large-scale graph networks.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the iterative gradient attack and subgraph update (expansion) of the present invention;

FIG. 3 is a graph of the probability distribution of the output of the surrogate model before and after the use of the correction algorithm of the present invention;

FIG. 4 is a comparison of the relative run times of the present invention and comparative attack methods;

FIG. 5 is a direct attack result for different classification models according to the present invention and the comparison method;

FIG. 6 shows the indirect attack results of the present invention and the comparison method for different classification models.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the patent;

for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;

it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

As shown in fig. 1, a simplified attack method for a large-scale graph structure includes the following steps:

Let an undirected, unweighted graph be represented as G = (A, X), where A ∈ {0,1}^{N×N} is the adjacency matrix and X ∈ {0,1}^{N×F} is the node feature matrix. Each node i has a class label c_i. The task of the classifier is to predict the class labels of the unlabeled test nodes from the class labels of the given training nodes. The task of the attacker, by contrast, is to change a small number of edges in the graph

so that the classification model misclassifies a given target node t.

Step one: training the surrogate model;

this step can be regarded as pre-training before the attack. SGC simplifies the conventional graph convolutional neural network to:

$$Z = f_\theta(A, X) = \mathrm{softmax}(\hat{A}^{k} X W)$$

where Z is the predicted output of the model, f_θ is the mapping function learned by the model, $\hat{A}$ is the sparse adjacency matrix after Laplacian normalization, and W is the trainable weight matrix of the model. Because the intermediate nonlinear activation function ReLU is removed and $\hat{A}^{k} X$ is computed in advance, the complex graph convolutional neural network is simplified into a linear fully connected network, so the SGC can be trained quickly and used as a surrogate model of the target classifier;

the surrogate model is a two-layer SGC model (k = 2), whose output can be expressed as:

$$Z = f_\theta(A, X) = \mathrm{softmax}(\hat{A}^{2} X W)$$

the loss function for model training is the cross-entropy loss, namely:

$$\mathcal{L} = -\sum_{v \in V_L} \ln Z_{v, c_v}$$

where V_L is the set of training nodes. The SGC is trained with an Adam optimizer until the model converges on the training set;
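As an illustration of step one, the following minimal sketch precomputes the propagated features once and then fits a linear softmax classifier on them. It makes several assumptions that are not stated in the text: the normalization is taken to be the usual symmetric form with self-loops, plain gradient descent stands in for the Adam optimizer, and the function names (`normalize_adj`, `train_sgc`) and the toy graph are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops, D^-1/2 (A + I) D^-1/2 (assumed form)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def train_sgc(A, X, labels, train_idx, k=2, lr=0.5, steps=200):
    """Fit W for Z = softmax(A_hat^k X W) on the training nodes."""
    S = np.linalg.matrix_power(normalize_adj(A), k) @ X  # computed once, before training
    n_classes = int(labels.max()) + 1
    Y = np.eye(n_classes)[labels]                        # one-hot labels
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        Z = softmax(S @ W)
        # Gradient of the mean cross-entropy over the training nodes.
        grad = S[train_idx].T @ (Z[train_idx] - Y[train_idx]) / len(train_idx)
        W -= lr * grad                                   # plain gradient descent, not Adam
    return W, S

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
X = np.eye(6)
labels = np.array([0, 0, 0, 1, 1, 1])
train_idx = np.array([0, 1, 4, 5])

W, S = train_sgc(A, X, labels, train_idx)
Z = softmax(S @ W)
```

Because the propagation step has no trainable parameters, the training loop only touches the single weight matrix `W`, which is the reason the text gives for the fast training of the surrogate model.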

Step two: extracting the k-order subgraph of the target node;

this step can be regarded as preprocessing before the attack. Whether an element of the sparse matrix $\hat{A}^{k}$ (corresponding to a certain edge) is nonzero depends only on the shortest distance between the two nodes connected by that edge in the original graph, namely:

$$[\hat{A}^{k}]_{uv} \neq 0 \iff d(u, v) \le k$$

where d(u, v) is the shortest distance between node u and node v. Therefore, the output obtained by using the k-order subgraph of the target node as the input data of the SGC surrogate model is equivalent to the output obtained by using the whole graph. Because most graphs in real scenarios are quite sparse, the k-order subgraph of a node contains far fewer edges and nodes than the whole graph, so this operation markedly increases the running speed of the attack algorithm and reduces memory usage;
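The relationship between the nonzero pattern of the propagated adjacency matrix and the shortest distance can be checked numerically on a small example. The sketch below assumes the usual symmetric normalization with self-loops and keeps the normalization coefficients of the full graph when restricting to the k-hop node set; the path graph, the random features, and all function names are hypothetical.

```python
import numpy as np
from collections import deque

def normalize_adj(A):
    """Symmetric normalization with self-loops (assumed form)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def bfs_distances(A, source):
    """Shortest hop distance from source to every node (inf if unreachable)."""
    dist = np.full(A.shape[0], np.inf)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            if dist[v] == np.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Path graph 0-1-2-3-4-5 with target node t = 0 and k = 2.
n, k, t = 6, 2, 0
A = np.zeros((n, n))
for u in range(n - 1):
    A[u, u + 1] = A[u + 1, u] = 1.0

A_hat = normalize_adj(A)
nonzero_in_row_t = np.linalg.matrix_power(A_hat, k)[t] > 0
within_k = bfs_distances(A, t) <= k

# Row t of A_hat^k X computed on the whole graph and on the k-hop node set only.
X = np.random.default_rng(0).random((n, 3))
full_row = (np.linalg.matrix_power(A_hat, k) @ X)[t]
S = np.flatnonzero(within_k)
A_sub = A_hat[np.ix_(S, S)]          # keeps the full-graph normalization coefficients
sub_row = (np.linalg.matrix_power(A_sub, k) @ X[S])[list(S).index(t)]
```

In this example the row of the target node is nonzero exactly at the nodes within two hops, and the propagated features of the target node are identical whether they are computed on the whole graph or on the 2-hop node set.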

since the surrogate model SGC is usually a two-layer model (k = 2), which achieves the best classification performance, only the 2-order subgraph of the attack target node t needs to be extracted:

a node subset V_s and an edge subset E_s are constructed, where the node subset consists of the neighbors of the target node within 2 hops and the edge subset consists of all edges within this 2-hop range. Because this set contains only candidate edges that are already connected and none of the edges that are absent from the original graph, it supports only edge deletion and not edge addition. However, in a sparse graph the number of nodes that are not directly adjacent to a given node is usually very large, and adding all of them would make the subgraph grow rapidly. To avoid this, the invention applies a preference: when new candidate nodes are added, only nodes whose class differs from that of the target node and equals the target node's second most probable class c'_t are considered, namely:

$$V_p = \{u \mid c_u = c'_t,\ u \in V\}$$

where V_p is the candidate set of attack nodes. In this way, the extra complexity caused by considering too many candidate nodes and edges is avoided, and the attack becomes more targeted;
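The subgraph extraction and candidate-node selection described above can be sketched as follows. The exact definitions of V_s and E_s are not reproduced in the text, so this sketch assumes that V_s is the set of nodes within k hops of the target and that E_s is the set of edges with both endpoints in V_s; the adjacency-list format, the `labels` mapping (true or predicted classes), and the function names are hypothetical.

```python
from collections import deque

def k_hop_nodes(adj, target, k=2):
    """Nodes within k hops of the target, found by breadth-first search."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond the k-th hop
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def extract_subgraph(adj, target, k=2):
    """Node subset V_s and edge subset E_s (edges stored once, as (u, v) with u < v)."""
    V_s = k_hop_nodes(adj, target, k)
    E_s = {(u, v) for u in V_s for v in adj[u] if v in V_s and u < v}
    return V_s, E_s

def candidate_nodes(labels, second_class):
    """V_p = {u | c_u = c'_t, u in V}: nodes of the target's second most probable class."""
    return {u for u, c in labels.items() if c == second_class}

# Toy path graph 0-1-2-3-4; target node 0 has class 0 and second most probable class 1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}

V_s, E_s = extract_subgraph(adj, target=0, k=2)
V_p = candidate_nodes(labels, second_class=1)
```

Restricting the candidates to a single class keeps the number of possible edge additions proportional to the size of that class rather than to the size of the whole graph.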

Step three: performing fast gradient calculation using the subgraph;

this step mainly uses the loss function of the surrogate model on the subgraph to compute the gradients of the candidate attack edge set, which gives the importance of each edge to the model, so that the most important edge is selected and changed (flipped).

In previous research that attacked models with gradient methods, the input of the surrogate model was an N×N dense matrix used to compute the gradients of the candidate edges, with time and space complexity as high as O(N²). However, because the k-order subgraph has been extracted in the previous step, the invention only needs a sparse matrix as input, which avoids the high time and space complexity. In addition, most previous methods simply adopt the same loss function as the surrogate model, namely the cross-entropy function, which measures the loss of the model for classifying the target node correctly; since the aim of an adversarial attack is to maximize the probability that the target node is misclassified, an additional misclassification loss needs to be introduced to compute the gradients better. The loss function for the target node t is therefore designed as:

where the superscript (sub) denotes a result obtained by the equivalent subgraph computation, c_t is the correct class label of the target node, c'_t is the wrong class label of the target node, and Z_t is the probability output predicted by the surrogate model for the target node t. However, since the surrogate model has been trained to convergence, it tends to maximize the classification probability of the correct class and minimize the classification probability of the wrong classes, so that the first term of the loss function

tends toward 0 and the second term

tends toward 1; the loss then provides almost no gradient signal, resulting in vanishing gradients. To solve this problem, the invention proposes a simple method for correcting the output of the model, namely adding a correction factor e to the final softmax activation layer of the surrogate model:

with this correction factor, the over-confidence of the model in its outputs is reduced, which eliminates the vanishing-gradient phenomenon;
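The exact form of the correction is not reproduced in the text above. One common way to reduce over-confidence, assumed here purely for illustration, is to divide the logits by a factor larger than 1 before the softmax (temperature scaling). The parameter name `eps`, its value, and the example logits are hypothetical.

```python
import numpy as np

def softmax(logits, eps=1.0):
    """Softmax with the logits divided by eps (assumed form of the correction)."""
    scaled = np.asarray(logits, dtype=float) / eps
    scaled = scaled - scaled.max()   # numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

# Over-confident logits of a converged model for one node (hypothetical values).
logits = np.array([12.0, 2.0, 1.0])

p_raw = softmax(logits)            # nearly all mass on class 0
p_cal = softmax(logits, eps=5.0)   # flatter distribution, same predicted class
```

Under this assumption the predicted class is unchanged, but the probabilities move away from 0 and 1, so the loss no longer sits in the saturated region of the softmax.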

to avoid vanishing gradients, the correction factor e is added to the final output of the surrogate model:

then, using the loss function for the target node t:

the partial derivatives of the loss with respect to the adjacency matrix of the input subgraph are computed by back-propagation, which yields the gradient matrix.
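The gradient matrix of step three can be illustrated on a toy graph. Since the loss function itself is not reproduced in the text, this sketch assumes an objective equal to the log-probability of the wrong class minus the log-probability of the correct class, with the logits divided by an assumed correction factor `eps`. Central finite differences on a dense matrix stand in for back-propagation on a sparse matrix, which is only practical at this toy scale; all names and values are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization with self-loops (assumed form)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def target_objective(A, X, W, t, c_true, c_wrong, eps=5.0, k=2):
    """log Z[t, c_wrong] - log Z[t, c_true]; larger means closer to misclassification."""
    logits = (np.linalg.matrix_power(normalize_adj(A), k) @ X @ W)[t] / eps
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    return log_p[c_wrong] - log_p[c_true]

def edge_gradients(A, X, W, t, c_true, c_wrong, h=1e-5):
    """Symmetric gradient matrix of the objective with respect to each edge weight."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1, n):
            A_plus, A_minus = A.copy(), A.copy()
            A_plus[u, v] = A_plus[v, u] = A[u, v] + h
            A_minus[u, v] = A_minus[v, u] = A[u, v] - h
            diff = (target_objective(A_plus, X, W, t, c_true, c_wrong)
                    - target_objective(A_minus, X, W, t, c_true, c_wrong))
            G[u, v] = G[v, u] = diff / (2 * h)
    return G

# Toy graph: two triangles joined by the edge (2, 3); nodes 0-2 are class 0, nodes 3-5 class 1.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
X = np.eye(6)
W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

G = edge_gradients(A, X, W, t=0, c_true=0, c_wrong=1)
```

In this example the entry for the absent edge (0, 4), which would connect the target to a node of the wrong class, is positive, while the entry for the existing same-class edge (0, 1) is negative.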

Step four: iteratively selecting attack edges and updating the subgraph using the gradients

This step mainly uses the gradients obtained in step three to select and change the edges that will eventually cause the model to misclassify. Since the gradient of an edge represents its importance to the model, the candidate edges are sorted by the absolute value of their gradients, and the edge with the largest absolute value is changed (flipped) so as to mislead the model's classification as much as possible. In addition, to better estimate how a change in the input affects the model's gradients, the gradients are recomputed after each edge is selected, and this is iterated until Δ edges have been selected; modifying these Δ edges yields the malicious input data that most strongly misleads the model into misclassifying the target node. Meanwhile, to keep the k-order subgraph complete, if during the iterative subgraph update an added edge brings a new node into the k-order neighborhood of the target node, the new node must be included in the subgraph; conversely, if a deleted edge leaves an original node outside the k-order neighborhood of the target node, that node must be removed;

assuming that the edge selected in the previous step is (u, v), whether to add or delete the edge is determined by whether the edge exists in the original graph and by the sign of its gradient, that is:

FIG. 2 is a schematic diagram of the iterative gradient attack and subgraph update (expansion) process according to the present invention. As shown in FIG. 2, the iterative gradient attack mainly includes three processing steps: first, computing the gradients and selecting an attack edge; second, expanding the subgraph; and third, updating the subgraph. The iteration then restarts from the first step until a termination condition is met.
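The iterative selection loop can be sketched as below. The add/delete rule itself is not reproduced in the text, so the sketch assumes a first-order rule: an absent edge is added when its gradient is positive and an existing edge is deleted when its gradient is negative, with respect to an objective that the attacker wants to increase. The subgraph expansion step is omitted, and `grad_fn`, `select_flip`, `iterative_attack`, and the constant gradient matrix are hypothetical.

```python
import numpy as np

def select_flip(A, G, candidates):
    """Return the candidate edge whose flip most increases the objective, or None."""
    best, best_score = None, 0.0
    for u, v in candidates:
        # Adding an edge changes A[u, v] by +1, deleting it changes A[u, v] by -1.
        score = G[u, v] * (1.0 - 2.0 * A[u, v])
        if score > best_score:
            best, best_score = (u, v), score
    return best

def iterative_attack(A, grad_fn, candidates, budget):
    """Flip up to `budget` edges, recomputing the gradients after every flip."""
    A = A.copy()
    remaining = list(candidates)
    flipped = []
    for _ in range(budget):
        G = grad_fn(A)
        edge = select_flip(A, G, remaining)
        if edge is None:
            break                      # no flip is expected to help
        u, v = edge
        A[u, v] = A[v, u] = 1.0 - A[u, v]
        flipped.append(edge)
        remaining.remove(edge)
    return A, flipped

# Toy example with a constant gradient matrix standing in for the surrogate model.
A = np.zeros((3, 3))
A[0, 1] = A[1, 0] = 1.0
A[1, 2] = A[2, 1] = 1.0
G0 = np.array([[0.0, -0.5, 0.2],
               [-0.5, 0.0, 0.9],
               [0.2, 0.9, 0.0]])

A_adv, flipped = iterative_attack(A, lambda A_cur: G0,
                                  candidates=[(0, 1), (0, 2), (1, 2)], budget=2)
```

In a full implementation `grad_fn` would recompute the gradient matrix on the updated subgraph at every iteration, and the node and edge subsets would be expanded or pruned after each flip as described in the text.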

The Simplified Gradient Algorithm (SGA) method for large-scale graph structures is verified through comparative experiments; both the theoretical analysis and the experimental results demonstrate the effectiveness of the method.

In the comparative experiments, the datasets used are shown in Table 1, where Citeseer, Cora, and Pubmed are three commonly used citation graph datasets. The graphs of all three datasets are sparse and are consistent with real-world data.

TABLE 1 Datasets used in the comparative experiments

Parameter	Citeseer	Cora	Pubmed
Number of nodes	2,110	2,485	19,717
Number of edges	3,668	5,069	44,324
Graph density	0.082%	0.082%	0.011%
Mean node degree	3.50	4.08	4.50
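The density row of Table 1 can be reproduced from the node and edge counts if density is taken to be the number of edges divided by the square of the number of nodes. That definition is an assumption inferred from the table values, not something stated in the text, and the function name is hypothetical.

```python
# (number of nodes, number of edges) from Table 1
datasets = {
    "Citeseer": (2110, 3668),
    "Cora": (2485, 5069),
    "Pubmed": (19717, 44324),
}

def density_percent(n_nodes, n_edges):
    """Graph density in percent, assuming density = |E| / |V|^2."""
    return 100.0 * n_edges / n_nodes ** 2

density = {name: round(density_percent(n, m), 3) for name, (n, m) in datasets.items()}
```

All three graphs have a density well below 0.1%, which is why the k-order subgraph of a node is so much smaller than the whole graph.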

FIG. 3 shows the output probability distribution of the classification model on the Citeseer dataset before and after the correction algorithm is applied. As shown in FIG. 3(a), before the correction the model assigns a very high prediction probability (approximately 1) to the first class of a node (the correct class) and a very low prediction probability (approximately 0) to the second class and the remaining classes (the wrong classes), which causes the vanishing-gradient phenomenon described above. As shown in FIG. 3(b), after the correction factor is introduced, the predicted probability distribution of the model moves toward the middle and is no longer extreme, so the prediction quality of the model is preserved while vanishing gradients are avoided.

FIG. 4 compares the relative running time of the present invention and the comparison attack methods. Nettack and GradArgmax are two state-of-the-art attack methods in the graph attack field; "-In" denotes the indirect attack result on a dataset (the edges allowed to be modified may not be directly adjacent to the target node), and the remaining results are direct attack results (the edges allowed to be modified may be directly adjacent to the target node). Because the proposed method uses only the 2-order subgraph of the target node, whose size is far smaller than that of the whole graph, it achieves better running-time performance, as expected. In contrast, the two comparison methods take far longer to run than the proposed SGA method because they need the entire graph as input, and their running time relative to the SGA method grows as the dataset becomes larger.

FIG. 5 and FIG. 6 show the attack results of the present invention on different target classification models (graph neural networks) under two settings, direct attack and indirect attack. SGC, GCN, and GAT are graph neural network classification models widely used in the graph field. The evaluation metric is the Classification Margin (CM), defined as the difference between the classification model's prediction probability for the correct class of the target node and its highest prediction probability among the wrong classes; the smaller the metric, the better the attack, and a value below zero indicates a successful attack. In the experiments, 1000 nodes are randomly selected as target nodes. As can be seen from FIG. 5 and FIG. 6, compared with the other methods, the attack method of the invention achieves better attack performance while remaining highly efficient. In the direct attack setting in particular, the proposed method reaches an attack success rate of almost 100% against GCN and SGC. The attack performance obtained through subgraph equivalence is almost on par with the state-of-the-art attack algorithm Nettack (abbreviated NTK in the figures), and is even better in the indirect attack experiments. In addition, the experimental results of the method are far better than those of the other attack algorithms, which shows that the method combines high efficiency with good attack performance and is suitable for large-scale graph structures.
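The classification margin defined above can be computed directly from the predicted class probabilities of a target node. The function name and the example probability vectors are hypothetical.

```python
import numpy as np

def classification_margin(probs, true_class):
    """Probability of the correct class minus the highest probability among the wrong classes."""
    probs = np.asarray(probs, dtype=float)
    wrong = np.delete(probs, true_class)
    return probs[true_class] - wrong.max()

# Predicted probabilities for one target node before and after an attack (hypothetical values).
cm_before = classification_margin([0.7, 0.2, 0.1], true_class=0)
cm_after = classification_margin([0.3, 0.6, 0.1], true_class=0)
attack_succeeded = cm_after < 0
```

A negative margin means that some wrong class now has a higher predicted probability than the correct class, which is the success criterion used in the experiments.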

The same or similar reference numerals correspond to the same or similar parts;

the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;

It should be understood that the above embodiments of the present invention are merely examples given to illustrate the present invention clearly and are not intended to limit its embodiments. Other variations and modifications in different forms will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A simplified attack method oriented to a large-scale graph structure is characterized by comprising the following steps:

2. The simplified attack method for large-scale graph structures according to claim 1, wherein the specific process of step S1 is:

3. The simplified attack method for large-scale graph structures according to claim 2, wherein in the step S1:

4. The simplified attack method for large-scale graph structures according to claim 3, wherein the specific process of step S2 is:

$$V_p = \{u \mid c_u = c'_t,\ u \in V\}$$

where V_p is the candidate set of attack nodes.

5. The simplified attack method for large-scale graph structures according to claim 4, wherein the specific process of step S3 is:

6. The simplified attack method for large-scale graph structures according to claim 5, wherein in step S3, since the surrogate model has been trained to convergence, it tends to maximize the classification probability of the correct class and minimize the classification probability of the wrong classes, so that the first term of the loss function

tends toward 0 and the second term

7. The simplified attack method for large-scale graph structures according to claim 6, wherein the specific process of step S4 is: