CN113536383B

CN113536383B - Method and device for training graph neural network based on privacy protection

Info

Publication number: CN113536383B
Application number: CN202110957071.0A
Authority: CN
Inventors: 熊涛
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2021-01-27
Filing date: 2021-01-27
Publication date: 2023-10-27
Anticipated expiration: 2041-01-27
Also published as: CN112464292A; CN113536383A; CN112464292B

Abstract

The embodiment of the specification provides a method and a device for training a graph neural network based on privacy protection. And for any second node in the neighbor node set, inputting the node information of the second node, the node information of the first node and the connection information of the second node and the first node into the multi-layer neural network to obtain the matching degree of the second node and the first node. And then, sampling the neighbor node set according to the matching degree corresponding to each neighbor node in the neighbor node set, so as to obtain a sampled neighbor node set of the first node. And then forming a sparse relation network graph based on sampling neighbor node sets corresponding to each node in the original graph. Then, based on the sparse relation network graph, a graph neural network is trained.

Description

Method and device for training graph neural network based on privacy protection

The application discloses a divisional application of an application patent application with the application number 202110109491.3 which is named as a method and a device for training a graphic neural network based on privacy protection and is filed on the 1 st month of 2021.

Technical Field

One or more embodiments of the present specification relate to the field of computer technology, and in particular, to a method and apparatus for performing a neural network based on differential protection training patterns by a computer.

Background

Relational network graphs are recently becoming a large core area of machine learning. Data mining and machine learning based on relational network graphs are of increasing value in a number of fields. For example, the structure of the social network may be understood by predicting potential connections, fraud detection may be performed based on graph structures, consumer behavior of e-commerce users may be understood, real-time recommendations may be made, etc.

At the same time, people's importance to privacy is increasing. The relationship network graph contains a large amount of information, and various artificial intelligence and machine learning (AI/ML) models using graph information present a great risk of data privacy leakage if improperly protected. For example, with the advent of IOT age, many AI/ML's were deployed on-premise (cell phone/other IOT devices) to make real-time decisions after being developed with large-scale graph data in the cloud. Such benefits are self-evident in that the user's privacy is protected and the cost of data transfer is reduced due to the reduced data transfer from the end to the cloud. However, if the model is stolen, extensive graph data information used to train the model is also at risk of being compromised.

Therefore, an improved solution is desired, which can more safely and more effectively train a reliable neural network model.

Disclosure of Invention

One or more embodiments of the present disclosure describe a method and apparatus for training a graph neural network based on privacy protection, so that the trained graph neural network can better protect privacy security of graph data information.

According to a first aspect, there is provided a method for training a graph neural network based on privacy protection, comprising:

acquiring an original relation network diagram, wherein the original relation network diagram comprises a plurality of nodes, and any first node in the plurality of nodes is provided with a corresponding first neighbor node set;

for any second node in the first neighbor node set, inputting node information of the second node and node information of the first node into a multi-layer neural network to obtain matching degree of the second node and the first node;

sampling the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set, so as to obtain a sampled neighbor node set of the first node;

forming a sparse relation network graph based on sampling neighbor node sets corresponding to the nodes respectively;

And training a graph neural network based on the sparse relation network graph.

In one embodiment, sampling the first set of neighbor nodes according to the degree of matching specifically includes: normalizing the matching degree corresponding to each neighbor node respectively to obtain corresponding matching probability; and sampling each neighbor node according to the matching probability.

In another embodiment, the sampling the first set of neighboring nodes according to the matching degree specifically includes: determining a first sampling probability of the second node being sampled according to a first privacy budget and the matching degree of the second node and the first node based on an exponential mechanism of differential privacy; and sampling each neighbor node according to the first sampling probability respectively corresponding to each neighbor node in the first neighbor node set.

Further, in one example, a predetermined number k of samplings may be performed according to a first sampling probability, and k neighbor nodes are sampled from the first neighbor node set as the sampling neighbor node set.

In yet another embodiment, the first sampling probabilities corresponding to the neighboring nodes respectively may be input into a gummel-softmax function to obtain the second sampling probabilities corresponding to the neighboring nodes respectively; and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.

Further, in one example, a predetermined number k of samplings may be performed according to the second sampling probability, and k neighbor nodes are sampled from the first neighbor node set as the sampling neighbor node set.

According to one embodiment, a sparse relationship network graph includes labeling nodes having labels; the training graph neural network based on the sparse relation network graph comprises: performing graph embedding on the sparse relation network graph by using the graph neural network to obtain a node embedding vector of the labeling node; determining a corresponding first gradient of the graph neural network according to the node embedding vector and the label; and updating the graph neural network according to the first gradient.

In one embodiment, updating the graph neural network according to the first gradient specifically includes: adding noise on the first gradient by utilizing a noise mechanism of differential privacy to obtain a first noise gradient; and updating parameters of the graph neural network according to the first noise gradient.

Further, in one example, the first noise gradient is obtained by: cutting the first gradient based on a preset cutting threshold value to obtain a cutting gradient; determining gaussian noise for achieving differential privacy using a gaussian distribution determined based on the clipping threshold, wherein a variance of the gaussian distribution is positively correlated with a square of the clipping threshold; and superposing the Gaussian noise and the clipping gradient to obtain the first noise gradient.

In one embodiment, performing graph embedding on the sparse relation network graph to obtain a node embedding vector of the labeling node, which specifically includes: acquiring neighbor nodes of the labeling nodes in the sparse relation network graph as target neighbor nodes; determining an aggregation weight of each target neighbor node, wherein the aggregation weight is determined based on the matching degree of the labeling node and each target neighbor node; and according to the aggregation weight, aggregating the node information of each target neighbor node to obtain the node embedded vector of the labeling node.

Further, in one example, determining the aggregate weight of each target neighbor node specifically includes: acquiring sampling probability of each target neighbor node when the target neighbor node is sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the labeling node by utilizing a Gumbel-softmax function; and determining the aggregation weight of each target neighbor node according to the sampling probability.

According to one embodiment, based on the sparse relation network graph, training a graph neural network further comprises: determining a second gradient corresponding to the multi-layer neural network according to the node embedding vector and the label; updating the multi-layer neural network according to the second gradient.

Further, a differential privacy mode can be utilized to add first noise on the first gradient, so as to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient; and adding second noise on the second gradient by utilizing a differential privacy mode to obtain a second noise gradient; and updating parameters of the multi-layer neural network according to the second noise gradient.

In various embodiments, the plurality of nodes in the relational network graph may include at least one of: user nodes, merchant nodes and article nodes.

According to a second aspect, there is provided an apparatus for training a graph neural network based on privacy protection, comprising:

an original graph acquisition unit configured to acquire an original relationship network graph, wherein the original relationship network graph comprises a plurality of nodes, and any first node in the plurality of nodes is provided with a corresponding first neighbor node set;

the matching degree acquisition unit is configured to input node information of a second node and node information of the first node into a multi-layer neural network for any second node in the first neighbor node set, so as to obtain the matching degree of the second node and the first node;

The sampling unit is configured to sample the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set, so as to obtain a sampled neighbor node set of the first node;

the sparse graph forming unit is configured to form a sparse relation network graph based on sampling neighbor node sets corresponding to the nodes respectively;

and the training unit is configured to train the graph neural network based on the sparse relation network graph.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory and a processor, characterised in that the memory has executable code stored therein, the processor implementing the method of the first aspect when executing the executable code.

By the method and the device provided by the embodiment of the specification, the neighbor nodes in the original relationship network graph are sampled by utilizing the matching degree among the nodes determined by the multi-layer neural network, so that a sparse relationship network graph is obtained. Based on the sampled sparse relation network graph, training a graph neural network. Because the sparse relation network graph only comprises part of connecting edges sampled from the original relation network graph, the accurate graph structure information in the original relation network graph is difficult to reversely deduce based on the graph neural network trained in the way, and therefore the data privacy of the original relation network graph is protected. And, optionally, a differential privacy mechanism can be introduced in the sampling stage and/or gradient propagation stage, so that a certain randomness is introduced for training the graph neural network. By introducing a differential privacy mechanism, the privacy data security of the original relation network graph is further enhanced on the basis of guaranteeing the basic performance of the graph neural network.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 shows a schematic architecture diagram of a training graph neural network according to the technical concepts of the present specification;

FIG. 2 illustrates a flow diagram of a method for training a graph neural network based on privacy protection, in accordance with one embodiment;

FIG. 3 illustrates an exemplary presentation of a relational network graph;

FIG. 4 illustrates a flow of steps for training a graph neural network based on a sparse relational network graph in one embodiment;

fig. 5 shows a schematic block diagram of a training apparatus of the neural network, according to one embodiment.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

As previously described, graph neural networks are typically trained based on relational network graphs. A relational network graph is a description of the relationships between entities in the real world and may be generally represented as a set of nodes representing entities in the real world and a set of edges representing links between entities in the real world. For example, in a social network, people are entities and the relationships or connections between people are edges. The relationship network diagram is typically formed by collecting and sorting a large amount of user-related data, and thus, often contains privacy data of the user. For example, social network diagrams may include records of social interactions between users, and so on.

After training the graph neural network based on a particular graph of the relationship network, the graph neural network may be used to predict tasks related to nodes and/or edges in the graph, e.g., predicting relationships between nodes, predicting classification of nodes, etc. Because the training is based on the relation network graph, the graph neural network carries information of the relation network graph. Thus, if the model parameters of the graph neural network are improperly protected, and the model parameters are leaked or stolen, corresponding relationship network graph data for training the graph neural network also has the risk of leakage.

Based on the above consideration, the inventor proposes a scheme of several embodiments in the present specification, in which an original relationship network graph is sampled, thinned, and a graph neural network is trained based on the thinned relationship network graph, so as to achieve the effect of protecting the privacy of the original graph.

Fig. 1 shows an architecture diagram of a training pattern neural network according to the technical concept of the present specification. As shown in fig. 1, an original relationship network diagram 100 is first obtained. The original relationship network diagram 100 may be a network diagram reflecting various associations. Typically, the original relational network graph is a relatively dense graph in which a large number of connecting edges, such as tens of connecting edges, exist for a large portion of the nodes. Nodes connected by connecting edges may be referred to as neighbor nodes. Each node then has a corresponding set of neighbor nodes.

For each node in the original relationship network graph 100, it is input into the multi-layer neural network 10 together with the neighbor nodes. The multi-layer neural network 10 is used to predict the degree of matching between two nodes of an input. These connection edges may then be sampled, or neighbor nodes in the set of neighbor nodes may be sampled, based on the degree of matching between the nodes where they exist. By sampling, only part of the connecting edges remain, so that the original relationship network diagram is thinned, and the sparse relationship network diagram 200 is obtained.

The sparse relationship network graph 200 may then be input into the graph neural network 20 for training the graph neural network 20. Because the sparse relation network diagram 200 only contains part of information in the original relation network diagram, the accurate diagram structure information in the original relation network diagram is difficult to reversely push out through the diagram neural network trained in the way, and therefore the data privacy of the original relation network diagram is protected. Moreover, through the above sparsification, the training speed of the graph neural network can be increased, and the graph neural network obtained in this way has stronger robustness.

Further, a differential privacy mechanism may also be introduced during the above training process. For example, in the sampling stage, the sampling probability of a certain connection edge can be determined through an exponential mechanism of differential privacy, so that a certain randomness is introduced for the sampling process. In the process of training the graph neural network by using the sparse graph, certain noise is introduced into the counter-propagating gradient through a noise mechanism of differential privacy, so that certain randomness is introduced into the determination of the model parameters. By introducing a differential privacy mechanism, on the basis of ensuring the basic performance of the graph neural network, the graph information of the original relationship network graph is more difficult to infer based on the graph neural network, so that the privacy data security is further protected.

The following describes a specific implementation of the above concept.

FIG. 2 illustrates a flow chart of a method for training a graph neural network based on privacy protection, in accordance with one embodiment. It is understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. The following describes the training process of the graph neural network based on privacy protection in conjunction with the implementation architecture shown in fig. 1 and the method flow shown in fig. 2.

As shown in fig. 2, first, at step 21, an original relationship network diagram is acquired.

In different embodiments, the original relationship network graph may be a network graph reflecting various associations. For example, in one example, the original relationship network graph is a social relationship graph that contains a large number of nodes, each representing a user; the connecting edges between nodes represent social connection behaviors between two corresponding users, such as conversation, short messages, and other social interactions. In another example, the original relationship network graph is an iso-graph reflecting user behavior habits. In such a heterogeneous graph, a plurality of different kinds of nodes may be included, for example, merchant nodes, item nodes may be included in addition to user nodes. When a user accesses or purchases an item of a certain type (e.g., has seen a movie, has read a book, etc.), or the user has transacted at a merchant, a connection edge may be established between the corresponding nodes. Specific examples of relational network graphs are not explicitly enumerated and are not described in a single illustrative context.

In many scenarios, the number of nodes in the original relationship network graph is huge, e.g., the number of nodes in the social relationship graph can be on the order of tens of millions or even billions; the connection relationship between the nodes is also generally complex, and most nodes have a large number of connection edges, for example, tens or hundreds of connection edges. Thus, the original relationship network graph tends to be a relatively dense graph.

Fig. 3 shows an exemplary presentation of a relational network graph, with the left (a) part showing an example of an original relational network graph. It can be seen that the number of connection edges between nodes is large, and the connection relationship is complex and dense.

For this reason, according to the embodiment of the present specification, the above dense original relationship network diagram is thinned next. For simplicity and clarity of description, any node u in the original relationship network diagram is described below, which will be referred to hereinafter as the first node. Correspondingly, the nodes connected with the first node through the connecting edges are neighbor nodes of the first node u, and the neighbor nodes form a neighbor node set of the first node u, which is called a first neighbor node set and is marked as N _u . In the case where the relational network graph is a directed graph, the first set of neighbor nodes may be defined as a set of nodes pointing to the first node, or a set of nodes to which the first node points, or both, as desired.

Next, in step 22, for the above first set of neighbor nodes N _n The method comprises the steps of inputting node information N (v) of a second node v and node information N (u) of a first node into a multi-layer neural network 10 by any one neighbor node v (called a second node), and obtaining the matching degree z of the second node and the first node by connection information A (u, v) of the second node v and the first node u _u，v . The multi-layer neural network 10 may be implemented as, for example, a multi-layer perceptron MLP, a deep feed forward neural network DNN, a multi-layer convolutional neural network CNN, or the like. In the case of being implemented as a multi-layer perceptron MLP, the above degree of matching z _u，v Can be expressed as:

z _u，υ ＝MLP(N(u)，N(v)，A(u，v))

(1)

the above node information N (u) and N (v) are determined according to the attribute of the object represented by the node itself. In the case where the node represents a user, for example, N (u) may contain basic attribute information of the user u, such as gender,age, registration duration, etc. A (u, v) may be determined from information of a connection edge between the first node u and the second node v in the original relationship network diagram. For example, in a social relationship graph, the connection edges correspond to social interactions; a (u, v) contains frequency and/or manner information of social interactions between user u and user v. In one embodiment, node information N (u) and N (v), and connection information a (u, v) are encoded in the form of vectors, which are input to the multi-layer neural network. Obtaining the matching degree z of the first node u and the second node v through the operation of the multi-layer neural network _u，v 。

Then, in step 23, according to the first set of neighbor nodes N _u And sampling the first neighbor node set to obtain a sampled neighbor node set of the first node.

In one embodiment, the degree of matching z output by the multi-layer neural network _u，v The values themselves are in the range of (0, 1). In such a case, the matching degree corresponding to each neighbor node can be regarded as the matching probability P _u，v Sampling is performed according to the matching probability.

In another embodiment, the degree of matching output by the multi-layer neural network is a scoring of the degree of matching that is not limited in value to the (0, 1) range. In such a case, the matching degree corresponding to each neighboring node may be normalized to obtain a corresponding matching probability P _u，v The method comprises the steps of carrying out a first treatment on the surface of the And then sampling each neighbor node according to the matching probability. For example, k times of sampling may be performed according to the matching probability of each node, so as to obtain k neighbor nodes, which are used as a sampling neighbor node set of the first node.

According to one embodiment, when sampling the neighbor nodes, a privacy protection mechanism of differential privacy is introduced, so that certain randomness is introduced in the sampling process, and the privacy protection effect is enhanced.

Specifically, for the first node u and any neighboring nodes thereof, such as the second node v, the exponential mechanism of differential privacy can be based on the privacy budget e and the previously obtained matching degree z between the second node v and the first node u _u，v Determining a sampling probability pi that the second node v is sampled _u，υ Referred to as a first sampling probability.

In one example, the first sampling probability pi _u，v The determination may be made by:

in equation (2) above, ε is the privacy budget, N _u As the neighbor node set of the first node u, deltau is sensitivity, representing the maximum difference in function values (here, matching degrees) when performing a function operation on neighboring data sets. In the case where the matching degree of the multi-layer neural network output is (-1, 1), Δu is 2. The above equation (2) shows that the probability of the second node being sampled when sampling according to the first sampling probability, the degree of matching z with the first node _u，v Positive correlation and positive correlation with the privacy operation e.

By means of a first set of neighbor nodes N _n And (3) carrying out operation of the formula on each neighbor node to obtain a first sampling probability corresponding to each neighbor node. Thus, it is possible to obtain a first set N of neighbor nodes _n And sampling each neighbor node according to the first sampling probability corresponding to each neighbor node.

In a specific example, the maximum number of neighbor nodes k for each node in the sparse graph may be preset. Accordingly, when the original image sampling process is executed, for any first node u, the method can be based on the first sampling probability pi _u，υ Performing k samples, each sample being taken from the first set of neighbor nodes N _u And sampling one neighbor node, and thus, sampling k neighbor nodes from the first neighbor node set as a sampling neighbor node set. Of course, in special cases, if the original number of nodes in the first neighboring node set is not greater than k, the original first neighboring node set may be directly used as the sampling neighboring node set. The advantage of this example approach is that it ensures the resulting sparsity regardless of how dense the original relational network graph isThe number of neighbor nodes of each node in the relation graph does not exceed k.

In another specific example, the sampling ratio r of the sparse map with respect to each node of the original map may be preset, for example, 20%. Correspondingly, when executing the original image sampling process, for any first node u, the first neighbor node set N is firstly used _u And determining the number k of the nodes to be sampled according to the number of the medium nodes and the sampling proportion r. Then, according to the first sampling probability pi _u，υ Performing k samples, each sample being taken from the first set of neighbor nodes N _n And sampling one neighbor node, and thus, sampling k neighbor nodes from the first neighbor node set as a sampling neighbor node set. In this way, regardless of the number of neighbor nodes of each node in the original relationship network graph, sampling and compression can be performed according to a predetermined proportion, so that the sampled neighbor node set of each node is ensured to be different from the original neighbor node set.

Further, according to one embodiment, to further facilitate efficient back propagation of gradients during joint training of the multi-layer neural network and the graph neural network, sampling probabilities are determined in a form that is more conducive to gradient derivation. Specifically, in the case of obtaining the first sampling probability pi by the above formula (2) _u，v On the basis of the above, the first sampling probability corresponding to each neighbor node can be input into Gumbel-softmax function to obtain the second sampling probability x corresponding to each neighbor node _u，υ The method comprises the steps of carrying out a first treatment on the surface of the Then according to the second sampling probability x corresponding to each neighbor node _u，υ Sampling each neighbor node.

Specifically, in one example, the first sampling probability pi may be based on the following equation (3) _u，v Determining a second sampling probability x _u，v ；

∈ _υ ＝-log(-log(s)) (3)

In the above formula (3), s is randomly sampled from (0, 1).

When the neighbor sampling is performed based on the second sampling probability, similarly, a predetermined number k of samplings may be performed, so that k neighbor nodes are sampled from the first neighbor node set as a sampling neighbor node set. Alternatively, sampling is performed based on a predetermined sampling ratio r. And will not be described in detail herein.

By performing the above steps 22 and 23 for each node in the original relationship network diagram, a respective set of corresponding sampled neighbor nodes may be obtained. Thus, at step 24, a sparse relationship network graph may be formed based on the respective corresponding sampled neighbor node sets for each node.

The right side (b) of fig. 3 shows a sparse relationship network diagram obtained by sampling the original relationship network diagram on the left side. It can be intuitively seen that, compared with the original relationship network diagram, the number of connection edges in the sparse relationship network diagram is greatly reduced, and the connection relationship between nodes is greatly simplified.

Next, in step 25, a graph neural network is trained based on the sparse relationship network graph obtained above.

The specific process of training the graph neural network can have a variety of embodiments. From a randomness perspective, embodiments may include a way to strictly train based on the original gradient, and a way to approximate training to introduce noise to the gradient based on a differential privacy mechanism; from a joint training perspective, embodiments can be further divided into embodiments that train with a multi-layer neural network each separately, and embodiments that train with a multi-layer neural network. The various angle embodiments above, as well as the combined embodiments, are described below in connection with a typical process of training based on labeled nodes.

In a typical graph neural network training, some nodes in the graph neural network training can be marked so as to have labels corresponding to prediction tasks. For example, when the predictive task is to predict a user's transaction risk based on a user social relationship graph, a label may be assigned to a portion of the users in the social relationship graph for which the risk status is known, the label showing their real risk status. In different examples, the tags may be classification tags (e.g., high risk, medium risk, low risk categories) or numerical tags (e.g., specific risk scores). The labels are associated with subsequent prediction tasks performed based on the graph neural network. The embedded vectors of the labeling nodes are characterized and learned through the graph neural network, and the unlabeled nodes with unknown states can be predicted.

Fig. 4 shows a flow of steps for training a graph neural network based on a sparse relational network graph, i.e. one example of the sub-steps of step 25 described above, in one embodiment. The training method of fig. 4 is based on labeling nodes, that is, in the sparse relation network graph, labeling nodes x with labels are included, and the nodes have labels y with labels.

As shown in fig. 4, in step 251, the obtained sparse relation network graph is subjected to graph embedding by using a graph neural network, so as to obtain a node embedding vector Ex labeled with a node x. Different graph neural networks can use different algorithms to conduct graph embedding to obtain node embedding vectors of all nodes, and some graph neural networks can also obtain edge embedding vectors of edges in the graph. Although specific algorithms are different, in general, when the graph neural network performs graph embedding, for a target node to be analyzed, information of neighbor nodes of the target node is obtained, and neighbor node information is aggregated, so that an embedding vector of the target node is determined. When the labeling node x is used as a target node, a corresponding node embedded vector Ex can be obtained.

Then, at step 252, a corresponding first gradient of the graph neural network is determined from the node embedding vector Ex and the label y. In general, the task corresponding to the label may be predicted based on the node embedding vector Ex to obtain the prediction result y'. Then, based on the prediction result y' and the label y, obtaining a prediction loss L according to a preset loss function; and then, reversely propagating the predicted loss L in the graph neural network, namely, solving the partial derivative of the predicted loss relative to the network parameters of each network layer from the back to the front direction of the network layer of the graph neural network, and obtaining a first gradient corresponding to the graph neural network.

Next, in step 253, the neural network is updated, i.e., the network parameters thereof, according to the first gradient.

The basic steps of updating the graph neural network based on the labeling nodes in the sparse relation network graph are described above. Various embodiments based on the above basic steps are described below.

As previously described, from a randomness perspective, embodiments may include a strictly trained embodiment a and an approximately trained embodiment B.

In embodiment a, in step 253, network parameters of the neural network are updated based on the original values of the first gradient. Assume that in the t-th iteration, the first gradient is g _t Then for the current network parameter θ of the t-th round _t The update of (c) can be expressed as:

θ _t+1 ＝θ _t -η _t g _t (4)

wherein eta _t The learning step length or learning rate is shown and is a preset super parameter; θ _t+1 Representing updated network parameters obtained through the training of the t-th round.

In embodiment B, in step 253, noise is added to the first gradient by using a noise mechanism of differential privacy to obtain a first noise gradient; and then updating parameters of the graph neural network according to the first noise gradient.

The noise mechanism of differential privacy is mainly realized by adding noise to the query result. The noise may be embodied as laplace noise, gaussian noise, or the like. According to one embodiment, in this step 253, differential privacy is achieved by adding Gaussian noise to the gradient. More specifically, the first gradient may be first trimmed based on a preset trimming threshold to obtain a trimming gradient; then, utilizing the Gaussian distribution determined based on the clipping threshold value to determine Gaussian noise for realizing differential privacy, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold value; then, the gaussian noise and the clipping gradient may be superimposed to obtain a first noise gradient.

More specifically, as an example, assume that in the t-th iteration, the resulting first gradient is g _t . To add Gaussian noise to it can be based onAnd carrying out gradient clipping on the original gradient by a preset clipping threshold value to obtain a clipping gradient, determining Gaussian noise for realizing differential privacy based on the clipping threshold value and a preset noise scaling coefficient (preset super-parameter), and then fusing (e.g. summing) the clipping gradient and the Gaussian noise to obtain a gradient containing noise. It will be appreciated that this way, on the one hand, the original gradient is clipped, and on the other hand, the clipped gradients are superimposed, so that the gradient is subjected to differential privacy processing that satisfies gaussian noise.

For example, the original gradient g _t The gradient cutting is carried out as follows:

wherein, the liquid crystal display device comprises a liquid crystal display device,represents the gradient after clipping, C represents the clipping threshold, g _t ‖ ₂ G represents g _t Is a second order norm of (c). That is, in the case where the gradient is less than or equal to the clipping threshold C, the original gradient is retained, and in the case where the gradient is greater than the clipping threshold C, the original gradient is clipped to the corresponding size in proportion to being greater than the clipping threshold C.

Adding Gaussian noise to the clipped gradient to obtain a gradient containing noise, for example, the following steps:

Wherein, the liquid crystal display device comprises a liquid crystal display device,representing a gradient containing noise; />Representing probability density compliance with 0 as mean, sigma ² C ² I is Gaussian noise of a Gaussian distribution of variance; sigma represents the above noise scalingThe coefficient is a preset super parameter and can be set as required; c is the clipping threshold; i represents an indicator function that may take either 0 or 1, e.g., an even round may be set to take 1 and an odd round to take 0 in multiple rounds of training.

Thus, the network parameters of the graph neural network can be adjusted to the following using the gradient after adding gaussian noise, i.e. the first noise gradient, with the goal of minimizing the prediction loss L:

under the condition that gradient added Gaussian noise meets differential privacy, adjustment of network parameters meets the differential privacy. Therefore, a noise mechanism based on differential privacy introduces a certain randomness for updating the graph neural network, and the privacy protection is well balanced among model performances.

On the other hand, from the perspective of whether to combine training, the embodiments may in turn include an embodiment a of separate training and an embodiment b of combined training.

In embodiment a, the multi-layer neural network 10 of FIG. 1 is a pre-trained neural network with a fixed network parameter Θ ₁ . The training method may be to label the matching degree between the nodes in advance as a label and train the multi-layer neural network 10. Accordingly, the embedding vector Ex obtained by graph embedding in step 251 is only the network parameter Θ of the graph neural network 20 ₂ Is a function of (2). Subsequently, it is only necessary to determine for Θ in step 252 ₂ And in step 253, for Θ ₂ And (5) updating parameters.

In embodiment b, the multi-layer neural network 10 and the graph neural network 20 are trained in combination. Network parameters theta of both ₁ And theta (theta) ₂ The graph embedding process through step 251 is associated together. Specifically, for the labeling node x, in step 251, the neighboring nodes in the sparse relation network graph are first obtained as target neighboring nodes, and then the aggregation weight w of each target neighboring node is determined, where the aggregation weight w is based onMatching degree z between labeling node x and each target neighbor node i _x,i And is determined. And then, according to the aggregation weight w, aggregating the node information of each target neighbor node to obtain a node embedding vector Ex of the marked node x. Thus, the node embedding vector Ex is not only dependent on the network parameters Θ of the neural network ₂ Also depends on the degree of matching z by aggregating weights w _x,i Degree of matching z _x,i Is output by the multi-layer neural network 10, is the network parameter Θ ₁ Thus, the node embedded vector Ex is a function of the network parameter Θ ₁ And theta (theta) ₂ A common function. Correspondingly, the predicted loss L determined according to the node embedding vector Ex and the label y is also the network parameter Θ ₁ And theta (theta) ₂ A common function.

In such a case, the determination for the network parameter Θ is made in step 252 ₂ After the first gradient of (a) further propagating the prediction loss (determined according to the node embedding vector Ex and the label) back to the multi-layer neural network 10, determining a second gradient corresponding to the multi-layer neural network 10; then, according to the second gradient, the network parameters Θ in the multi-layer neural network 10 are updated ₁ The joint training of the multi-layer neural network 10 and the graph neural network 20 is achieved.

Above according to the degree of matching z _x,i There are many embodiments for determining the aggregate weight w. In one example, the normalized degree of matching may be used as the aggregate weight. In another example, a first sampling probability when the target neighbor node is sampled may be obtained, the first sampling probability determined using equation (2), and then the aggregate weight w is determined based on the first sampling probability. In yet another example, a second sampling probability for each target neighbor node when sampled may be obtained, the second sampling probability determined based on the degree of matching of each target neighbor node to the labeling node using a Gumbel-softmax function. For example, the second sampling probability may be determined using the aforementioned equation (3). And then determining the aggregation weight w of each target neighbor node according to the second sampling probability. The form of the Gumbel-softmax function is more convenient for gradient derivation, thereby more facilitating gradient propagation from the graph neural network 20 to the multi-layer neural network 10.

The different embodiments of the graph neural network training process are described above from two different perspectives, a randomness perspective and a joint training perspective. Since these two angles are independent of each other, the above embodiments a, B and embodiments a, B can be combined in various ways to obtain more specific examples.

When embodiment B and embodiment B are combined, i.e. a differential privacy mechanism is introduced in the case of joint training, in one example noise may be added on both the first gradient for the graph neural network 20 and the second gradient for the multi-layer neural network 10. Specifically, a differential privacy mode may be utilized to add first noise to the first gradient, so as to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient; in addition, a second noise is added to the second gradient in a differential privacy mode, so that a second noise gradient is obtained; and updating parameters of the multi-layer neural network according to the second noise gradient. The noise adding process may be referred to the foregoing description of embodiment B, and will not be repeated here. In another example, noise may also be added for only one of the first gradient and the second gradient.

Reviewing the above process, a graph neural network is trained based on the sampled sparse relationship network graph in a variety of ways. Because the sparse relation network graph only comprises part of connecting edges sampled from the original relation network graph, the accurate graph structure information in the original relation network graph is difficult to reversely deduce based on the graph neural network trained in the way, and therefore the data privacy of the original relation network graph is protected. And, optionally, a differential privacy mechanism can be introduced in the sampling stage and/or gradient propagation stage, so that a certain randomness is introduced for training the graph neural network. By introducing a differential privacy mechanism, the privacy data security of the original relation network graph is further enhanced on the basis of guaranteeing the basic performance of the graph neural network.

According to another aspect, an apparatus based on a privacy preserving training graph neural network is also provided, which may be deployed in any apparatus, device, platform, cluster of devices with computing and processing capabilities. Fig. 5 shows a schematic block diagram of a training apparatus of the neural network, according to one embodiment.

As shown in fig. 5, the training apparatus 500 includes:

an original graph obtaining unit 51 configured to obtain an original relationship network graph, where the original relationship network graph includes a plurality of nodes, and any first node in the plurality of nodes has a corresponding first neighboring node set;

A matching degree obtaining unit 52, configured to input, for an arbitrary second node in the first neighboring node set, node information of the second node, node information of the first node, connection information of the second node and the first node into a multi-layer neural network, and obtain a matching degree of the second node and the first node;

the sampling unit 53 is configured to sample the first neighboring node set according to the matching degree corresponding to each neighboring node in the first neighboring node set, so as to obtain a sampled neighboring node set of the first node;

a sparse graph forming unit 54 configured to form a sparse relationship network graph based on the sampled neighbor node sets corresponding to the plurality of nodes, respectively;

and a training unit 55 configured to train the graph neural network based on the sparse relationship network graph.

According to one embodiment, the sampling unit 53 is configured to: normalizing the matching degree corresponding to each neighbor node respectively to obtain corresponding matching probability; and sampling each neighbor node according to the matching probability.

According to another embodiment, the sampling unit 53 comprises (not shown):

a first probability determination module configured to determine a first sampling probability that the second node is sampled based on an exponential mechanism of differential privacy, based on a first privacy budget, and a degree of matching of the second node to the first node;

And the neighbor sampling module is configured to sample each neighbor node according to the first sampling probability respectively corresponding to each neighbor node in the first neighbor node set.

Further, in one embodiment, the neighbor sampling module is configured to: and according to the first sampling probability, a preset number of k times of sampling are carried out, and k neighbor nodes are sampled from the first neighbor node set to serve as the sampling neighbor node set.

In another embodiment, the neighbor sampling module is configured to: inputting the first sampling probability corresponding to each neighbor node into a Gumbel-softmax function to obtain the second sampling probability corresponding to each neighbor node; and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.

Further, the neighbor sampling module may perform a predetermined number k of samplings according to the second sampling probability, and sample k neighbor nodes from the first neighbor node set as the sampling neighbor node set.

According to one embodiment, the sparse relationship network graph includes labeling nodes having labels; the training unit 55 comprises (not shown):

The graph embedding module is configured to utilize the graph neural network to conduct graph embedding on the sparse relation network graph to obtain a node embedding vector of the labeling node;

a first gradient determining module configured to determine a corresponding first gradient of the graph neural network according to the node embedding vector and the label;

and a first updating module configured to update the graph neural network according to the first gradient.

In one embodiment, the first update module is configured to: adding noise on the first gradient by utilizing a noise mechanism of differential privacy to obtain a first noise gradient; and updating parameters of the graph neural network according to the first noise gradient.

Further, in one example, the first update module is specifically configured to: cutting the first gradient based on a preset cutting threshold value to obtain a cutting gradient; determining gaussian noise for achieving differential privacy using a gaussian distribution determined based on the clipping threshold, wherein a variance of the gaussian distribution is positively correlated with a square of the clipping threshold; and superposing the Gaussian noise and the clipping gradient to obtain the first noise gradient.

In one embodiment, the graph embedding module is configured to: acquiring neighbor nodes of the labeling nodes in the sparse relation network graph as target neighbor nodes; determining an aggregation weight of each target neighbor node, wherein the aggregation weight is determined based on the matching degree of the labeling node and each target neighbor node; and according to the aggregation weight, aggregating the node information of each target neighbor node to obtain the node embedded vector of the labeling node.

Further, in one example, the graph embedding module is configured to determine the aggregate weight of each target neighbor node by: acquiring sampling probability of each target neighbor node when the target neighbor node is sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the labeling node by utilizing a Gumbel-softmax function; and determining the aggregation weight of each target neighbor node according to the sampling probability.

According to one embodiment, training unit 55 further comprises:

the second gradient determining module is configured to determine a second gradient corresponding to the multi-layer neural network according to the node embedding vector and the label;

and a second updating module configured to update the multi-layer neural network according to the second gradient.

Further, in one embodiment, the first updating module is configured to: adding first noise on the first gradient by utilizing a differential privacy mode to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient; and, the second update module is configured to: adding second noise on the second gradient by utilizing a differential privacy mode to obtain a second noise gradient; and updating parameters of the multi-layer neural network according to the second noise gradient.

In various embodiments, the plurality of nodes in the original relationship network graph may include at least one of: user nodes, merchant nodes and article nodes.

It should be noted that, the apparatus 500 shown in fig. 5 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 500, which is not repeated herein.

The image neural network obtained through the training of the device 500 can effectively protect the privacy security of original image data.

According to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 2.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The foregoing detailed description has further been provided for the purpose of illustrating the technical concept of the present disclosure, and it should be understood that the foregoing detailed description is merely illustrative of the technical concept of the present disclosure, and is not intended to limit the scope of the technical concept of the present disclosure, but any modifications, equivalents, improvements, etc. based on the technical scheme of the embodiments of the present disclosure should be included in the scope of the technical concept of the present disclosure.

Claims

1. A method for training a graph neural network based on privacy protection, comprising:

the method comprises the steps that an original social relation diagram is obtained, wherein the original social relation diagram comprises a plurality of nodes, each node represents a user, and a connecting edge between the nodes represents social contact behaviors between two corresponding users; any first node in the plurality of nodes is provided with a corresponding first neighbor node set;

based on the sparse relation network graph, a joint training graph neural network and the multi-layer neural network.

2. The method of claim 1, wherein sampling the first set of neighbor nodes according to respective corresponding matching degrees of each neighbor node in the first set of neighbor nodes, comprises:

Normalizing the matching degree corresponding to each neighbor node respectively to obtain corresponding matching probability;

and sampling each neighbor node according to the matching probability.

3. The method of claim 1, wherein sampling the first set of neighbor nodes according to respective corresponding matching degrees of each neighbor node in the first set of neighbor nodes, comprises:

determining a first sampling probability of the second node being sampled according to a first privacy budget and the matching degree of the second node and the first node based on an exponential mechanism of differential privacy;

and sampling each neighbor node according to the first sampling probability respectively corresponding to each neighbor node in the first neighbor node set.

4. The method of claim 3, wherein sampling each neighboring node in the first set of neighboring nodes according to a first sampling probability that each neighboring node corresponds to, respectively, comprises:

and according to the first sampling probability, a preset number of k times of sampling are carried out, and k neighbor nodes are sampled from the first neighbor node set to serve as the sampling neighbor node set.

5. The method of claim 3, wherein sampling each neighboring node in the first set of neighboring nodes according to a first sampling probability that each neighboring node corresponds to, respectively, comprises:

Inputting the first sampling probability corresponding to each neighbor node into a Gumbel-softmax function to obtain the second sampling probability corresponding to each neighbor node;

and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.

6. The method of claim 5, wherein sampling the respective neighbor nodes according to the respective second sampling probabilities of the respective neighbor nodes, comprises:

and according to the second sampling probability, a preset number of k times of sampling are carried out, and k neighbor nodes are sampled from the first neighbor node set to serve as the sampling neighbor node set.

7. The method of claim 1, wherein the sparse relationship network graph comprises labeled nodes with labels;

the joint training graph neural network and the multi-layer neural network based on the sparse relation network graph comprise the following steps:

performing graph embedding on the sparse relation network graph by using the graph neural network to obtain a node embedding vector of the labeling node;

determining a first gradient corresponding to the graph neural network and a second gradient corresponding to the multi-layer neural network according to the node embedding vector and the label;

Updating the graph neural network according to the first gradient;

updating the multi-layer neural network according to the second gradient.

8. The method of claim 7, wherein updating the graph neural network according to the first gradient comprises:

adding noise on the first gradient by utilizing a noise mechanism of differential privacy to obtain a first noise gradient;

and updating parameters of the graph neural network according to the first noise gradient.

9. The method of claim 8, wherein adding noise to the first gradient using a noise mechanism of differential privacy results in a first noise gradient, comprising:

cutting the first gradient based on a preset cutting threshold value to obtain a cutting gradient;

determining gaussian noise for achieving differential privacy using a gaussian distribution determined based on the clipping threshold, wherein a variance of the gaussian distribution is positively correlated with a square of the clipping threshold;

and superposing the Gaussian noise and the clipping gradient to obtain the first noise gradient.

10. The method of claim 7, wherein performing graph embedding on the sparse relational network graph using the graph neural network to obtain a node embedding vector of the labeling node, comprises:

Acquiring neighbor nodes of the labeling nodes in the sparse relation network graph as target neighbor nodes;

determining an aggregation weight of each target neighbor node, wherein the aggregation weight is determined based on the matching degree of the labeling node and each target neighbor node;

and according to the aggregation weight, aggregating the node information of each target neighbor node to obtain the node embedded vector of the labeling node.

11. The method of claim 10, wherein determining the aggregate weight of each target neighbor node comprises:

acquiring sampling probability of each target neighbor node when the target neighbor node is sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the labeling node by utilizing a Gumbel-softmax function;

and determining the aggregation weight of each target neighbor node according to the sampling probability.

12. The method of claim 7, wherein,

updating the graph neural network according to the first gradient, including: adding first noise on the first gradient by utilizing a differential privacy mode to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient;

Updating the multi-layer neural network according to the second gradient, comprising: adding second noise on the second gradient by utilizing a differential privacy mode to obtain a second noise gradient; and updating parameters of the multi-layer neural network according to the second noise gradient.

13. An apparatus for training a graph neural network based on privacy protection, comprising:

the original graph acquisition unit is configured to acquire an original social relation graph and comprises a plurality of nodes, each node represents one user, and a connecting edge between the nodes represents social connection behaviors between two corresponding users; any first node in the plurality of nodes is provided with a corresponding first neighbor node set;

and the training unit is configured to jointly train the graph neural network and the multi-layer neural network based on the sparse relation network graph.

14. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-12.