CN112464292A - Method and device for training neural network based on privacy protection - Google Patents

Method and device for training neural network based on privacy protection

Info

Publication number
CN112464292A
CN112464292A (Application CN202110109491.3A)
Authority
CN
China
Prior art keywords
node
sampling
graph
neighbor
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110109491.3A
Other languages
Chinese (zh)
Other versions
CN112464292B (en)
Inventor
熊涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110109491.3A priority Critical patent/CN112464292B/en
Priority to CN202110957071.0A priority patent/CN113536383B/en
Publication of CN112464292A publication Critical patent/CN112464292A/en
Application granted granted Critical
Publication of CN112464292B publication Critical patent/CN112464292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes (under G06F 21/62, protecting access to data via a platform, e.g. using keys or access control rules)
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate (under G06F 18/24, classification techniques)
    • G06N 3/045: Combinations of networks (under G06N 3/04, neural network architecture, e.g. interconnection topology)
    • G06N 3/084: Backpropagation, e.g. using gradient descent (under G06N 3/08, learning methods)

Abstract

The embodiments of this specification provide a method and a device for training a neural network based on privacy protection. First, an original relationship network graph is obtained, in which any first node has a corresponding neighbor node set. For any second node in that neighbor node set, the node information of the second node, the node information of the first node, and the connection information of the two nodes are input into a multilayer neural network to obtain the matching degree between the second node and the first node. The neighbor node set is then sampled according to the matching degree of each neighbor node, yielding a sampled neighbor node set for the first node. A sparse relationship network graph is formed from the sampled neighbor node sets of the nodes in the original graph, and a graph neural network is trained based on this sparse relationship network graph.

Description

Method and device for training neural network based on privacy protection
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for training a neural network based on privacy protection.
Background
Relational network graphs have recently become a core area of machine learning. Data mining and machine learning based on relational network graphs are increasingly valuable in many fields. For example, the structure of a social network can be understood by predicting potential connections, fraud detection can be performed based on graph structure, and the consumption behavior of e-commerce users can be understood in order to make real-time recommendations, and so forth.
Meanwhile, people pay increasing attention to privacy. A large amount of information is hidden in a relational network graph, and the various artificial intelligence and machine learning (AI/ML) models that use graph information carry a substantial risk of leaking private data if not properly protected. For example, with the advent of the IoT era, many AI/ML models are developed in the cloud with large-scale graph data and then deployed on devices (mobile phones and other IoT devices) to make real-time decisions. The advantage of this arrangement is that data transmission from the device to the cloud is reduced, which both protects user privacy and lowers transmission cost. However, if the model is stolen, the large-scale graph data used to train the model is also at risk of being leaked.
Therefore, it is desirable to have an improved scheme for more safely and efficiently training a reliable neural network model.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for training a graph neural network based on privacy protection, so that the trained graph neural network can better protect the privacy security of graph data information.
According to a first aspect, there is provided a method for training a neural network based on privacy protection, comprising:
acquiring an original relation network graph, wherein the original relation network graph comprises a plurality of nodes, and any first node in the plurality of nodes is provided with a corresponding first neighbor node set;
for any second node in the first neighbor node set, inputting the node information of the second node, the node information of the first node, and the connection information of the second node and the first node into a multilayer neural network to obtain the matching degree of the second node and the first node;
sampling the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set to obtain a sampled neighbor node set of the first node;
forming a sparse relation network graph based on the sampling neighbor node sets corresponding to the plurality of nodes respectively;
and training a neural network of the graph based on the sparse relationship network graph.
In one embodiment, sampling the first neighbor node set according to the matching degree specifically includes: normalizing the matching degrees respectively corresponding to the neighbor nodes to obtain corresponding matching probabilities; and sampling each neighbor node according to the matching probability.
In another embodiment, sampling the first neighbor node set according to the matching degree specifically includes: determining a first sampling probability of the second node being sampled according to a first privacy budget and a matching degree of the second node and the first node based on an exponential mechanism of differential privacy; and sampling each neighbor node according to the first sampling probability corresponding to each neighbor node in the first neighbor node set.
Further, in one example, a predetermined number k of samplings may be performed based on the first sampling probability to sample k neighbor nodes from the first set of neighbor nodes as the set of sampled neighbor nodes.
In yet another embodiment, the first sampling probabilities respectively corresponding to the neighbor nodes may be input into a Gumbel-softmax function to obtain second sampling probabilities respectively corresponding to the neighbor nodes; each neighbor node is then sampled according to its second sampling probability.
Further, in one example, a predetermined number k of samplings may be performed according to the second sampling probability, and k neighbor nodes may be sampled from the first neighbor node set as the sampling neighbor node set.
According to one embodiment, a sparse relationship network graph includes labeled nodes having labels; training a neural network based on the sparse relationship network graph, comprising: carrying out graph embedding on the sparse relationship network graph by utilizing the graph neural network to obtain a node embedding vector of the labeling node; determining a corresponding first gradient of the graph neural network according to the node embedding vector and the label; updating the graph neural network according to the first gradient.
In one embodiment, updating the graph neural network according to the first gradient specifically includes: adding noise to the first gradient using a noise mechanism of differential privacy to obtain a first noise gradient; and updating the parameters of the graph neural network according to the first noise gradient.
Further, in one example, the first noise gradient is obtained by: clipping the first gradient based on a preset clipping threshold to obtain a clipped gradient; determining Gaussian noise for realizing differential privacy using a Gaussian distribution determined based on the clipping threshold, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superposing the Gaussian noise on the clipped gradient to obtain the first noise gradient.
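As a concrete illustration of the clipping-and-noise step above, here is a minimal pure-Python sketch (the function name and parameterization are illustrative assumptions, not taken from the patent): the gradient is scaled so its L2 norm does not exceed the clipping threshold, and Gaussian noise whose standard deviation is proportional to that threshold is superposed, so the noise variance is positively correlated with the square of the threshold.

```python
import math
import random

def dp_noisy_gradient(grad, clip_threshold, noise_scale, rng=None):
    """Clip a gradient vector by its L2 norm, then superpose Gaussian noise
    whose standard deviation grows with the clipping threshold (so the noise
    variance is positively correlated with the threshold squared)."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_threshold / max(norm, 1e-12))
    clipped = [g * scale for g in grad]       # ||clipped|| <= clip_threshold
    sigma = noise_scale * clip_threshold      # std proportional to threshold
    return [c + rng.gauss(0.0, sigma) for c in clipped]
```

With `noise_scale` set to zero the function reduces to pure norm clipping, which makes the clipping behavior easy to check in isolation.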
In one embodiment, performing graph embedding on the sparse relationship network graph to obtain the node embedding vector of the labeled node specifically includes: acquiring the neighbor nodes of the labeled node in the sparse relationship network graph as target neighbor nodes; determining an aggregation weight for each target neighbor node, the aggregation weight being determined based on the matching degree between the labeled node and that target neighbor node; and aggregating the node information of the target neighbor nodes according to the aggregation weights to obtain the node embedding vector of the labeled node.
Further, in an example, determining the aggregation weight of each target neighbor node specifically includes: acquiring sampling probability of each target neighbor node when being sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the label node by using a Gumbel-softmax function; and determining the aggregation weight of each target neighbor node according to the sampling probability.
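The weighted aggregation described above can be sketched as follows. This is an illustrative example under the assumption that the aggregation weights are simply the neighbors' sampling probabilities normalized to sum to one; the helper name `embed_node` is hypothetical.

```python
def embed_node(neighbor_feats, sampling_probs):
    """Aggregate neighbor feature vectors into an embedding for the labeled
    node, weighting each neighbor by its normalized sampling probability."""
    total = sum(sampling_probs)
    weights = [p / total for p in sampling_probs]   # aggregation weights
    dim = len(neighbor_feats[0])
    return [sum(w * feats[d] for w, feats in zip(weights, neighbor_feats))
            for d in range(dim)]
```

With equal sampling probabilities this reduces to the plain mean of the neighbor features, which is the familiar unweighted aggregation baseline.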
According to one embodiment, training a neural network based on the sparse relationship network graph further comprises: determining a second gradient corresponding to the multilayer neural network according to the node embedding vector and the label; updating the multi-layer neural network according to the second gradient.
Further, a first noise may be added to the first gradient in a differential privacy manner, so as to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient; adding a second noise to the second gradient by using a differential privacy mode to obtain a second noise gradient; and updating the parameters of the multilayer neural network according to the second noise gradient.
In various embodiments, the plurality of nodes in the relationship network graph may include at least one of: user nodes, merchant nodes, and item nodes.
According to a second aspect, there is provided an apparatus for training a graph neural network based on privacy protection, comprising:
an original graph obtaining unit, configured to obtain an original relationship network graph, where the original relationship network graph includes a plurality of nodes, and any first node in the plurality of nodes has a corresponding first neighbor node set;
a matching degree obtaining unit configured to input node information of a second node, node information of the first node, and connection information of the second node and the first node to a multilayer neural network for any second node in the first neighbor node set, so as to obtain a matching degree between the second node and the first node;
the sampling unit is configured to sample the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set, so as to obtain a sampling neighbor node set of the first node;
a sparse graph forming unit configured to form a sparse relationship network graph based on a sampling neighbor node set corresponding to each of the plurality of nodes;
and the training unit is configured to train the neural network of the graph based on the sparse relationship network graph.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
With the method and device provided by the embodiments of this specification, the neighbor nodes in the original relationship network graph are sampled using the inter-node matching degrees determined by the multilayer neural network, yielding a sparse relationship network graph, and the graph neural network is trained on this sampled sparse graph. Because the sparse graph contains only part of the connecting edges sampled from the original graph, it is difficult to reverse-engineer accurate graph structure information of the original graph from a graph neural network trained in this way, which protects the data privacy of the original relationship network graph. Optionally, a differential privacy mechanism can be introduced in the sampling stage and/or the gradient propagation stage, adding a degree of randomness to the training of the graph neural network. This further strengthens the security of the private data in the original relationship network graph while preserving the basic performance of the graph neural network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 illustrates an architectural diagram of a training graph neural network in accordance with the concepts of the present technology;
FIG. 2 illustrates a flowchart of a method for training a graph neural network based on privacy protection, according to one embodiment;
FIG. 3 illustrates an exemplary presentation of a relational network diagram;
FIG. 4 illustrates a flow of steps to train a neural network based on a sparse relationship network graph in one embodiment;
FIG. 5 shows a schematic block diagram of a training apparatus of the graph neural network according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously mentioned, graph neural networks are typically trained based on relational network graphs. A relational network graph is a description of relationships between entities in the real world and may be generally represented as a set of nodes representing entities in the real world and a set of edges representing associations between entities in the real world. For example, in a social network, people are entities and relationships or connections between people are edges. The relational network graph is generally formed by collecting and organizing a large amount of user-related data, and therefore, often contains private data of users. For example, a social network graph may include records of social interactions between users, and the like.
After the graph neural network is trained on a particular relationship network graph, it can be used for prediction tasks related to the nodes and/or edges in the graph, e.g., predicting the relationship between nodes or the classification of a node. Because the training is based on the relational network graph, the graph neural network carries information about that graph. Therefore, if the model parameters of the graph neural network are not properly protected and are leaked or stolen, the relational network graph data used for training is correspondingly at risk of leakage.
Based on the above considerations, the inventor proposes, in several embodiments of this specification, to sample and sparsify the original relationship network graph and to train the graph neural network on the sparsified relationship network graph, thereby protecting the privacy of the original graph.
Fig. 1 shows an architectural diagram of a training graph neural network according to the technical concept of the present specification. As shown in fig. 1, an original relationship network diagram 100 is first obtained. The original relationship network graph 100 may be a network graph reflecting various associations. In general, the original relational network graph is a dense graph in which most nodes have a large number of connecting edges, for example, several tens of connecting edges. Nodes connected by connecting edges may be referred to as neighbor nodes. Each node then has a corresponding set of neighbor nodes.
For each node in the original relationship network graph 100, the node and its neighbor nodes are input into the multilayer neural network 10, which predicts the degree of match between the two input nodes. The connecting edges can then be sampled according to the matching degree between their endpoint nodes; equivalently, the neighbor nodes in each neighbor node set are sampled. Through sampling, only part of the connecting edges are retained, so the original relational network graph is sparsified into the sparse relationship network graph 200.
The sparse relationship network graph 200 may then be input into the graph neural network 20 for training the graph neural network 20. Since the sparse relationship network graph 200 only contains partial information in the original relationship network graph, it is difficult to reversely deduce accurate graph structure information in the original relationship network graph through the graph neural network trained in this way, thereby protecting the data privacy of the original relationship network graph. Moreover, through the thinning, the training speed of the graph neural network can be increased, and the obtained graph neural network has stronger robustness.
Further, a differential privacy mechanism can be introduced into the training process. For example, in the sampling stage, the sampling probability of a connecting edge can be determined through the exponential mechanism of differential privacy, introducing a degree of randomness into the sampling process. In addition, while training the graph neural network on the sparse graph, noise can be introduced into the back-propagated gradients through a noise mechanism of differential privacy, introducing randomness into the determination of the model parameters. With a differential privacy mechanism in place, it is difficult to infer the graph information of the original relationship network graph from the graph neural network while its basic performance is preserved, further protecting the security of the private data.
The following describes a specific implementation of the above concept.
FIG. 2 illustrates a flowchart of a method for training a graph neural network based on privacy protection, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. The training process of the privacy-protection-based graph neural network is described below with reference to the implementation architecture shown in FIG. 1 and the method flow shown in FIG. 2.
As shown in fig. 2, first, in step 21, an original relationship network diagram is obtained.
In different embodiments, the original relationship network graph may be a network graph reflecting various kinds of associations. For example, in one example, the original relationship network graph is a social relationship graph containing a number of nodes, each node representing a user; connecting edges between nodes represent social contact between the two corresponding users, such as calls, short messages, and other social interactions. In another example, the original relationship network graph is a heterogeneous graph reflecting user behavior habits. Such a heterogeneous graph may contain several different kinds of nodes, for example merchant nodes and item nodes in addition to user nodes. When a user visits or purchases an item (e.g., watches a movie, reads a book), or transacts at a merchant, a connecting edge may be established between the corresponding nodes. Specific examples of relationship network graphs are not exhaustively enumerated here.
In many scenarios, the number of nodes in the original relationship network graph is huge, for example, the number of nodes in the social relationship graph can reach the order of thousands or even hundreds of millions; the connection relationship between nodes is also complex, and most nodes have a large number of connection edges, for example, dozens or even hundreds of connection edges. Thus, the original relational network graph tends to be a denser graph.
FIG. 3 shows an exemplary presentation of a relational network graph, with the left-hand (a) portion showing an example of an original relational network graph. It can be seen that the number of connecting edges between nodes is huge, and the connection relationship is complex and dense.
For this reason, according to the embodiments of this specification, the dense original relationship network graph described above is next sparsified. For simplicity and clarity of description, the following is presented in terms of an arbitrary node u in the original relational network graph, hereinafter referred to as the first node. Correspondingly, the nodes connected to the first node by connecting edges are its neighbor nodes; they form the neighbor node set of the first node u, called the first neighbor node set and denoted N_u. When the relational network graph is a directed graph, the first neighbor node set may be defined, as needed, as the set of nodes pointing to the first node, the set of nodes the first node points to, or both.
Next, in step 22, for any second node v in the first neighbor node set N_u, the node information n(v) of the second node, the node information n(u) of the first node, and the connection information A(u, v) of the two nodes are input into the multilayer neural network 10 to obtain the matching degree z_{u,v} between the second node and the first node. The multilayer neural network 10 may be implemented, for example, as a multilayer perceptron (MLP), a deep feedforward neural network (DNN), or a multilayer convolutional neural network (CNN). When implemented as a multilayer perceptron, the matching degree can be expressed as:

    z_{u,v} = MLP( n(u), n(v), A(u, v) )        (1)
Note that the node information n(u) and n(v) is determined from the attributes of the objects the nodes represent. For example, where a node represents a user, n(u) may contain basic attribute information of user u, such as gender, age, and registration duration. A(u, v) can be determined from the connecting edge between the first node u and the second node v in the original relationship network graph. For example, in a social relationship graph, a connecting edge corresponds to social interaction; A(u, v) then contains information on the frequency and/or manner of social interaction between user u and user v. In one embodiment, the node information n(u) and n(v) and the connection information A(u, v) are encoded as vectors and input into the multilayer neural network, whose operations yield the matching degree z_{u,v} of the first node u and the second node v.
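A minimal sketch of this matching-degree computation, assuming a two-layer perceptron with ReLU hidden units and a tanh output (so the score lies in (-1, 1), consistent with the sensitivity discussion later in the text); the weight layout and function name are illustrative assumptions, not the patent's implementation:

```python
import math

def matching_degree(n_u, n_v, a_uv, w_hidden, w_out):
    """Score the match between node u and a neighbor v from their node
    features and the features of edge (u, v), via a small 2-layer MLP.
    Weights are plain lists of lists; the bias is folded in via a constant 1."""
    x = list(n_u) + list(n_v) + list(a_uv) + [1.0]      # joint input + bias
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)))
              for row in w_hidden]                      # ReLU hidden layer
    score = sum(wo * h for wo, h in zip(w_out, hidden))
    return math.tanh(score)                             # matching degree in (-1, 1)
```

In practice the weights would be learned jointly with the graph neural network; here they are passed in explicitly to keep the sketch self-contained.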
Then, in step 23, the first neighbor node set is sampled according to the matching degree corresponding to each neighbor node in N_u, to obtain the sampled neighbor node set of the first node.
In one embodiment, the matching degree z_{u,v} output by the multilayer neural network is itself a value in the range (0, 1). In this case, the matching degree of each neighbor node can be used directly as its matching probability P_{u,v}, and sampling is performed according to this probability.

In another embodiment, the matching degree output by the multilayer neural network is a score whose value is not limited to (0, 1). In that case, the matching degrees of the neighbor nodes can be normalized to obtain corresponding matching probabilities P_{u,v}; each neighbor node is then sampled according to its matching probability. For example, k rounds of sampling may be performed according to the node matching probabilities to obtain k neighbor nodes, which serve as the sampled neighbor node set of the first node.
According to one embodiment, when the neighbor nodes are sampled, a privacy protection mechanism of differential privacy is introduced, so that certain randomness is introduced in the sampling process, and the privacy protection effect is enhanced.
Specifically, for the first node u and any of its neighbor nodes, such as the second node v, the sampling probability p_{u,v} that the second node v is sampled, called the first sampling probability, can be determined based on the exponential mechanism of differential privacy, from a privacy budget ε and the previously obtained matching degree z_{u,v} between the second node v and the first node u.

In one example, the first sampling probability p_{u,v} can be determined by:

    p_{u,v} = exp( ε·z_{u,v} / (2Δ) ) / Σ_{w∈N_u} exp( ε·z_{u,w} / (2Δ) )        (2)

In equation (2), ε is the privacy budget, N_u is the neighbor node set of the first node u, and Δ is the sensitivity, i.e., the maximum difference in the function value (here, the matching degree) when the function is evaluated on adjacent data sets. When the matching degree output by the multilayer neural network lies in (-1, 1), Δ takes the value 2. Equation (2) shows that, when sampling according to the first sampling probability, the probability that the second node is sampled is positively correlated with both the matching degree z_{u,v} between the second node and the first node and the privacy budget ε.

By performing the above operation for each neighbor node in the first neighbor node set N_u, the first sampling probability corresponding to each neighbor node is obtained. Each neighbor node of N_u can then be sampled according to its first sampling probability.
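A hedged sketch of these exponential-mechanism probabilities, assuming the standard form in which p_v is proportional to exp(ε·z_v / (2Δ)); the function name and the default sensitivity of 2 (matching degrees in (-1, 1)) are assumptions of the sketch:

```python
import math

def exp_mechanism_probs(match_scores, epsilon, sensitivity=2.0):
    """First sampling probabilities over one node's neighbors via the
    exponential mechanism: p_v proportional to exp(eps * z_v / (2 * Delta))."""
    logits = [epsilon * z / (2.0 * sensitivity) for z in match_scores]
    m = max(logits)                          # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

A larger privacy budget ε sharpens the distribution toward high-matching neighbors; as ε approaches 0, the probabilities approach uniform, i.e., the sampling becomes fully random.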
In one specific example, a maximum number k of neighbor nodes per node in the sparse graph may be set in advance. Accordingly, when sampling the original graph, k rounds of sampling may be performed for any first node u according to the first sampling probability p_{u,v}, each round sampling one neighbor node from the first neighbor node set N_u, so that k neighbor nodes are sampled from the first neighbor node set as the sampled neighbor node set. Of course, in the special case where the first neighbor node set originally contains no more than k nodes, the original first neighbor node set may be used directly as the sampled neighbor node set. The advantage of this scheme is that, no matter how dense the original relational network graph is, the number of neighbor nodes of each node in the resulting sparse relational graph is guaranteed not to exceed k.
In another specific example, a sampling ratio r of the sparse graph relative to each node of the original graph, for example 20%, may be set in advance. Accordingly, when sampling the original graph, for any first node u, the number k of nodes to sample is first determined from the number of nodes in the first neighbor node set N_u and the sampling ratio r. Then k rounds of sampling are performed according to the first sampling probability p_{u,v}, each round sampling one neighbor node from N_u, so that k neighbor nodes are sampled from the first neighbor node set as the sampled neighbor node set. In this way, sampling and compression are performed at a preset ratio regardless of how many neighbor nodes each node has in the original relational network graph, and each node's sampled neighbor node set differs from its original neighbor node set.
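Both variants reduce to drawing k distinct neighbors according to their sampling probabilities. A simple sequential sketch follows; sampling without replacement by renormalizing over the remaining candidates is an assumption of this sketch, not a procedure specified by the patent:

```python
import random

def sample_neighbors(neighbors, probs, k, rng=None):
    """Sample up to k distinct neighbors, drawing one per round according to
    the (renormalized) sampling probabilities of the remaining candidates."""
    rng = rng or random.Random(0)
    if len(neighbors) <= k:
        return list(neighbors)               # small neighbor sets are kept as-is
    pool = list(zip(neighbors, probs))
    chosen = []
    for _ in range(k):
        total = sum(p for _, p in pool)
        r, acc = rng.random() * total, 0.0
        for i, (node, p) in enumerate(pool): # walk the CDF of remaining nodes
            acc += p
            if r <= acc:
                chosen.append(node)
                pool.pop(i)
                break
    return chosen
```

For the ratio-based variant, k would simply be computed as `max(1, int(r * len(neighbors)))` before calling the function.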
Further, according to one embodiment, to facilitate effective backward propagation of gradients when the multilayer neural network and the graph neural network are jointly trained, the sampling probability may be determined in a form more amenable to gradient derivation. Specifically, on the basis of the first sampling probability obtained by the above formula (2), the first sampling probabilities respectively corresponding to the neighbor nodes may be input into a Gumbel-softmax function to obtain second sampling probabilities respectively corresponding to the neighbor nodes; each neighbor node is then sampled according to its second sampling probability.
Specifically, in one example, the second sampling probability p_i^(2) may be determined from the first sampling probability p_i^(1) by the following equation (3):

p_i^(2) = exp((log p_i^(1) + g_i) / τ) / Σ_j exp((log p_j^(1) + g_j) / τ), with g_i = −log(−log s_i)    (3)

In the above equation (3), s_i is sampled uniformly at random from (0, 1), and τ is a temperature hyperparameter of the Gumbel-softmax function.
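A minimal pure-Python sketch of the Gumbel-softmax transformation of equation (3). The exact form shown here (temperature τ, Gumbel noise g = −log(−log s)) is the standard Gumbel-softmax construction and is an assumption, since the patent text gives only its general shape; s is drawn uniformly from (0, 1) as the text states.

```python
import math
import random

def gumbel_softmax(first_probs, tau=1.0):
    """Second sampling probabilities: perturb log-probabilities with Gumbel
    noise g = -log(-log s), s ~ U(0, 1), then apply a softmax at temperature tau."""
    logits = []
    for p in first_probs:
        s = random.random() or 1e-12   # guard against the (measure-zero) s == 0
        g = -math.log(-math.log(s))    # Gumbel(0, 1) noise
        logits.append((math.log(p) + g) / tau)
    m = max(logits)                    # subtract the max to stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(1)
p2 = gumbel_softmax([0.5, 0.3, 0.2], tau=0.5)
```

Because every step is differentiable in the first probabilities, gradients can flow through this sampling when the two networks are trained jointly, which is exactly the motivation given above.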
When neighbor sampling is performed based on the second sampling probability, similarly, a predetermined number k of samplings may be performed, so that k neighbor nodes are sampled from the first neighbor node set as the sampled neighbor node set; alternatively, sampling may be performed based on a predetermined sampling ratio r. This is not described in detail again here.
By executing the above steps 22 and 23 on each node in the original relationship network graph, a sampling neighbor node set corresponding to each node can be obtained. Then, in step 24, a sparse relationship network graph may be formed based on the sampled neighboring node sets corresponding to the respective nodes.
The right part (b) of fig. 3 shows a sparse relational network diagram obtained by sampling the original relational network diagram on the left side. Compared with the original relational network graph, the number of the connecting edges in the sparse relational network graph is greatly reduced, and the connecting relation between the nodes is greatly simplified.
Next, in step 25, a graph neural network is trained based on the sparse relationship network graph obtained above.
The specific process of training the graph neural network can be implemented in various ways. From the perspective of randomness, the embodiments include a strict training mode based on the raw gradients, and an approximate training mode that introduces noise into the gradients based on a differential privacy mechanism; from the perspective of joint training, the embodiments can be divided into a mode in which the graph neural network is trained alone and a mode in which it is trained jointly with the multilayer neural network. The embodiments from these different perspectives, as well as their combinations, are described below in connection with a typical process of training based on labeled nodes.
In a typical graph neural network training, some of the nodes may be labeled with labels corresponding to the predicted tasks. For example, when the prediction task is to predict the transaction risk of the user based on the user social relationship graph, a label may be given to a part of the users in the social relationship graph whose risk status is known, the label showing their real risk status. In different examples, the tags may be category tags (e.g., high risk, medium risk, low risk categories) or numerical tags (e.g., specific risk scores). The labels are associated with subsequent predictive tasks performed based on the graph neural network. The embedded vectors of the labeled nodes are characterized and learned through a graph neural network, and the unlabeled nodes with unknown states can be predicted.
Fig. 4 shows a flow of steps for training a graph neural network based on a sparse relationship network graph in one embodiment, namely the sub-steps of step 25 described above. The training mode of fig. 4 is performed based on labeled nodes; that is, the sparse relationship network graph includes a labeled node x carrying a label y.
As shown in fig. 4, in step 251, the obtained sparse relationship network graph is graph-embedded by using a graph neural network, so as to obtain a node embedding vector Ex of the labeled node x. Different graph neural networks adopt different algorithms to carry out graph embedding to obtain node embedding vectors of all nodes, and some graph neural networks can also obtain edge embedding vectors of edges in the graph. Although the specific algorithm is different, generally, when the graph neural network performs graph embedding, for a target node to be analyzed, information of neighbor nodes of the target node is obtained, and the information of the neighbor nodes is aggregated, so as to determine an embedded vector of the target node. When the label node x is used as a target node, a node embedding vector Ex corresponding to the label node x can be obtained.
Then, at step 252, a corresponding first gradient of the graph neural network is determined based on the node embedding vector Ex and the label y. In general, a prediction result y' can be obtained by performing, based on the node embedding vector Ex, the prediction task corresponding to the label. Then, based on the prediction result y' and the label y, the prediction loss L is obtained according to a preset loss function; the prediction loss L is then propagated backward through the graph neural network, that is, partial derivatives of the prediction loss with respect to the network parameters of each network layer are computed layer by layer from the output side of the graph neural network backward, yielding the first gradient corresponding to the graph neural network.
Next, in step 253, the neural network of the map is updated, i.e. the network parameters therein are updated according to the first gradient.
The basic steps of updating the neural network of the graph based on the labeled nodes in the sparse relationship network graph are described above. Various embodiments based on the above basic steps are described below.
As previously described, from the perspective of randomness, the embodiments include a strictly trained embodiment A and an approximately trained embodiment B.
In embodiment A, at step 253, the network parameters of the graph neural network are updated based on the original value of the first gradient. Suppose that in the t-th iteration, the first gradient obtained is g_t; then the update of the current network parameters θ_t of the t-th round may be expressed as:

θ_{t+1} = θ_t − η · g_t    (4)

where η denotes the learning step length (learning rate), a preset hyperparameter, and θ_{t+1} denotes the updated network parameters obtained through the t-th round of training.
In embodiment B, in step 253, noise is added to the first gradient by using a noise mechanism of differential privacy, so as to obtain a first noise gradient; parameters of the graph neural network are then updated based on the first noise gradient.
The noise mechanism of differential privacy is mainly realized by adding noise to the query result. The noise may be Laplacian noise, Gaussian noise, or the like. According to one embodiment, in this step 253, differential privacy is achieved by adding Gaussian noise to the gradient. More specifically, the first gradient may be clipped based on a preset clipping threshold to obtain a clipping gradient; then, Gaussian noise for realizing differential privacy is determined using a Gaussian distribution determined based on the clipping threshold, the variance of the Gaussian distribution being positively correlated with the square of the clipping threshold; the Gaussian noise and the clipping gradient may then be superimposed to obtain the first noise gradient.
More specifically, as an example, assume that in the t-th iteration the first gradient obtained is g_t. To add Gaussian noise to it, gradient clipping may first be performed on the original gradient based on a preset clipping threshold to obtain a clipping gradient; Gaussian noise for implementing differential privacy is determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter); the clipping gradient is then fused (e.g., summed) with the Gaussian noise to obtain a gradient containing noise. It can be understood that this way, on the one hand, clips the original gradient and, on the other hand, superimposes noise on the clipped gradient, thereby performing differential privacy processing on the gradient with Gaussian noise.

For example, the original gradient g_t may be clipped as follows:

ḡ_t = g_t / max(1, ‖g_t‖₂ / C)    (5)

where ḡ_t denotes the clipped gradient, C denotes the clipping threshold, and ‖g_t‖₂ denotes the second-order norm of g_t. That is, when the gradient norm is less than or equal to the clipping threshold C, the original gradient is retained; when the gradient norm is greater than the clipping threshold C, the original gradient is scaled down to the corresponding size.
Gaussian noise is then added to the clipped gradient to obtain a gradient containing noise, for example:

g̃_t = ḡ_t + 𝟙 · N(0, σ²C²I)    (6)

where g̃_t denotes the gradient containing noise; N(0, σ²C²I) denotes Gaussian noise whose probability density satisfies a Gaussian distribution with mean 0 and variance σ²C²; σ is the noise scaling coefficient, a preset hyperparameter that can be set as required; C is the clipping threshold; and 𝟙 denotes an indicator function that may take the value 0 or 1; for example, it may be set to take 1 in even rounds and 0 in odd rounds of multi-round training.
Then, the gradient after Gaussian noise addition, i.e., the first noise gradient, can be used to adjust the network parameters of the graph neural network with the goal of minimizing the prediction loss L:

θ_{t+1} = θ_t − η · g̃_t    (7)

Under the condition that the Gaussian noise added to the gradient satisfies differential privacy, the adjustment of the network parameters also satisfies differential privacy. In this way, the noise mechanism of differential privacy introduces a certain randomness into the updating of the graph neural network, and a good balance is struck between privacy protection and model performance.
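Equations (5) through (7) together form one instance of the well-known clip-then-noise gradient recipe. The following is a minimal pure-Python sketch under stated assumptions: per-coordinate Gaussian noise with standard deviation σC (so the variance is σ²C², positively correlated with the square of the clipping threshold), the indicator of equation (6) treated as a boolean flag, and illustrative parameter values throughout.

```python
import math
import random

def dp_sgd_step(theta, grad, lr=0.1, clip_c=1.0, sigma=0.5, add_noise=True):
    """One update following equations (5)-(7): clip the gradient to norm <= C,
    optionally add N(0, sigma^2 * C^2) noise per coordinate, then descend."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / clip_c)                 # eq. (5): gradient clipping
    clipped = [g / scale for g in grad]
    indicator = 1 if add_noise else 0               # eq. (6): indicator function
    noisy = [g + indicator * random.gauss(0.0, sigma * clip_c) for g in clipped]
    return [t - lr * g for t, g in zip(theta, noisy)]   # eq. (7): parameter update

random.seed(42)
theta = dp_sgd_step([0.0, 0.0], [3.0, 4.0], lr=0.1, clip_c=1.0, sigma=0.5)
```

With `add_noise=False` the step reduces to the strict embodiment A update of equation (4) applied to the clipped gradient, which makes the effect of the noise term easy to isolate when experimenting.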
On the other hand, from the perspective of whether or not to train jointly, the embodiments include an embodiment a in which the graph neural network is trained alone and an embodiment b in which it is trained jointly with the multilayer neural network.
In embodiment a, the multilayer neural network 10 in FIG. 1 is a pre-trained neural network with fixed network parameters φ. The training mode may be, for example, to pre-label the matching degrees between nodes as labels to train the multilayer neural network 10. Accordingly, the embedding vector Ex obtained through graph embedding in step 251 is a function only of the network parameters θ of the graph neural network 20. Subsequently, it is only necessary to determine the first gradient with respect to θ in step 252, and to update the parameters θ in step 253.
In embodiment b, the multilayer neural network 10 and the graph neural network 20 are trained jointly. The network parameters φ and θ of the two networks are associated through the graph embedding process of step 251. Specifically, for the labeled node x, in step 251, its neighbor nodes in the sparse relationship network graph are first obtained as target neighbor nodes; then the aggregation weight w of each target neighbor node is determined, where the aggregation weight is determined based on the matching degree z_{x,i} of the labeled node x with each target neighbor node i. Then, according to the aggregation weights w, the node information of the target neighbor nodes is aggregated to obtain the node embedding vector Ex of the labeled node x. Thus, the node embedding vector Ex depends not only on the network parameters θ of the graph neural network, but also, through the aggregation weights w, on the matching degrees z_{x,i}; and the matching degree z_{x,i}, being output by the multilayer neural network 10, is a function of the network parameters φ. Therefore, the node embedding vector Ex is a joint function of φ and θ. Correspondingly, the prediction loss L determined from the node embedding vector Ex and the label y is also a joint function of φ and θ.

In such a case, after the first gradient with respect to θ is determined in step 252, the prediction loss (determined from the node embedding vector Ex and the label) continues to be propagated backward to the multilayer neural network 10, and a second gradient corresponding to the multilayer neural network 10 is determined; then, according to the second gradient, the network parameters φ of the multilayer neural network 10 are updated, realizing joint training of the multilayer neural network 10 and the graph neural network 20.
The above determination of the aggregation weight w according to the matching degree z_{x,i} may be implemented in a variety of ways. In one example, the normalized matching degree may be used directly as the aggregation weight. In another example, the first sampling probability of each target neighbor node when sampled may be obtained, the first sampling probability being determined using equation (2), and the aggregation weight w may then be determined according to the first sampling probability. In yet another example, the second sampling probability of each target neighbor node when sampled may be obtained, the second sampling probability being determined based on the matching degree of each target neighbor node with the labeled node using a Gumbel-softmax function, for example using the aforementioned equation (3); the aggregation weight w of each target neighbor node is then determined according to the second sampling probability. The form of the Gumbel-softmax function facilitates gradient derivation and thus facilitates gradient propagation from the graph neural network 20 to the multilayer neural network 10.
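As a concrete illustration of the first option above (normalized matching degrees used directly as the aggregation weights w), the following sketch aggregates the node information of target neighbor nodes into the embedding vector Ex of a labeled node. The function name and the toy feature vectors are hypothetical, for illustration only.

```python
def aggregate_embedding(neighbor_infos, match_degrees):
    """Aggregate target-neighbor node information into an embedding for the
    labeled node, weighting each neighbor by its normalized matching degree."""
    total = sum(match_degrees)
    weights = [z / total for z in match_degrees]   # normalized z_{x,i} as weights w
    dim = len(neighbor_infos[0])
    ex = [0.0] * dim
    for w, info in zip(weights, neighbor_infos):   # weighted sum over neighbors
        for d in range(dim):
            ex[d] += w * info[d]
    return ex

ex = aggregate_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
# weights are 0.75 and 0.25, so ex == [0.75, 0.25]
```

Swapping the normalization for the first or second sampling probabilities of equations (2) or (3) changes only how `weights` is computed; the aggregation itself is unchanged.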
Different embodiments of the graph neural network training process are described above from two different perspectives: the randomness perspective and the joint-training perspective. Since these two perspectives are independent of each other, the above embodiments A, B and embodiments a, b can be combined in various ways to obtain more specific examples.
When embodiment B is combined with embodiment b, that is, when a differential privacy mechanism is introduced in the case of joint training, in one example noise may be added to both the first gradient for the graph neural network 20 and the second gradient for the multilayer neural network 10. Specifically, a first noise may be added to the first gradient in a differential privacy manner to obtain a first noise gradient, and the parameters of the graph neural network updated according to the first noise gradient; in addition, a second noise is added to the second gradient in a differential privacy manner to obtain a second noise gradient, and the parameters of the multilayer neural network updated according to the second noise gradient. For the noise addition process, reference may be made to the foregoing description of embodiment B, which is not repeated here. In another example, noise may be added to only one of the first and second gradients.
Reviewing the above process, the graph neural network is trained in various ways based on the sampled sparse relationship network graph. Because the sparse relationship network graph contains only part of the connecting edges sampled from the original relationship network graph, it is difficult to reversely deduce the accurate graph structure information of the original relationship network graph from a graph neural network trained in this way, and the data privacy of the original relationship network graph is thus protected. Optionally, a differential privacy mechanism can further be introduced in the sampling stage and/or the gradient propagation stage, introducing a certain randomness into the training of the graph neural network. This further enhances the security of the private data in the original relationship network graph while preserving the basic performance of the graph neural network.
According to another embodiment, an apparatus based on a privacy-preserving training graph neural network is also provided, and the apparatus may be deployed in any apparatus, device, platform, or device cluster having computing and processing capabilities. FIG. 5 shows a schematic block diagram of a training apparatus of the graph neural network according to one embodiment. As shown in fig. 5, the training apparatus 500 includes:
an original graph obtaining unit 51, configured to obtain an original relationship network graph, where the original relationship network graph includes a plurality of nodes, and any first node in the plurality of nodes has a corresponding first neighbor node set;
a matching degree obtaining unit 52, configured to input, to any second node in the first neighboring node set, node information of the second node, node information of the first node, and connection information of the second node and the first node into a multilayer neural network, so as to obtain a matching degree between the second node and the first node;
the sampling unit 53 is configured to sample the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set, so as to obtain a sampled neighbor node set of the first node;
a sparse graph forming unit 54 configured to form a sparse relationship network graph based on a sampling neighbor node set corresponding to each of the plurality of nodes;
and the training unit 55 is configured to train a neural network of the graph based on the sparse relationship network graph.
According to an embodiment, the sampling unit 53 is configured to: normalizing the matching degrees respectively corresponding to the neighbor nodes to obtain corresponding matching probabilities; and sampling each neighbor node according to the matching probability.
According to another embodiment, the sampling unit 53 comprises (not shown):
a first probability determination module configured to determine a first sampling probability that the second node is sampled according to a first privacy budget and a matching degree of the second node with the first node based on an exponential mechanism of differential privacy;
and the neighbor sampling module is configured to sample each neighbor node according to the first sampling probability corresponding to each neighbor node in the first neighbor node set.
Further, in one embodiment, the neighbor sampling module is configured to: and executing k times of sampling with a preset number according to the first sampling probability, and sampling k neighbor nodes from the first neighbor node set to serve as the sampling neighbor node set.
In another embodiment, the neighbor sampling module is configured to: inputting the first sampling probability corresponding to each neighbor node into a Gumbel-softmax function to obtain a second sampling probability corresponding to each neighbor node; and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.
Furthermore, the neighbor sampling module may perform k sampling times according to the second sampling probability, and sample k neighbor nodes from the first neighbor node set as the sampling neighbor node set.
According to one embodiment, the sparse relationship network graph includes labeled nodes with labels; the training unit 55 comprises (not shown):
the graph embedding module is configured to carry out graph embedding on the sparse relationship network graph by utilizing the graph neural network to obtain node embedding vectors of the labeled nodes;
a first gradient determination module configured to determine a corresponding first gradient of the graph neural network from the node embedding vector and the tag;
a first update module configured to update the graph neural network according to the first gradient.
In one embodiment, the first update module is configured to: adding noise on the first gradient by using a noise mechanism of differential privacy to obtain a first noise gradient; and updating the parameters of the graph neural network according to the first noise gradient.
Further, in an example, the first updating module is specifically configured to: based on a preset cutting threshold value, cutting the first gradient to obtain a cutting gradient; determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superposing the Gaussian noise and the cutting gradient to obtain the first noise gradient.
In one embodiment, the graph embedding module is configured to: acquiring neighbor nodes of the marked nodes in the sparse relationship network graph as target neighbor nodes; determining an aggregation weight of each target neighbor node, the aggregation weight being determined based on the degree of matching of the annotation node with each target neighbor node; and according to the aggregation weight, aggregating the node information of each target neighbor node to obtain the node embedded vector of the labeled node.
Further, in one example, the graph embedding module is configured to determine the aggregate weight of each target neighbor node by: acquiring sampling probability of each target neighbor node when being sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the label node by using a Gumbel-softmax function; and determining the aggregation weight of each target neighbor node according to the sampling probability.
According to one embodiment, the training unit 55 further comprises:
a second gradient determining module configured to determine a second gradient corresponding to the multilayer neural network according to the node embedding vector and the label;
a second update module configured to update the multi-layer neural network according to the second gradient.
Further, in one embodiment, the first update module is configured to: adding first noise on the first gradient by using a differential privacy mode to obtain a first noise gradient; updating parameters of the graph neural network according to the first noise gradient; and the second update module is configured to: adding second noise on the second gradient by using a differential privacy mode to obtain a second noise gradient; and updating the parameters of the multilayer neural network according to the second noise gradient.
In various embodiments, the plurality of nodes in the original relationship network graph may include at least one of: user nodes, merchant nodes and article nodes.
It should be noted that the apparatus 500 shown in fig. 5 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 500, and is not repeated herein.
The graph neural network obtained by the training of the device 500 can effectively protect the privacy and safety of the original graph data.
According to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.

Claims (29)

1. A method for training neural networks based on privacy protection comprises the following steps:
acquiring an original relation network graph, wherein the original relation network graph comprises a plurality of nodes, and any first node in the plurality of nodes is provided with a corresponding first neighbor node set;
for any second node in the first neighbor node set, inputting the node information of the second node, the node information of the first node, and the connection information of the second node and the first node into a multilayer neural network to obtain the matching degree of the second node and the first node;
sampling the first neighbor node set according to the matching degree corresponding to each neighbor node in the first neighbor node set to obtain a sampled neighbor node set of the first node;
forming a sparse relation network graph based on the sampling neighbor node sets corresponding to the plurality of nodes respectively;
and training a neural network of the graph based on the sparse relationship network graph.
2. The method of claim 1, wherein sampling the first set of neighbor nodes according to the matching degrees corresponding to the respective neighbor nodes in the first set of neighbor nodes comprises:
normalizing the matching degrees respectively corresponding to the neighbor nodes to obtain corresponding matching probabilities;
and sampling each neighbor node according to the matching probability.
3. The method of claim 1, wherein sampling the first set of neighbor nodes according to the matching degrees corresponding to the respective neighbor nodes in the first set of neighbor nodes comprises:
determining a first sampling probability of the second node being sampled according to a first privacy budget and a matching degree of the second node and the first node based on an exponential mechanism of differential privacy;
and sampling each neighbor node according to the first sampling probability corresponding to each neighbor node in the first neighbor node set.
4. The method of claim 3, wherein sampling each neighbor node in the first set of neighbor nodes according to a first sampling probability corresponding to the neighbor node, respectively, comprises:
and executing k times of sampling with a preset number according to the first sampling probability, and sampling k neighbor nodes from the first neighbor node set to serve as the sampling neighbor node set.
5. The method of claim 3, wherein sampling each neighbor node in the first set of neighbor nodes according to a first sampling probability corresponding to the neighbor node, respectively, comprises:
inputting the first sampling probability corresponding to each neighbor node into a Gumbel-softmax function to obtain a second sampling probability corresponding to each neighbor node;
and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.
6. The method of claim 5, wherein sampling the neighboring nodes according to the second sampling probabilities respectively corresponding to the neighboring nodes comprises:
and executing k times of sampling with preset number according to the second sampling probability, and sampling k neighbor nodes from the first neighbor node set to serve as the sampling neighbor node set.
7. The method of claim 1, wherein the sparse relationship network graph includes labeled nodes with labels;
training a neural network based on the sparse relationship network graph, comprising:
carrying out graph embedding on the sparse relationship network graph by utilizing the graph neural network to obtain a node embedding vector of the labeling node;
determining a corresponding first gradient of the graph neural network according to the node embedding vector and the label;
updating the graph neural network according to the first gradient.
8. The method of claim 7, wherein updating the graph neural network according to the first gradient comprises:
adding noise on the first gradient by using a noise mechanism of differential privacy to obtain a first noise gradient;
and updating the parameters of the graph neural network according to the first noise gradient.
9. The method of claim 8, wherein adding noise to the first gradient using a noise mechanism of differential privacy to obtain a first noise gradient comprises:
based on a preset cutting threshold value, cutting the first gradient to obtain a cutting gradient;
determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superposing the Gaussian noise and the cutting gradient to obtain the first noise gradient.
10. The method of claim 7, wherein graph embedding the sparse relationship network graph using the graph neural network to obtain node embedding vectors of the labeled nodes comprises:
acquiring neighbor nodes of the marked nodes in the sparse relationship network graph as target neighbor nodes;
determining an aggregation weight of each target neighbor node, the aggregation weight being determined based on the degree of matching of the annotation node with each target neighbor node;
and according to the aggregation weight, aggregating the node information of each target neighbor node to obtain the node embedded vector of the labeled node.
11. The method of claim 10, wherein determining an aggregate weight for each target neighbor node comprises:
acquiring sampling probability of each target neighbor node when being sampled, wherein the sampling probability is determined based on the matching degree of each target neighbor node and the label node by using a Gumbel-softmax function;
and determining the aggregation weight of each target neighbor node according to the sampling probability.
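Claims 10 and 11 describe aggregation weights derived from Gumbel-softmax sampling probabilities over neighbor matching degrees. An illustrative NumPy sketch, not part of the claims; the function names and the temperature value are assumptions:

```python
import numpy as np

def gumbel_softmax_weights(match_scores, temperature=0.5, rng=None):
    """Turn neighbor matching degrees into sampling probabilities via the
    Gumbel-softmax trick; the probabilities can then serve directly as
    aggregation weights."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: -log(-log(U)) with U uniform on (0, 1).
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=len(match_scores))))
    logits = (np.asarray(match_scores, dtype=float) + gumbel) / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def aggregate_neighbors(neighbor_feats, weights):
    """Weighted aggregation of neighbor node information into a node
    embedding vector for the labeled node."""
    return np.asarray(weights) @ np.asarray(neighbor_feats)
```

The softmax output sums to one, so the aggregation is a convex combination of the target neighbor nodes' feature vectors.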
12. The method of claim 10, wherein training the graph neural network based on the sparse relationship network graph further comprises: determining a second gradient corresponding to the multilayer neural network according to the node embedding vector and the label; and updating the multilayer neural network according to the second gradient.
13. The method of claim 12, wherein,
updating the graph neural network according to the first gradient comprises: adding first noise to the first gradient in a differential privacy manner to obtain a first noise gradient; and updating parameters of the graph neural network according to the first noise gradient;
and updating the multilayer neural network according to the second gradient comprises: adding second noise to the second gradient in a differential privacy manner to obtain a second noise gradient; and updating the parameters of the multilayer neural network according to the second noise gradient.
14. The method of claim 1, wherein the plurality of nodes comprises at least one of: user nodes, merchant nodes, and item nodes.
15. An apparatus for training a graph neural network based on privacy protection, comprising:
an original graph obtaining unit, configured to obtain an original relationship network graph, where the original relationship network graph includes a plurality of nodes, and any first node in the plurality of nodes has a corresponding first neighbor node set;
a matching degree obtaining unit configured to, for any second node in the first neighbor node set, input node information of the second node, node information of the first node, and connection information between the second node and the first node into a multilayer neural network to obtain a matching degree between the second node and the first node;
a sampling unit configured to sample the first neighbor node set according to the matching degrees corresponding to the respective neighbor nodes in the first neighbor node set, to obtain a sampling neighbor node set of the first node;
a sparse graph forming unit configured to form a sparse relationship network graph based on a sampling neighbor node set corresponding to each of the plurality of nodes;
and a training unit configured to train the graph neural network based on the sparse relationship network graph.
16. The apparatus of claim 15, wherein the sampling unit is configured to:
normalizing the matching degrees respectively corresponding to the neighbor nodes to obtain corresponding matching probabilities;
and sampling each neighbor node according to the matching probability.
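The normalize-then-sample step of claim 16 can be illustrated with a softmax normalization, which is an assumption for illustration only since the claim does not fix the normalization function:

```python
import numpy as np

def sample_by_matching_degree(match_degrees, rng=None):
    """Normalize matching degrees into matching probabilities via softmax,
    then sample one neighbor index according to those probabilities."""
    rng = rng or np.random.default_rng()
    d = np.asarray(match_degrees, dtype=float)
    exp = np.exp(d - d.max())  # numerically stable softmax
    probs = exp / exp.sum()
    return int(rng.choice(len(d), p=probs)), probs
```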
17. The apparatus of claim 15, wherein the sampling unit comprises:
a first probability determination module configured to determine a first sampling probability that the second node is sampled according to a first privacy budget and a matching degree of the second node with the first node based on an exponential mechanism of differential privacy;
and the neighbor sampling module is configured to sample each neighbor node according to the first sampling probability corresponding to each neighbor node in the first neighbor node set.
18. The apparatus of claim 17, wherein the neighbor sampling module is configured to:
and performing a preset number k of sampling operations according to the first sampling probability, so as to sample k neighbor nodes from the first neighbor node set as the sampling neighbor node set.
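The k-fold sampling of claim 18 under the exponential mechanism of differential privacy (claim 17) draws each neighbor with probability proportional to exp(ε·u / 2Δu), where u is the matching degree, ε the privacy budget, and Δu the sensitivity. An illustrative sketch, not part of the claims; sampling without replacement and the `sensitivity` parameter are assumptions:

```python
import numpy as np

def exponential_mechanism_sample(match_scores, epsilon, k,
                                 sensitivity=1.0, rng=None):
    """Sample k neighbor indices, each drawn with probability proportional
    to exp(epsilon * score / (2 * sensitivity)), i.e. the exponential
    mechanism of differential privacy applied to matching degrees."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(match_scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())  # numerically stable
    probs /= probs.sum()
    return rng.choice(len(scores), size=min(k, len(scores)),
                      replace=False, p=probs)
```

A larger privacy budget ε sharpens the distribution toward high-matching neighbors, while ε → 0 approaches uniform sampling.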
19. The apparatus of claim 17, wherein the neighbor sampling module is configured to:
inputting the first sampling probability corresponding to each neighbor node into a Gumbel-softmax function to obtain a second sampling probability corresponding to each neighbor node;
and sampling each neighbor node according to the second sampling probability corresponding to each neighbor node.
20. The apparatus of claim 19, wherein the neighbor sampling module is further configured to:
and performing a preset number k of sampling operations according to the second sampling probability, so as to sample k neighbor nodes from the first neighbor node set as the sampling neighbor node set.
21. The apparatus of claim 15, wherein the sparse relationship network graph comprises labeled nodes with labels;
the training unit includes:
a graph embedding module configured to perform graph embedding on the sparse relationship network graph by utilizing the graph neural network to obtain node embedding vectors of the labeled nodes;
a first gradient determination module configured to determine a corresponding first gradient of the graph neural network from the node embedding vector and the tag;
a first update module configured to update the graph neural network according to the first gradient.
22. The apparatus of claim 21, wherein the first update module is configured to:
adding noise on the first gradient by using a noise mechanism of differential privacy to obtain a first noise gradient;
and updating the parameters of the graph neural network according to the first noise gradient.
23. The apparatus of claim 22, wherein the first update module is configured to:
clipping the first gradient based on a preset clipping threshold to obtain a clipped gradient;
determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superposing the Gaussian noise on the clipped gradient to obtain the first noise gradient.
24. The apparatus of claim 21, wherein the graph embedding module is configured to:
acquiring neighbor nodes of the labeled nodes in the sparse relationship network graph as target neighbor nodes;
determining an aggregation weight of each target neighbor node, the aggregation weight being determined based on the degree of matching between the labeled nodes and each target neighbor node;
and aggregating the node information of each target neighbor node according to the aggregation weights to obtain the node embedding vectors of the labeled nodes.
25. The apparatus of claim 24, wherein the graph embedding module is configured to determine the aggregate weight for each target neighbor node by:
acquiring the sampling probability with which each target neighbor node was sampled, wherein the sampling probability is determined by using a Gumbel-softmax function based on the matching degree between each target neighbor node and the labeled node;
and determining the aggregation weight of each target neighbor node according to the sampling probability.
26. The apparatus of claim 24, wherein the training unit further comprises:
a second gradient determining module configured to determine a second gradient corresponding to the multilayer neural network according to the node embedding vector and the label;
a second update module configured to update the multilayer neural network according to the second gradient.
27. The apparatus of claim 26, wherein:
the first update module is configured to: add first noise to the first gradient in a differential privacy manner to obtain a first noise gradient; and update parameters of the graph neural network according to the first noise gradient;
and the second update module is configured to: add second noise to the second gradient in a differential privacy manner to obtain a second noise gradient; and update the parameters of the multilayer neural network according to the second noise gradient.
28. The apparatus of claim 15, wherein the plurality of nodes comprises at least one of: user nodes, merchant nodes, and item nodes.
29. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-14.
CN202110109491.3A 2021-01-27 2021-01-27 Method and device for training neural network based on privacy protection Active CN112464292B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110109491.3A CN112464292B (en) 2021-01-27 2021-01-27 Method and device for training neural network based on privacy protection
CN202110957071.0A CN113536383B (en) 2021-01-27 2021-01-27 Method and device for training graph neural network based on privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110109491.3A CN112464292B (en) 2021-01-27 2021-01-27 Method and device for training neural network based on privacy protection

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110957071.0A Division CN113536383B (en) 2021-01-27 2021-01-27 Method and device for training graph neural network based on privacy protection

Publications (2)

Publication Number Publication Date
CN112464292A true CN112464292A (en) 2021-03-09
CN112464292B CN112464292B (en) 2021-08-20

Family

ID=74802376

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110109491.3A Active CN112464292B (en) 2021-01-27 2021-01-27 Method and device for training neural network based on privacy protection
CN202110957071.0A Active CN113536383B (en) 2021-01-27 2021-01-27 Method and device for training graph neural network based on privacy protection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110957071.0A Active CN113536383B (en) 2021-01-27 2021-01-27 Method and device for training graph neural network based on privacy protection

Country Status (1)

Country Link
CN (2) CN112464292B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095490A (en) * 2021-06-07 2021-07-09 华中科技大学 Graph neural network construction method and system based on differential privacy aggregation
CN113190841A (en) * 2021-04-27 2021-07-30 中国科学技术大学 Method for defending graph data attack by using differential privacy technology
CN113298116A (en) * 2021-04-26 2021-08-24 上海淇玥信息技术有限公司 Attention weight-based graph embedding feature extraction method and device and electronic equipment
CN113642717A (en) * 2021-08-31 2021-11-12 西安理工大学 Convolutional neural network training method based on differential privacy
CN113837382A (en) * 2021-09-26 2021-12-24 杭州网易云音乐科技有限公司 Method and system for training graph neural network
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
WO2023213233A1 (en) * 2022-05-06 2023-11-09 墨奇科技(北京)有限公司 Task processing method, neural network training method, apparatus, device, and medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110009093A (en) * 2018-12-07 2019-07-12 阿里巴巴集团控股有限公司 For analyzing the nerve network system and method for relational network figure
CN111091005A (en) * 2019-12-20 2020-05-01 北京邮电大学 Meta-structure-based unsupervised heterogeneous network representation learning method
US10825219B2 (en) * 2018-03-22 2020-11-03 Northeastern University Segmentation guided image generation with adversarial networks
CN112085172A (en) * 2020-09-16 2020-12-15 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US8000262B2 (en) * 2008-04-18 2011-08-16 Bonnie Berger Leighton Method for identifying network similarity by matching neighborhood topology
US10789526B2 (en) * 2012-03-09 2020-09-29 Nara Logics, Inc. Method, system, and non-transitory computer-readable medium for constructing and applying synaptic networks
CN103020163A (en) * 2012-11-26 2013-04-03 南京大学 Node-similarity-based network community division method in network
CN106302104B (en) * 2015-06-26 2020-01-21 阿里巴巴集团控股有限公司 User relationship identification method and device
US10977384B2 (en) * 2017-11-16 2021-04-13 Microsoft Technoogy Licensing, LLC Hardware protection for differential privacy
CN108022654B (en) * 2017-12-20 2021-11-30 深圳先进技术研究院 Association rule mining method and system based on privacy protection and electronic equipment
CN109614975A (en) * 2018-10-26 2019-04-12 桂林电子科技大学 A kind of figure embedding grammar, device and storage medium
CN110866190B (en) * 2019-11-18 2021-05-14 支付宝(杭州)信息技术有限公司 Method and device for training neural network model for representing knowledge graph

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US10825219B2 (en) * 2018-03-22 2020-11-03 Northeastern University Segmentation guided image generation with adversarial networks
CN110009093A (en) * 2018-12-07 2019-07-12 阿里巴巴集团控股有限公司 For analyzing the nerve network system and method for relational network figure
CN111091005A (en) * 2019-12-20 2020-05-01 北京邮电大学 Meta-structure-based unsupervised heterogeneous network representation learning method
CN112085172A (en) * 2020-09-16 2020-12-15 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN113298116A (en) * 2021-04-26 2021-08-24 上海淇玥信息技术有限公司 Attention weight-based graph embedding feature extraction method and device and electronic equipment
CN113298116B (en) * 2021-04-26 2024-04-02 上海淇玥信息技术有限公司 Attention weight-based graph embedded feature extraction method and device and electronic equipment
CN113190841A (en) * 2021-04-27 2021-07-30 中国科学技术大学 Method for defending graph data attack by using differential privacy technology
CN113095490A (en) * 2021-06-07 2021-07-09 华中科技大学 Graph neural network construction method and system based on differential privacy aggregation
CN113642717A (en) * 2021-08-31 2021-11-12 西安理工大学 Convolutional neural network training method based on differential privacy
CN113642717B (en) * 2021-08-31 2024-04-02 西安理工大学 Convolutional neural network training method based on differential privacy
CN113837382A (en) * 2021-09-26 2021-12-24 杭州网易云音乐科技有限公司 Method and system for training graph neural network
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection

Also Published As

Publication number Publication date
CN112464292B (en) 2021-08-20
CN113536383B (en) 2023-10-27
CN113536383A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN112464292B (en) Method and device for training neural network based on privacy protection
Balle et al. Reconstructing training data with informed adversaries
Yu et al. Learning deep network representations with adversarially regularized autoencoders
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN110913354A (en) Short message classification method and device and electronic equipment
Ji et al. Multi-range gated graph neural network for telecommunication fraud detection
Yoon et al. Robust probabilistic time series forecasting
Ugendhar et al. A novel intelligent-based intrusion detection system approach using deep multilayer classification
Zheng et al. Jora: Weakly supervised user identity linkage via jointly learning to represent and align
CN112597399B (en) Graph data processing method and device, computer equipment and storage medium
Yin et al. An Anomaly Detection Model Based On Deep Auto-Encoder and Capsule Graph Convolution via Sparrow Search Algorithm in 6G Internet-of-Everything
CN115982570A (en) Multi-link custom optimization method, device, equipment and storage medium for federated learning modeling
CN111860655B (en) User processing method, device and equipment
CN114882557A (en) Face recognition method and device
CN114387088A (en) Loan risk identification method and device based on knowledge graph
Kim et al. Network anomaly detection based on domain adaptation for 5g network security
Kartik et al. Decoding of graphically encoded numerical digits using deep learning and edge detection techniques
Zhang et al. Construct new graphs using information bottleneck against property inference attacks
Khan et al. Synthetic Identity Detection using Inductive Graph Convolutional Networks
Long Understanding and mitigating privacy risk in machine learning systems
Chen et al. Risk probability estimating based on clustering
Qiu et al. Abnormal Traffic Detection Method of Internet of Things Based on Deep Learning in Edge Computing Environment
CN117555489A (en) Internet of things data storage transaction anomaly detection method, system, equipment and medium
Huang et al. Exploring network reliability by predicting link status based on simplex neural network
Palaniappan et al. Learning Disentangled Representations Using Dormant Variations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40047463

Country of ref document: HK