WO2021082681A1 - Method and device for multi-party joint training of graph neural network - Google Patents

Method and device for multi-party joint training of graph neural network

Info

Publication number
WO2021082681A1
Authority
WO
WIPO (PCT)
Prior art keywords
embedding
sample
network
vector
node
Prior art date
Application number
PCT/CN2020/111501
Other languages
French (fr)
Chinese (zh)
Inventor
陈超超
郑龙飞
王力
周俊
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021082681A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/20 - Ensemble learning

Definitions

  • One or more embodiments of this specification relate to the fields of data security and machine learning, and in particular, to methods and devices for multi-party joint training of graph neural networks.
  • The data needed for machine learning often involves multiple fields. For example, in a machine-learning-based user classification analysis scenario, the electronic payment platform owns the user's transaction flow data, the social platform owns the user's friend contact data, and the banking institution owns the user's loan data.
  • Data often exists in the form of islands. Due to industry competition, data security, user privacy, and other concerns, data integration faces great resistance, and it is difficult to bring together data scattered across platforms to train machine learning models. Jointly training machine learning models on multi-party data, under the premise that no data is leaked, has therefore become a major challenge.
  • The graph neural network is a widely used machine learning model. Compared with traditional neural networks, a graph neural network can capture not only the features of nodes but also the features of the relationships between nodes, and has therefore achieved excellent results in a number of machine learning tasks. However, this also gives graph neural networks a certain complexity. In particular, when faced with data islands, how to integrate multi-party data and safely conduct multi-party joint modeling is a problem to be solved.
  • One or more embodiments of this specification describe methods and devices for multi-party joint training of graph neural networks, which can safely and efficiently jointly train graph neural networks among multiple parties as a prediction model.
  • the graph neural network includes a graph embedding sub-network and a classification sub-network.
  • the multiple parties include a server and N data holders.
  • The server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network. Any first holder of the N data holders stores the first characteristic part of each sample in the sample set, and a first graph structure containing the respective samples as corresponding nodes; the first holder maintains a first network part of the graph embedding sub-network, and the first network part includes an embedding layer and an aggregation layer.
  • The method is executed by the first holder and includes: in the embedding layer, based at least on the first characteristic part of each sample, jointly calculating the primary embedding vector of each sample with the other N-1 data holders using a multi-party secure computing scheme; in the aggregation layer, based on the first graph structure and the primary embedding vector of each sample, performing multi-level aggregation on each sample to determine the high-order embedding vector of each sample, where each level of aggregation includes, for the node corresponding to each sample in the first graph structure, determining the current-level embedding vector of the node based at least on the upper-level embedding vectors of the node's neighbor nodes; sending the high-order embedding vectors of the respective samples to the server, so that the server uses the classification sub-network to classify and predict each sample based on the high-order embedding vectors sent by the N data holders; receiving a loss gradient from the server, the loss gradient being determined based at least on the classification prediction result of each sample and the sample label; and updating the first network part according to the loss gradient.
  • In one embodiment, the multi-party secure computing scheme includes a secret sharing scheme. Accordingly, the primary embedding vector of each sample can be obtained in the following manner: performing sharing processing on the first characteristic part of each sample to obtain a first shared characteristic part; sending the first shared characteristic part to the other N-1 data holders, and receiving N-1 shared characteristic parts from the other N-1 data holders; integrating the first characteristic part and the N-1 shared characteristic parts to obtain a first comprehensive characteristic; sending the first comprehensive characteristic to the other N-1 data holders, and receiving N-1 comprehensive characteristics from them; and determining the primary embedding vector of each sample according to the first comprehensive characteristic and the N-1 comprehensive characteristics.
  • In one embodiment, the embedding layer has embedding parameters, and obtaining the primary embedding vector of each sample at the embedding layer includes: based on the first feature part of each sample and the embedding parameters in the embedding layer, jointly calculating the primary embedding vector of each sample with the other N-1 data holders using a multi-party secure computing scheme.
  • In this case, updating the first network part includes updating the embedding parameters.
  • In a further embodiment, the embedding layer adopts a secret sharing scheme to jointly calculate the primary embedding vector of each sample with the other N-1 data holders, which specifically includes: performing sharing processing on the first characteristic part of each sample to obtain a first shared characteristic part, and performing sharing processing on the embedding parameters to obtain a first shared parameter part; sending the first shared characteristic part and the first shared parameter part to the other N-1 data holders, and receiving N-1 shared characteristic parts and N-1 shared parameter parts from them; using a first comprehensive parameter composed of the embedding parameters and the N-1 shared parameter parts to process a first comprehensive feature composed of the first characteristic part and the N-1 shared characteristic parts, obtaining a first integrated embedding result; sending the first integrated embedding result to the other N-1 data holders, and receiving the corresponding N-1 integrated embedding results from them; and determining the primary embedding vector of each sample according to the first integrated embedding result and the N-1 integrated embedding results.
  • In one embodiment, each level of aggregation in the aggregation layer includes, for the first node corresponding in the first graph structure to any first sample among the samples: determining a neighbor aggregation vector at least according to the upper-level embedding vectors of the neighbor nodes of the first node; and determining the current-level embedding vector of the first node according to the neighbor aggregation vector and the upper-level embedding vector of the first node.
  • Further, in one example, a pooling operation is performed on the upper-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.
  • In another example, the upper-level embedding vectors of the neighbor nodes of the first node are weighted and summed to obtain the neighbor aggregation vector, where the weight corresponding to each neighbor node is determined according to the characteristics of the connecting edge between that neighbor node and the first node.
  • the neighbor aggregation vector is determined based on the upper-level embedding vector of each neighbor node and the edge embedding vector of each connection edge between each neighbor node and the first node.
  • the process of updating the first network part includes: according to the loss gradient, using a backpropagation algorithm to reversely update the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer layer by layer.
  • the graph neural network includes a graph embedding sub-network and a classification sub-network.
  • the multiple parties include a server and N data holders.
  • The server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network; each of the N data holders stores a part of the characteristics of each sample in the sample set, and a graph structure containing each sample as a corresponding node.
  • The method is executed by the server and includes: for any target sample in the sample set, respectively receiving from the N data holders N high-order embedding vectors for the target sample, where the i-th high-order embedding vector is obtained by the i-th holder of the N data holders by inputting the graph structure stored therein and the characteristic part of the target sample into the graph embedding sub-network part maintained therein; in the classification sub-network, synthesizing the N high-order embedding vectors to obtain a comprehensive embedding vector of the target sample, and determining the classification prediction result of the target sample according to the comprehensive embedding vector; determining the prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label; updating the classification sub-network according to the prediction loss, and determining the loss gradient corresponding to the input layer of the classification sub-network; and sending the loss gradient to the N data holders, so that each holder updates the graph embedding sub-network part maintained therein.
  • In different embodiments, the N high-order embedding vectors can be synthesized to obtain the comprehensive embedding vector of the target sample in the following manner: splicing the N high-order embedding vectors to obtain the comprehensive embedding vector; or averaging the N high-order embedding vectors to obtain the comprehensive embedding vector.
  • synthesizing the N high-order embedding vectors to obtain the comprehensive embedding vector of the target sample includes: using N weight vectors to perform bitwise multiplication with the N high-order embedding vectors to obtain N weighted processing vectors; sum the N weighted processing vectors to obtain the integrated embedding vector; wherein, updating the classification sub-network includes updating the N weight vectors.
  • In one embodiment, before determining the prediction loss, the method further includes: receiving the sample label from a second holder of the N data holders.
  • the graph neural network includes a graph embedding sub-network and a classification sub-network.
  • the multiple parties include a server and N data holders.
  • The server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network; any first holder of the N data holders stores the first characteristic part of each sample in the sample set, and a first graph structure containing the respective samples as corresponding nodes; the first holder maintains a first network part of the graph embedding sub-network, and the first network part includes an embedding layer and an aggregation layer.
  • The device is deployed in the first holder and includes: a primary embedding unit, configured to jointly calculate, at the embedding layer and using a multi-party secure computing scheme, the primary embedding vector of each sample with the other N-1 data holders, based at least on the first characteristic part of each sample; an aggregation unit, configured to perform, at the aggregation layer, multi-level aggregation on each sample based on the first graph structure and the primary embedding vectors, to determine the high-order embedding vector of each sample; a sending unit, configured to send the high-order embedding vectors to the server; a receiving unit, configured to receive a loss gradient from the server; and an updating unit, configured to update the first network part according to the loss gradient.
  • the graph neural network includes a graph embedding sub-network and a classification sub-network.
  • the multiple parties include a server and N data holders.
  • The server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network; each of the N data holders stores a part of the characteristics of each sample in the sample set, and a graph structure containing each sample as a corresponding node.
  • The device is deployed in the server and includes: a vector receiving unit, configured to receive, for any target sample in the sample set, N high-order embedding vectors for the target sample respectively from the N data holders, where the i-th high-order embedding vector is obtained by the i-th holder of the N data holders by inputting the graph structure stored therein and the characteristic part of the target sample into the graph embedding sub-network part maintained therein; a classification prediction unit, configured to synthesize the N high-order embedding vectors in the classification sub-network to obtain a comprehensive embedding vector of the target sample, and determine the classification prediction result of the target sample according to the comprehensive embedding vector; a loss determining unit, configured to determine the prediction loss based at least on the classification prediction result and the corresponding sample label; an updating unit, configured to update the classification sub-network according to the prediction loss and determine the loss gradient corresponding to the input layer of the classification sub-network; and a sending unit, configured to send the loss gradient to the N data holders, so that each holder updates the graph embedding sub-network part maintained therein.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the second aspect.
  • According to another aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method of the first aspect or the second aspect is implemented.
  • multiple data holders and servers jointly train a graph neural network, wherein each data holder stores part of the characteristics of the sample and the graph structure with the sample as the node.
  • the graph neural network is divided into a graph embedding sub-network and a classification sub-network.
  • Each data holder maintains a part of the graph embedding sub-network, and the server maintains the classification sub-network.
  • In each data holder, the primary embedding vector of the sample is calculated jointly with the other holders through a multi-party secure computing scheme. On this basis, multi-level neighbor aggregation is performed on the node according to the local graph structure to obtain the high-order embedding vector of the node, which is then sent to the server.
  • the server uses the classification sub-network to synthesize the high-level embedding vectors of the samples from each data holder, and then classifies and predicts the samples accordingly to determine the loss.
  • the loss gradient is passed from the classification sub-network in the server back to the graph embedding sub-network in the data holder to realize the joint training of the entire graph neural network. In the whole process, the privacy and security of the sample feature data and graph structure data are guaranteed, and the calculation and training efficiency of the entire network is also improved.
  • Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification
  • Fig. 2 shows a process of a multi-party joint training graph neural network method according to an embodiment
  • Figure 3 shows the flow of the method for the first holder to determine the primary embedding vector of the sample in a secret sharing manner
  • Fig. 4 shows a schematic block diagram of a training device deployed in a first holder according to an embodiment
  • Fig. 5 shows a schematic block diagram of a training device deployed in a server according to an embodiment.
  • Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
  • two data holders are shown, namely holder A and holder B.
  • Holders A and B each store part of the characteristics of the sample, and a graph structure that records the relationship between the samples.
  • the sample may be a user.
  • the holder A may be, for example, an electronic payment platform (such as Alipay), in which a part of the user's characteristics (such as payment-related characteristics) are stored.
  • This part of the features is shown in FIG. 1 as features f1 to f4.
  • the holder A also stores a graph structure A constructed through a payment relationship, for example.
  • the holder B may be, for example, a social platform (such as Dingding), in which another part of the user's characteristics (such as social-related characteristics) are stored. These features are shown in FIG. 1 as features f5, f6, and f7.
  • the holder B also stores a graph structure B constructed through social relationships, for example. More specifically, in the social platform, users who have a friend relationship or have a history of communication can be connected by connecting edges, thereby forming a graph structure B.
  • It can be seen that holder A and holder B can store different characteristic parts of the same samples, and because holder A and holder B construct their graph structures based on different association relationships (for example, holder A based on the payment relationship and holder B based on the social friend relationship), holders A and B each store different graph structures.
  • a neutral server is introduced in addition to each data holder, and each data holder and server jointly train the graph neural network.
  • the graph neural network to be trained is divided into two parts: the graph embedding sub-network and the classification sub-network.
  • the graph embedding sub-network is used to generate the high-order embedding vectors of the nodes corresponding to each sample according to the sample characteristics and graph structure.
  • the calculation of the graph embedding sub-network involves original sample features and graph structure data, which is related to privacy data calculations, so it can be performed locally on the data holder.
  • each data holder may maintain a part of the graph embedding sub-network, and use the locally maintained graph embedding sub-network part based on locally stored feature data and graph structure data to calculate the high-order embedding vector of the node.
  • the graph embedding sub-network in each data holder can be divided into an embedding layer and an aggregation layer.
  • In the embedding layer, each data holder uses a multi-party secure computing scheme to synthesize the sample feature parts stored by the respective holders, obtaining the primary embedding vector of the node corresponding to each sample.
  • In the aggregation layer, the data holder performs multi-level neighbor aggregation on the node based on the node's primary embedding vector and the locally stored graph structure, thereby obtaining the node's high-order embedding vector.
  • the classification sub-network is used to synthesize the high-order embedding vectors of the nodes obtained by embedding the graph into the sub-network, and perform classification prediction on the nodes according to the comprehensive results.
  • the calculation of the classification sub-network does not involve the original sample features and graph structure data, and is a non-privacy related calculation. Therefore, it can be performed in the server to improve the efficiency of calculation and training.
  • On this basis, the server can determine the prediction loss according to the classification prediction results of the classification sub-network and the sample labels (shown as y1, y2 and y3 in Figure 1), and update the classification sub-network through back propagation until the loss gradient at the input layer of the classification sub-network is determined. The server then sends this loss gradient to each data holder, so that each data holder can continue to update its graph embedding sub-network according to the loss gradient. As a result, the update and training of the entire graph neural network is realized.
  • More generally, there may be N data holders, where N is usually greater than or equal to 2.
  • the N data holders each store a part of the characteristics of each sample in the sample set, and a graph structure containing each sample as a node. In such a scenario, it is hoped that N data holders and servers will jointly train a graph neural network model.
  • the graph neural network is divided into graph embedding sub-networks and classification sub-networks.
  • the N data holders each maintain a part of the graph embedding sub-network; the server maintains the classification sub-network.
  • the following describes the execution steps of the joint training in conjunction with any one of the N data holders, called the first holder.
  • FIG. 2 shows a process of a method for multi-party joint training of a graph neural network according to an embodiment. Figure 2 shows the respective processing procedures of any first holder and the server in the joint training, as well as the interaction between the two. The first holder and the server can each be implemented as any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • the first holder stores part of the characteristics of each sample in the sample set, which is referred to herein as the first characteristic part.
  • the graph structure stored in the first holder is referred to as the first graph structure, and the graph embedded sub-network part maintained therein is referred to as the first network part.
  • The first network part includes an embedding layer and an aggregation layer.
  • the process of joint training includes the following steps.
  • First, in step 201, the first holder, using the embedding layer in the first network part maintained by it, jointly calculates the primary embedding vector of each sample with the other N-1 data holders, based at least on the first feature part of each sample.
  • the joint calculation in this step involves the original characteristics of the samples stored in each data holder and belongs to private data. Therefore, for data security considerations, the aforementioned joint calculation needs to adopt a multi-party secure computing (MPC) solution.
  • In step 201, depending on the specific algorithm of the embedding layer, various applicable MPC schemes can be combined to jointly calculate the primary embedding vector of the sample.
  • the processing of the sample feature by the embedding layer mainly involves encoding and characterizing the original feature data (for example, encoding as a vector), and does not involve parameter calculation processing on the feature data.
  • various MPC schemes can be used to synthesize the characteristic parts of the sample encoded by the N holders to obtain the primary embedding vector of the sample.
  • In another embodiment, after the embedding layer encodes the feature data of the sample, it further performs calculation involving parameters, such as a linear transformation of the features using a parameter vector, or further applying a non-linear function (such as the sigmoid function), and so on.
  • the embedding layer of the holder i includes the embedding parameter ⁇ i required for feature calculation.
  • In one embodiment, a homomorphic encryption method can be used to separately integrate the sample feature parts encoded by the N holders and the embedding parameter parts they maintain, so as to obtain the primary embedding vector of the sample. Specifically, homomorphic encryption can be used to synthesize the sample features, and also to synthesize the embedding parameters maintained in each holder; the integrated embedding parameters are then used to process the synthesized features to obtain the primary embedding vector of the sample.
  • a secret sharing method is adopted to obtain the primary embedding vector of the sample based on each feature part and embedding parameters.
  • Figure 3 shows the flow of the method for the first holder to determine the primary embedding vector of the sample in a secret sharing manner.
  • First, in step 301, the first holder i performs sharing processing on the first characteristic part x_i of the sample to obtain the first shared characteristic part x′_i.
  • The above-mentioned sharing processing can be realized by an algorithm in secret sharing, by adding a random number generated in a certain manner to the original data. For example, the first shared characteristic part can be obtained as x′_i = x_i + r_i, where r_i is a random number used for the sharing processing of the sample features.
  • In addition, the first holder i also performs sharing processing on the embedding parameters θ_i maintained therein to obtain the first shared parameter part θ′_i. For example, the sharing processing can be performed as θ′_i = θ_i + s_i, where s_i is a random number used for sharing the embedding parameters.
  • Next, in step 302, the first holder i sends the first shared characteristic part x′_i and the first shared parameter part θ′_i to the other N-1 data holders.
  • It can be understood that the other N-1 data holders respectively calculate their corresponding shared characteristic parts x′_j (j≠i) and shared parameter parts θ′_j in a similar manner, and send them out.
  • Correspondingly, the first holder i receives the N-1 shared characteristic parts x′_j and the N-1 shared parameter parts θ′_j from the other N-1 data holders.
  • Then, the first holder i integrates its own first characteristic part x_i with the received N-1 shared characteristic parts x′_j to obtain a first comprehensive feature X_i, for example X_i = x_i + Σ_{j≠i} x′_j.
  • Similarly, the first holder i obtains a first comprehensive parameter W_i based on its own embedding parameters θ_i and the N-1 shared parameter parts θ′_j, for example W_i = θ_i + Σ_{j≠i} θ′_j. The first comprehensive parameter W_i is then used to process the first comprehensive feature X_i, obtaining a first integrated embedding result H_i.
  • Next, in step 304, the first holder i sends the first integrated embedding result H_i to the other N-1 data holders.
  • The other holders similarly obtain their integrated embedding results H_j, so the first holder i correspondingly receives N-1 integrated embedding results H_j from the other N-1 data holders.
  • Finally, the first holder i determines the primary embedding vector H of each sample according to the first integrated embedding result H_i and the N-1 integrated embedding results H_j, for example by summing them: H = H_i + Σ_{j≠i} H_j.
  • In this way, the first holder i and the other data holders jointly calculate the same primary embedding vector H for each sample.
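  • As a concrete illustration, the following minimal Python (numpy) sketch simulates, in a single process, one standard additive secret-sharing realization of such a joint computation: each holder splits its encoded feature contribution into N shares, the shares are exchanged, and only partial sums are revealed, yet every party recovers the same combined result. The share construction and combination formulas actually used in this embodiment may differ, and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shares(value, n):
    # Split a vector into n additive shares that sum back to the original value.
    shares = [rng.normal(size=value.shape) for _ in range(n - 1)]
    shares.append(value - sum(shares))
    return shares

def joint_secret_shared_sum(local_vectors):
    # Simulates N holders jointly computing the sum of their private vectors:
    # holder i sends share j of its vector to holder j, each holder publishes
    # only the sum of the shares it received, and the published partial sums
    # add up to the true total without any single vector being revealed.
    n = len(local_vectors)
    all_shares = [make_shares(v, n) for v in local_vectors]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) for j in range(n)]
    return sum(partial_sums)

# Encoded feature parts of one sample held by N = 3 data holders (toy values).
encoded_features = [rng.normal(size=4) for _ in range(3)]
primary_embedding = joint_secret_shared_sum(encoded_features)
assert np.allclose(primary_embedding, sum(encoded_features))
# The embedding parameters of each holder could be combined in the same way
# before being applied to the combined features.
```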
  • On the basis of obtaining the primary embedding vector of each sample at the embedding layer of the first holder i using the MPC scheme, then, in step 202, at the aggregation layer, multi-level neighbor aggregation is performed on each sample based on the first graph structure stored therein and the primary embedding vector of each sample, so as to determine the high-order embedding vector of each sample.
  • Specifically, each sample corresponds to a node in the first graph structure. Based on the connection information between the nodes in the first graph structure, multi-level neighbor aggregation is performed on each node, where each level of aggregation includes, for each node, determining the current-level embedding vector of this node based at least on the upper-level embedding vectors of the node's neighbor nodes.
  • More specifically, for a first node v corresponding to any first sample, the k-th level aggregation may include the following. An aggregation function AGG_k is adopted to determine a neighbor aggregation vector n_v^(k), at least according to the upper-level (that is, (k-1)-level) embedding vectors h_u^(k-1) of the neighbor nodes u of the first node v:
    n_v^(k) = AGG_k({h_u^(k-1) : u ∈ N(v)})    (6)
  where N(v) represents the set of neighbor nodes of node v. The current-level embedding vector of node v is then determined as:
    h_v^(k) = W_k · f(n_v^(k), h_v^(k-1))    (7)
  where f represents a synthesis function applied to the neighbor aggregation vector n_v^(k) and the upper-level vector h_v^(k-1) of node v, and W_k is the parameter of the k-th level of aggregation. The synthesis operation in the function f can include splicing n_v^(k) with h_v^(k-1), or summing, or averaging them, and so on.
  • the above aggregation function AGG k can take different forms and algorithms.
  • the aforementioned aggregation function AGG k includes a pooling operation.
  • the aforementioned pooling operation may include maximum pooling, average pooling, and so on.
  • In another embodiment, the above-mentioned aggregation function AGG_k can be expressed as inputting the upper-level embedding vectors h_u^(k-1) of the neighbor nodes u into an LSTM neural network in turn, and using the hidden vector thus obtained as the neighbor aggregation vector n_v^(k).
  • In yet another embodiment, the aforementioned aggregation function AGG_k includes a weighted summation operation. In such an embodiment, formula (6) is embodied as:
    n_v^(k) = Σ_{u ∈ N(v)} α_uv · h_u^(k-1)    (8)
  • In one example, the above weight factor α_uv is determined according to the characteristics of the connecting edge e_uv between the neighbor node u and the first node v.
  • the characteristics of the connecting edge e uv between the two nodes u and v may include the total transfer amount of the two users corresponding to the two nodes.
  • the characteristics of the connecting edge e uv between the two nodes u and v may include the interaction frequency of the two users corresponding to the two nodes.
  • Thus, the weight factor of the neighbor node u can be determined based on the characteristics of the connecting edge e_uv, and the neighbor aggregation vector n_v^(k) can then be obtained through the aggregation function of formula (8).
  • In still another embodiment, an edge embedding vector is determined for each connecting edge according to the edge features of the connecting edges between the nodes.
  • In such an embodiment, aggregation of the edge embedding vectors is also introduced. Specifically, the neighbor aggregation vector n_v^(k) is determined based on the upper-level embedding vectors h_u^(k-1) of the neighbor nodes u and the edge embedding vectors q_uv of the connecting edges e_uv between each neighbor node u and the first node v, where q_uv is the edge embedding vector of the connecting edge e_uv between the first node v and its neighbor node u. More specifically, in one example, formula (6) with the aggregation function AGG_k can be embodied as an aggregation over both h_u^(k-1) and q_uv.
  • After the neighbor aggregation vector n_v^(k) is determined based on the upper-level embedding vectors of the neighbor nodes in any of the above ways, the current-level embedding vector h_v^(k) of the first node v is obtained according to formula (7).
  • It can be understood that the primary embedding vector of the sample determined in step 201 can be used as the level-0 embedding vector h_v^(0). By letting k go from 1 to a preset aggregation level K and performing the aggregation level by level, the high-order embedding vector h_v^(K) of the preset level K can be obtained for node v. The aggregation level K is a preset hyperparameter, which corresponds to the order of the neighbor nodes considered in the aggregation.
  • In this way, the first holder i obtains, at the aggregation layer, the high-order embedding vector of each sample based on the first graph structure stored therein and the primary embedding vector of each sample.
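  • As an illustration of the above aggregation, the following minimal Python (numpy) sketch performs K levels of neighbor aggregation in the spirit of formulas (6) and (7), using mean pooling as AGG_k and concatenation as the synthesis function f. The shapes, the absence of a non-linear activation, and all variable names are assumptions for illustration only, not details of the original disclosure.

```python
import numpy as np

def multi_level_aggregation(h0, neighbors, level_params, K):
    """h0: (num_nodes, d) primary (level-0) embedding vectors.
    neighbors: dict {node: list of neighbor node ids} from the local graph structure.
    level_params: list of K matrices W_k of shape (d, 2 * d) (illustrative shapes).
    Returns the level-K (high-order) embedding vectors of all nodes."""
    h = h0
    for k in range(K):
        new_h = np.zeros_like(h)
        for v, nbrs in neighbors.items():
            # Formula (6): neighbor aggregation vector, here via mean pooling.
            n_v = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
            # Formula (7): combine with v's own upper-level vector (concatenation)
            # and transform with the level-k aggregation parameters W_k.
            new_h[v] = level_params[k] @ np.concatenate([h[v], n_v])
        h = new_h
    return h

rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 4))                        # 4 nodes, embedding size 4
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}  # toy local graph structure
level_params = [rng.normal(size=(4, 8)) for _ in range(2)]
high_order = multi_level_aggregation(h0, neighbors, level_params, K=2)
```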
  • Next, in step 203, the first holder i sends the high-order embedding vector of each sample to the server.
  • It can be understood that the first holder i is any one of the N data holders, and each other data holder j performs operations similar to those of the first holder i, correspondingly obtains the high-order embedding vector of each sample, and sends it to the server.
  • Thus, the server can receive the high-order embedding vectors of the samples processed by each of the N data holders. In other words, for any target sample v, the server respectively receives N high-order embedding vectors z_v^1, ..., z_v^N from the N data holders, where z_v^i represents the high-order embedding vector obtained by the i-th holder for the sample v.
  • Then, in step 204, the server uses the classification sub-network maintained by it to synthesize the N high-order embedding vectors of the target sample v into a comprehensive embedding vector of the target sample, and determines the classification prediction result of the target sample according to the comprehensive embedding vector.
  • the classification sub-network may include a synthesis layer for synthesizing N high-order embedding vectors of the target sample v.
  • the synthesis layer can adopt many different synthesis methods.
  • In one embodiment, the N high-order embedding vectors z_v^1, ..., z_v^N of sample v are spliced (concatenated) to obtain the comprehensive embedding vector Z_v.
  • In another embodiment, the above N high-order embedding vectors are averaged to obtain the comprehensive embedding vector Z_v.
  • In yet another embodiment, the above N high-order embedding vectors are weighted and summed to obtain the comprehensive embedding vector, that is:
    Z_v = Σ_i β_i · z_v^i
  where β_i is the weighting factor corresponding to the i-th data holder. The weight factor β_i can be a preset hyperparameter, or it can be determined through training.
  • In still another embodiment, the comprehensive embedding vector is obtained in the following way:
    Z_v = Σ_i β_i ⊙ z_v^i    (11)
  where β_i is here the weight vector corresponding to the i-th data holder, which has the same dimension as the high-order embedding vector z_v^i, and ⊙ denotes bitwise (element-wise) multiplication. That is to say, in formula (11), N weight vectors are used to perform bitwise multiplication with the N high-order embedding vectors respectively, obtaining N weighted processing vectors, and the N weighted processing vectors are summed to obtain the comprehensive embedding vector Z_v. It should be understood that these N weight vectors are determined through network training.
  • Next, the classification sub-network can determine the classification prediction result of the target sample based on the comprehensive embedding vector Z_v. For example, in the classification sub-network, the comprehensive embedding vector Z_v can be further processed and then input into a classification layer for classification; or the comprehensive embedding vector Z_v can be input directly into the classification layer. Through the classification layer, the classification prediction result of the target sample is obtained.
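  • The following minimal Python (numpy) sketch shows the server-side synthesis of formula (11) together with a toy single-layer classification layer; splicing or averaging could be substituted in synthesize(). The classifier structure and all names are illustrative assumptions only.

```python
import numpy as np

def synthesize(high_order_vectors, weight_vectors):
    # Formula (11): bitwise multiplication with trainable weight vectors, then sum.
    return sum(beta * z for beta, z in zip(weight_vectors, high_order_vectors))

def classify(combined, w_out, b_out):
    # Toy classification layer on top of the comprehensive embedding vector.
    logits = w_out @ combined + b_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # class probabilities

rng = np.random.default_rng(0)
N, d, num_classes = 3, 4, 2
z = [rng.normal(size=d) for _ in range(N)]     # high-order vectors from the N holders
beta = [rng.normal(size=d) for _ in range(N)]  # trainable weight vectors of formula (11)
w_out, b_out = rng.normal(size=(num_classes, d)), np.zeros(num_classes)

probabilities = classify(synthesize(z, beta), w_out, b_out)
```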
  • Next, in step 205, the server determines the prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label.
  • the sample label of each sample comes from the data holder.
  • In one embodiment, one of the N data holders, for example called the second holder, owns the sample labels of all the training samples. In such a case, the server receives the sample label of each sample from the second holder in advance.
  • the sample labels of each sample are distributed among different data holders.
  • the server collects sample labels of each sample from each data holder in advance.
  • the server may determine the prediction loss according to the definition of various loss functions, at least based on the comparison between the classification prediction result of the target sample and the label value of the sample label.
  • Then, in step 206, the server updates the classification sub-network according to the prediction loss obtained above, and determines the loss gradient corresponding to the input layer of the classification sub-network.
  • Specifically, the loss back-propagation method can be used: starting from the output layer of the classification sub-network, the loss gradient is determined layer by layer, the network parameters of each layer are adjusted based on the loss gradient, and the loss gradient is passed on to the previous layer, until the loss gradient corresponding to the input layer is determined.
  • Then, in step 207, the server sends the aforementioned loss gradient to the N data holders, and accordingly, the first holder i of the N data holders receives the aforementioned loss gradient.
  • Then, the first holder i updates the graph embedding sub-network part maintained therein, that is, the aforementioned first network part, according to the received loss gradient.
  • the first holder i continues to perform the back propagation of the loss according to the aforementioned loss gradient to update the network parameters therein.
  • Backpropagation is first performed in the aggregation layer, so the aggregation parameters in the aggregation layer can be updated layer by layer in the reverse direction.
  • If the embedding layer involves embedding parameters that need to be trained (for example, the embedding parameters θ_i used in obtaining the integrated embedding result), the back propagation continues and the embedding parameters in the embedding layer are further updated. In this way, the graph embedding sub-network part of the first holder i is updated.
  • It can be understood that each of the N data holders can similarly perform the above operations, thereby updating the graph embedding sub-network part maintained therein. As a result, the entire graph embedding sub-network is updated. Combined with the classification sub-network in the server, the entire graph neural network is thus trained and updated.
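  • A minimal PyTorch sketch of this hand-over of gradients, simulated in one process: the server back-propagates the prediction loss through its classification sub-network down to its input layer, and a holder then continues back-propagation through its own sub-network part with the received gradient. The tiny linear modules stand in for the real sub-networks, and all shapes and names are illustrative assumptions.

```python
import torch

# Holder side: a stand-in for the local graph embedding sub-network part.
holder_net = torch.nn.Linear(8, 4)
holder_opt = torch.optim.SGD(holder_net.parameters(), lr=0.1)
local_input = torch.randn(5, 8)               # 5 samples, locally derived inputs
high_order = holder_net(local_input)          # high-order embeddings sent to the server

# Server side: classification sub-network on top of the received embeddings.
server_input = high_order.detach().requires_grad_(True)
classifier = torch.nn.Linear(4, 2)
server_opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
labels = torch.randint(0, 2, (5,))
loss = torch.nn.functional.cross_entropy(classifier(server_input), labels)
loss.backward()                               # gradients for the classifier and its input
server_opt.step()                             # update the classification sub-network
input_gradient = server_input.grad            # loss gradient at the input layer

# Back at the holder: continue back-propagation with the received gradient and
# update the local aggregation / embedding parameters.
high_order.backward(input_gradient)
holder_opt.step()
```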
  • Reviewing the above process, the forward processing for obtaining the sample prediction results can be divided into three stages, which use three different processing methods and executing subjects.
  • In the first stage, the process of determining the primary embedding vector of the sample is jointly executed by the N data holders using an MPC scheme.
  • the security of the feature data is ensured through the MPC scheme, and the feature data of each holder is thus comprehensively integrated to obtain the primary embedding vector.
  • In the second stage, the process of determining the high-order embedding vector of the sample is performed separately by each data holder. On the one hand, this ensures the security of the graph structure data; on the other hand, it allows each data holder to perform multi-level aggregation based on the different graph structures they respectively maintain.
  • In the third stage, the process of determining the prediction result and prediction loss of the sample is executed in the server. This is because the processing of the high-order embedding vectors does not involve private data, while the multiple layers of processing in the neural network involve non-linear transformations and require relatively high computational performance. Having a neutral server maintain the classification sub-network therefore improves training and calculation efficiency.
  • According to an embodiment of another aspect, a device for multi-party joint training of a graph neural network is provided. The device is deployed in any first holder of the aforementioned N data holders, and the first holder can be implemented as any device, platform or device cluster with computing and processing capabilities.
  • the graph neural network includes a graph embedding sub-network and a classification sub-network.
  • the server maintains the classification sub-network, and the N data holders each maintain a part of the graph embedding sub-network.
  • The first holder stores the first characteristic part of each sample, and a first graph structure containing each sample as a corresponding node; in addition, the first holder maintains the first network part of the graph embedding sub-network, and the first network part includes an embedding layer and an aggregation layer.
  • Fig. 4 shows a schematic block diagram of a training device deployed in a first holder according to an embodiment.
  • the training device 400 includes the following units.
  • The primary embedding unit 41 is configured to jointly calculate, at the embedding layer, the primary embedding vector of each sample with the other N-1 data holders, using a multi-party secure computing scheme and based at least on the first feature part of each sample.
  • the aggregation unit 42 is configured to perform multi-level aggregation on each sample based on the first graph structure and the primary embedding vector of each sample at the aggregation layer to determine the high-order embedding vector of each sample;
  • Each level of aggregation includes, for each sample corresponding to the node in the first graph structure, determining the current level of embedding vector of the node based at least on the previous level of embedding vector of the neighboring node of the node.
  • the sending unit 43 is configured to send the high-order embedding vectors of the respective samples to the server, so that the server uses the classification sub-network based on the high-order embedding vector pairs sent by the N data holders Each sample is classified and predicted, and the classification prediction result is obtained.
  • the receiving unit 44 is configured to receive a loss gradient from the server, the loss gradient being determined based on at least the classification prediction result of each sample and the sample label.
  • the update unit 45 is configured to update the first network part according to the loss gradient.
  • In one embodiment, the primary embedding unit 41 is configured to: based on the first characteristic part of each sample and the embedding parameters in the embedding layer, jointly calculate the primary embedding vector of each sample with the other N-1 data holders using a multi-party secure computing scheme; accordingly, the update unit 45 is configured to update the embedding parameters.
  • Further, in an embodiment in which the multi-party secure computing scheme adopts a secret sharing scheme, the primary embedding unit 41 is specifically configured to: perform sharing processing on the first feature part of each sample to obtain a first shared feature part, and perform sharing processing on the embedding parameters to obtain a first shared parameter part; send the first shared feature part and the first shared parameter part to the other N-1 data holders, and receive N-1 shared feature parts and N-1 shared parameter parts from the other N-1 data holders; use the first comprehensive parameter composed of the embedding parameters and the N-1 shared parameter parts to process the first comprehensive feature composed of the first feature part and the N-1 shared feature parts, obtaining a first integrated embedding result; send the first integrated embedding result to the other N-1 data holders, and receive the corresponding N-1 integrated embedding results from them; and determine the primary embedding vector of each sample according to the first integrated embedding result and the N-1 integrated embedding results.
  • In one embodiment, the aggregation unit 42 is configured to, for the first node corresponding in the first graph structure to any first sample: determine a neighbor aggregation vector at least according to the upper-level embedding vectors of the neighbor nodes of the first node; and determine the current-level embedding vector of the first node according to the neighbor aggregation vector and the upper-level embedding vector of the first node.
  • Further, in one example, the aggregation unit 42 determining the neighbor aggregation vector specifically includes: performing a pooling operation on the upper-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.
  • the aggregating unit 42 determining the neighbor aggregation vector specifically includes: a weighted summation of the upper-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector, and the weight corresponding to each neighbor node It is determined according to the characteristics of the connecting edge between the neighbor node and the first node.
  • the determination of the neighbor aggregation vector by the aggregation unit 42 specifically includes: based on the upper-level embedding vector of each neighbor node, and the edge embedding vector of each connection edge between each neighbor node and the first node, Determine the neighbor aggregation vector.
  • the update unit 45 is specifically configured to: according to the loss gradient, use a backpropagation algorithm to reversely update the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer layer by layer.
  • According to an embodiment of another aspect, an apparatus for multi-party joint training of a graph neural network is provided, wherein the graph neural network includes a graph embedding sub-network and a classification sub-network, and the multiple parties include a server and N data holders.
  • the server maintains the classification sub-network, each of the N data holders maintains a part of the graph embedding sub-network; each of the N data holders stores partial characteristics of each sample in the sample set , And a graph structure containing the respective samples as corresponding nodes; the device is deployed in the server, and the server can be implemented as any device, platform or device cluster with computing and processing capabilities.
  • Fig. 5 shows a schematic block diagram of a training device deployed in a server according to an embodiment.
  • the training device 500 includes the following units.
  • The vector receiving unit 51 is configured to, for any target sample, respectively receive N high-order embedding vectors for the target sample from the N data holders, where the i-th high-order embedding vector is obtained by the i-th holder of the N data holders by inputting the graph structure stored therein and the characteristic part of the target sample into the graph embedding sub-network part maintained therein.
  • the classification prediction unit 52 is configured to integrate the N high-order embedding vectors in the classification sub-network to obtain the integrated embedding vector of the target sample, and determine the value of the target sample according to the integrated embedding vector Classification prediction results.
  • the loss determining unit 53 is configured to determine the prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label.
  • the updating unit 54 is configured to update the classification sub-network according to the predicted loss, and determine the loss gradient corresponding to the input layer of the classification sub-network.
  • the sending unit 55 is configured to send the loss gradient to the N data holders, so that each holder updates the graph embedded sub-network part therein.
  • the classification prediction unit 52 is specifically configured to: splice the N high-order embedding vectors to obtain the integrated embedding vector; or, averaging the N high-order embedding vectors to obtain the Comprehensive embedding vector.
  • the classification prediction unit 52 is specifically configured to: use N weight vectors to perform bitwise multiplication with the N high-order embedding vectors to obtain N weighted processing vectors; The processing vectors are summed to obtain the integrated embedding vector; accordingly, the update unit 54 is configured to update the N weight vectors.
  • In one embodiment, the device 500 further includes a label receiving unit (not shown), configured to receive the sample labels from a second holder of the N data holders.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • According to an embodiment of yet another aspect, there is also provided a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.

Abstract

A method and a device for multi-party joint training of a graph neural network. A plurality of parties comprises a plurality of data holders and a server; a graph neural network includes a graph embedding sub-network and a classification sub-network. Each data holder respectively maintains a part of the graph embedding sub-network, and the server maintains the classification sub-network. Any of the data holders, in the graph embedding sub-network maintained thereby, computes jointly with other holders, by means of secure multi-party computation (MPC), a primary embedding vector of a sample, performs multi-level neighbor aggregation on a node according to a local graph structure to obtain a high-order embedding vector of the node, and sends same to the server. The server combines the high-order embedding vectors from the data holders by using the classification sub-network, and performs classification prediction accordingly, to determine loss. A loss gradient is passed back from the classification sub-network in the server to the graph embedding sub-network in the data holders, realizing the joint training of the whole graph neural network. The present invention protects the data privacy of the parties.

Description

Method and device for multi-party joint training of a graph neural network
Technical field
One or more embodiments of this specification relate to the fields of data security and machine learning, and in particular, to methods and devices for multi-party joint training of graph neural networks.
Background technique
The data needed for machine learning often involves multiple fields. For example, in a machine-learning-based user classification analysis scenario, the electronic payment platform owns the user's transaction flow data, the social platform owns the user's friend contact data, and the banking institution owns the user's loan data. Data often exists in the form of islands. Due to industry competition, data security, user privacy, and other concerns, data integration faces great resistance, and it is difficult to bring together data scattered across platforms to train machine learning models. Jointly training machine learning models on multi-party data, under the premise that no data is leaked, has therefore become a major challenge.
The graph neural network is a widely used machine learning model. Compared with traditional neural networks, a graph neural network can capture not only the features of nodes but also the features of the relationships between nodes, and has therefore achieved excellent results in a number of machine learning tasks. However, this also gives graph neural networks a certain complexity. In particular, when faced with data islands, how to integrate multi-party data and safely conduct multi-party joint modeling is a problem to be solved.
Therefore, an improved scheme that can safely and effectively train a graph neural network jointly among multiple parties is desired.
Summary of the invention
One or more embodiments of this specification describe methods and devices for multi-party joint training of graph neural networks, which can safely and efficiently jointly train a graph neural network among multiple parties as a prediction model.
根据第一方面,提供了一种多方联合训练图神经网络的方法,所述图神经网络包括图嵌入子网络和分类子网络,所述多方包括服务器和N个数据持有方,所述服务器维护所述分类子网络,所述N个数据持有方各自维护所述图嵌入子网络的一部分;所述N个数据持有方中任意的第一持有方存储有样本集中各个样本的第一特征部分,以及包含所述各个样本作为对应节点的第一图结构;所述第一持有方维护所述图嵌入子网络的第一网络部分,所述第一网络部分包括嵌入层和聚合层;所述方法通过该第一持有方执行, 包括:在所述嵌入层,至少基于所述各个样本的第一特征部分,利用多方安全计算方案,与其他N-1个数据持有方联合计算得到各个样本的初级嵌入向量;在所述聚合层,基于所述第一图结构,以及所述各个样本的初级嵌入向量,对所述各个样本执行多级聚合,以确定各个样本的高阶嵌入向量;其中每级聚合包括,对于各个样本在所述第一图结构中对应的节点,至少基于该节点的邻居节点的上一级嵌入向量,确定该节点的本级嵌入向量;将所述各个样本的高阶嵌入向量发送至所述服务器,以使得所述服务器利用所述分类子网络,基于所述N个数据持有方发送的高阶嵌入向量对各个样本进行分类预测,得到分类预测结果;从所述服务器接收损失梯度,所述损失梯度至少基于所述各个样本的分类预测结果与样本标签而确定;根据所述损失梯度,更新所述第一网络部分。According to a first aspect, there is provided a method for jointly training a graph neural network by multiple parties. The graph neural network includes a graph embedding sub-network and a classification sub-network. The multiple parties include a server and N data holders. The server maintains In the classification sub-network, each of the N data holders maintains a part of the graph embedding sub-network; any first holder of the N data holders stores the first of each sample in the sample set Characteristic part, and a first graph structure containing the respective samples as corresponding nodes; the first holder maintains the first network part of the graph embedding sub-network, and the first network part includes an embedding layer and an aggregation layer The method is executed by the first holder, and includes: in the embedding layer, based at least on the first characteristic part of each sample, using a multi-party secure computing scheme to combine with other N-1 data holders Calculate the primary embedding vector of each sample; in the aggregation layer, based on the first graph structure and the primary embedding vector of each sample, perform multi-level aggregation on each sample to determine the high-level of each sample Embedding vector; where each level of aggregation includes, for each sample corresponding to the node in the first graph structure, at least based on the previous level of the node’s neighboring node’s embedding vector, determining the node’s current level of embedding vector; The high-order embedding vector of each sample is sent to the server, so that the server uses the classification sub-network to classify and predict each sample based on the high-order embedding vector sent by the N data holders to obtain a classification prediction Result; receiving a loss gradient from the server, the loss gradient being determined based on at least the classification prediction result of each sample and the sample label; updating the first network part according to the loss gradient.
根据一种实施方式,所述多方安全计算方案包括秘密分享方案;相应地,可以通过以下方式得到各个样本的初级嵌入向量:对各个样本的第一特征部分进行分享处理,得到第一分享特征部分;将所述第一分享特征部分发送给其他N-1个数据持有方,并从所述其他N-1个数据持有方分别接收N-1个分享特征部分;对所述第一特征部分和所述N-1个分享特征部分进行综合,得到第一综合特征;将所述第一综合特征发送给其他N-1个数据持有方,并从其他N-1个数据持有方分别接收N-1个综合特征;根据所述第一综合特征,以及所述N-1个综合特征,确定所述各个样本的初级嵌入向量。According to an embodiment, the multi-party secure computing scheme includes a secret sharing scheme; accordingly, the primary embedding vector of each sample can be obtained in the following manner: the first characteristic part of each sample is shared to obtain the first shared characteristic part ; Send the first shared characteristic part to other N-1 data holders, and receive N-1 shared characteristic parts from the other N-1 data holders; for the first characteristic Part and the N-1 shared characteristic parts are integrated to obtain the first comprehensive characteristic; the first comprehensive characteristic is sent to other N-1 data holders, and from the other N-1 data holders N-1 comprehensive features are received respectively; and the primary embedding vector of each sample is determined according to the first comprehensive feature and the N-1 comprehensive features.
在一个实施例中,嵌入层具有嵌入参数,在嵌入层得到各个样本的初级嵌入向量包括,基于各个样本的第一特征部分,以及所述嵌入层中的嵌入参数,利用多方安全计算方案,与其他N-1个数据持有方联合计算得到各个样本的初级嵌入向量。In one embodiment, the embedding layer has embedding parameters, and obtaining the primary embedding vector of each sample at the embedding layer includes, based on the first feature part of each sample, and the embedding parameters in the embedding layer, using a multi-party secure computing scheme, and The other N-1 data holders jointly calculate the primary embedding vector of each sample.
在这样的情况下,更新第一网络部分包括,更新所述嵌入参数。In this case, updating the first network part includes updating the embedded parameters.
在一个进一步的实施例中,嵌入层采用秘密分享方案,与其他N-1个数据持有方联合计算得到各个样本的初级嵌入向量,这具体包括:对各个样本的第一特征部分进行分享处理,得到第一分享特征部分;并对所述嵌入参数进行分享处理,得到第一分享参数部分;将所述第一分享特征部分和第一分享参数部分发送给其他N-1个数据持有方,并从其他N-1个数据持有方分别接收N-1个分享特征部分以及N-1个分享参数部分;利用所述嵌入参数和所述N-1个分享参数部分构成的第一综合参数,处理由所述第一特征部分和所述N-1个分享特征部分构成的第一综合特征,得到第一综合嵌入结果;将所述第一综合嵌入结果发送给所述其他N-1个数据持有方,并从所述其他N-1个数据持有方接收对应的N-1个综合嵌入结果;根据所述第一综合嵌入结果和所述N-1个综合嵌入结果,确定所述各个样本的初级嵌入向量。In a further embodiment, the embedding layer adopts a secret sharing scheme to jointly calculate the primary embedding vector of each sample with other N-1 data holders, which specifically includes: sharing the first feature part of each sample , Obtain the first shared characteristic part; perform sharing processing on the embedded parameter to obtain the first shared parameter part; send the first shared characteristic part and the first shared parameter part to other N-1 data holders , And respectively receive N-1 shared characteristic parts and N-1 shared parameter parts from other N-1 data holders; the first synthesis composed of the embedded parameters and the N-1 shared parameter parts Parameters, processing the first integrated feature composed of the first feature part and the N-1 shared feature parts to obtain a first integrated embedding result; sending the first integrated embedding result to the other N-1 Data holders, and receive corresponding N-1 integrated embedding results from the other N-1 data holders; according to the first integrated embedding result and the N-1 integrated embedding results, determine The primary embedding vector of each sample.
In one embodiment, each level of aggregation in the aggregation layer includes, for the first node corresponding in the first graph structure to an arbitrary first sample among the samples: determining a neighbor aggregation vector at least according to the upper-level embedding vectors of the neighbor nodes of the first node; and determining the current-level embedding vector of the first node according to the neighbor aggregation vector and the upper-level embedding vector of the first node.

Further, in one example, a pooling operation is performed on the upper-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.

In another example, the upper-level embedding vectors of the neighbor nodes of the first node are weighted and summed to obtain the neighbor aggregation vector, where the weight corresponding to each neighbor node is determined according to the features of the connecting edge between that neighbor node and the first node.

In yet another example, the neighbor aggregation vector is determined based on the upper-level embedding vectors of the neighbor nodes and the edge embedding vectors of the connecting edges between the neighbor nodes and the first node.

According to one embodiment, the process of updating the first network part includes: according to the loss gradient, using a back-propagation algorithm to update, layer by layer in the reverse direction, the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer.
According to a second aspect, a method for jointly training a graph neural network by multiple parties is provided. The graph neural network includes a graph embedding sub-network and a classification sub-network; the multiple parties include a server and N data holders; the server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network. Each of the N data holders stores part of the features of each sample in a sample set, and a graph structure containing the samples as corresponding nodes. The method is executed by the server and includes: for an arbitrary target sample in the sample set, receiving from the N data holders N high-order embedding vectors for the target sample, where the i-th high-order embedding vector is obtained by the i-th holder among the N data holders by inputting the graph structure stored therein and its feature part of the target sample into the graph embedding sub-network part maintained therein; in the classification sub-network, combining the N high-order embedding vectors to obtain a combined embedding vector of the target sample, and determining a classification prediction result of the target sample according to the combined embedding vector; determining a prediction loss at least based on the classification prediction result of the target sample and the corresponding sample label; updating the classification sub-network according to the prediction loss, and determining the loss gradient corresponding to the input layer of the classification sub-network; and sending the loss gradient to the N data holders, so that each holder updates its graph embedding sub-network part.

In different embodiments, the N high-order embedding vectors can be combined to obtain the combined embedding vector of the target sample in the following manners: concatenating the N high-order embedding vectors to obtain the combined embedding vector; or averaging the N high-order embedding vectors to obtain the combined embedding vector.

In one embodiment, combining the N high-order embedding vectors to obtain the combined embedding vector of the target sample includes: multiplying the N high-order embedding vectors element-wise by N weight vectors respectively to obtain N weighted vectors, and summing the N weighted vectors to obtain the combined embedding vector; where updating the classification sub-network includes updating the N weight vectors.

According to one embodiment, before determining the prediction loss, the method further includes: receiving the sample label from a second holder among the N data holders.
According to a third aspect, an apparatus for jointly training a graph neural network by multiple parties is provided. The graph neural network includes a graph embedding sub-network and a classification sub-network; the multiple parties include a server and N data holders; the server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network. An arbitrary first holder among the N data holders stores a first feature part of each sample in a sample set, and a first graph structure containing the samples as corresponding nodes; the first holder maintains a first network part of the graph embedding sub-network, the first network part including an embedding layer and an aggregation layer. The apparatus is deployed in the first holder and includes: a primary embedding unit configured to, at the embedding layer, jointly compute the primary embedding vector of each sample with the other N-1 data holders by means of a multi-party secure computation scheme, at least based on the first feature part of each sample; an aggregation unit configured to, at the aggregation layer, perform multi-level aggregation on the samples based on the first graph structure and the primary embedding vectors of the samples to determine a high-order embedding vector of each sample, where each level of aggregation includes, for the node corresponding to each sample in the first graph structure, determining the current-level embedding vector of the node at least based on the upper-level embedding vectors of its neighbor nodes; a sending unit configured to send the high-order embedding vectors of the samples to the server, so that the server performs classification prediction on the samples using the classification sub-network based on the high-order embedding vectors sent by the N data holders, obtaining classification prediction results; a receiving unit configured to receive a loss gradient from the server, the loss gradient being determined at least based on the classification prediction results of the samples and the sample labels; and an update unit configured to update the first network part according to the loss gradient.

According to a fourth aspect, an apparatus for jointly training a graph neural network by multiple parties is provided. The graph neural network includes a graph embedding sub-network and a classification sub-network; the multiple parties include a server and N data holders; the server maintains the classification sub-network, and each of the N data holders maintains a part of the graph embedding sub-network. Each of the N data holders stores part of the features of each sample in a sample set, and a graph structure containing the samples as corresponding nodes. The apparatus is deployed in the server and includes: a vector receiving unit configured to, for an arbitrary target sample in the sample set, receive from the N data holders N high-order embedding vectors for the target sample, where the i-th high-order embedding vector is obtained by the i-th holder among the N data holders by inputting the graph structure stored therein and its feature part of the target sample into the graph embedding sub-network part maintained therein; a classification prediction unit configured to, in the classification sub-network, combine the N high-order embedding vectors to obtain a combined embedding vector of the target sample, and determine a classification prediction result of the target sample according to the combined embedding vector; a loss determination unit configured to determine a prediction loss at least based on the classification prediction result of the target sample and the corresponding sample label; an update unit configured to update the classification sub-network according to the prediction loss and determine the loss gradient corresponding to the input layer of the classification sub-network; and a sending unit configured to send the loss gradient to the N data holders, so that each holder updates its graph embedding sub-network part.
According to a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect or the second aspect.

According to a sixth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method of the first aspect or the second aspect is implemented.

According to the method and apparatus provided by the embodiments of this specification, multiple data holders and a server jointly train a graph neural network, where each data holder stores a part of the features of the samples and a graph structure with the samples as nodes. The graph neural network is divided into a graph embedding sub-network and a classification sub-network; each data holder maintains a part of the graph embedding sub-network, and the server maintains the classification sub-network. In the graph embedding sub-network part it maintains, any data holder jointly computes the primary embedding vectors of the samples with the other holders through a multi-party secure computation scheme; on this basis, it performs multi-level neighbor aggregation on the nodes according to its local graph structure, obtains the high-order embedding vectors of the nodes, and sends them to the server. The server uses the classification sub-network to combine the high-order sample embedding vectors from the data holders, performs classification prediction on the samples accordingly, and determines the loss. Finally, the loss gradient is passed from the classification sub-network in the server back to the graph embedding sub-networks in the data holders, realizing joint training of the entire graph neural network. Throughout the process, the privacy and security of the sample feature data and the graph structure data are guaranteed, and the computation and training efficiency of the entire network is also improved.
Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.

Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;

Figure 2 shows the process of a method for jointly training a graph neural network by multiple parties according to an embodiment;

Figure 3 shows the flow of a method in which the first holder determines the primary embedding vectors of the samples by secret sharing;

Figure 4 shows a schematic block diagram of a training apparatus deployed in the first holder according to an embodiment;

Figure 5 shows a schematic block diagram of a training apparatus deployed in the server according to an embodiment.
Detailed description

The solutions provided in this specification are described below with reference to the accompanying drawings.
Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. In Figure 1, for clarity and simplicity, two data holders are shown, namely holder A and holder B. Holders A and B each store a part of the features of the samples, and a graph structure recording the association relationships between the samples. In a specific example, the samples may be users. Correspondingly, holder A may be, for example, an electronic payment platform (such as Alipay), which stores a part of the users' features (for example, payment-related features). This part of the features is shown in Figure 1 as features f1 to f4. In addition, holder A also stores a graph structure A constructed, for example, from payment relationships. More specifically, users who have a payment or transfer relationship on the electronic payment platform can be connected by connecting edges, thereby forming graph structure A. On the other hand, holder B may be, for example, a social platform (such as DingTalk), which stores another part of the users' features (for example, social-related features). This part of the features is shown in Figure 1 as features f5, f6 and f7. In addition, holder B also stores a graph structure B constructed, for example, from social relationships. More specifically, on the social platform, users who are friends or have communication records can be connected by connecting edges, thereby forming graph structure B.

As can be seen from Figure 1, holder A and holder B may store different feature parts of the same samples; moreover, because holder A and holder B build their graph structures from different association relationships (for example, holder A from payment relationships and holder B from social friend relationships), holders A and B each store a different graph structure.

To improve prediction accuracy, it is desirable to perform machine learning based on richer sample information. However, data holders A and B each keep only part of the features of the samples and a graph structure built from a particular association relationship. It is therefore desirable to combine the feature data and graph data stored by data holders A and B and jointly train a graph neural network as a prediction model. At the same time, it is desirable that during joint training the original data of data holders A and B not be leaked, so as to guarantee privacy and security.
To this end, in one embodiment of this specification, a neutral server is introduced in addition to the data holders, and the data holders and the server jointly train the graph neural network. To balance data privacy and computational efficiency, the graph neural network to be trained is divided into two parts: a graph embedding sub-network and a classification sub-network.

The graph embedding sub-network is used to generate, from the sample features and the graph structure, the high-order embedding vector of the node corresponding to each sample. The computation of the graph embedding sub-network involves the original sample features and graph structure data, i.e., computation related to private data, and can therefore be performed locally at the data holders. Specifically, each data holder may maintain a part of the graph embedding sub-network and, based on its locally stored feature data and graph structure data, use the locally maintained graph embedding sub-network part to compute the high-order embedding vectors of the nodes.

More specifically, the graph embedding sub-network in each data holder can be divided into an embedding layer and an aggregation layer. At the embedding layer, the data holders use a multi-party secure computation scheme to combine the sample feature parts they store, obtaining the primary embedding vector of the node corresponding to each sample. At the aggregation layer, a data holder performs multi-level neighbor aggregation on the nodes based on the primary embedding vectors of the nodes and its locally stored graph structure, thereby obtaining the high-order embedding vectors of the nodes.

The classification sub-network is used to combine the high-order embedding vectors of the nodes produced by the graph embedding sub-network and to perform classification prediction on the nodes according to the combined result. The computation of the classification sub-network does not involve the original sample features or the graph structure data, i.e., it is not privacy-related, and can therefore be performed in the server to improve computation and training efficiency.

During training, the server can determine the prediction loss according to the classification prediction results of the classification sub-network and the sample labels (shown as y1, y2 and y3 in Figure 1), and update the classification sub-network through back propagation until the loss gradient at the input layer of the classification sub-network is determined. The server then sends the loss gradient to the data holders, so that each data holder can continue to update its graph embedding sub-network according to this loss gradient. In this way, the update and training of the entire graph neural network is realized.
The specific process of jointly training the graph neural network by multiple parties is described in detail below.

It should be understood that, although only two data providers are shown in Figure 1, the above idea and architecture can be applied to scenarios with more data providers. Without loss of generality, the following description assumes that there are N data holders, where N is generally greater than or equal to 2. The N data holders each store a part of the features of each sample in a sample set, and a graph structure containing the samples as nodes. In such a scenario, it is desired that the N data holders and the server jointly train a graph neural network model.

To balance the security and efficiency of joint training, as described above, the graph neural network is divided into a graph embedding sub-network and a classification sub-network. In this case, the N data holders each maintain a part of the graph embedding sub-network, and the server maintains the classification sub-network. For simplicity and clarity of description, the execution steps of the joint training are described below with respect to an arbitrary one of the N data holders, referred to as the first holder.
Figure 2 shows the process of a method for jointly training a graph neural network by multiple parties according to an embodiment. As can be seen, Figure 2 shows the respective processing of an arbitrary first holder and of the server during the joint training, as well as the interaction between the two. Both the first holder and the server can be implemented by any apparatus, device, platform or device cluster with computing and processing capabilities.

It can be understood that, as an arbitrary one of the N data holders, the first holder stores part of the features of each sample in the sample set, referred to here as the first feature part. In addition, the graph structure stored in the first holder is referred to as the first graph structure, and the graph embedding sub-network part maintained therein is referred to as the first network part. Further, the first network part includes an embedding layer and an aggregation layer.

Based on the above scenario, the joint training process includes the following steps.
First, in step 201, the first holder (which may be denoted holder i) uses the embedding layer in the first network part it maintains to jointly compute, with the other N-1 data holders, the primary embedding vector of each sample, at least based on the first feature part of each sample. The joint computation in this step involves the original sample features stored by the data holders, which are private data; for data security, the joint computation therefore needs to be carried out with a multi-party secure computation (MPC) scheme.

There are a variety of existing MPC schemes, including, for example, homomorphic encryption, garbled circuits, secret sharing, and so on. In step 201, various applicable MPC schemes can be adopted, in combination with the specific algorithm of the embedding layer, to jointly compute the primary embedding vectors of the samples.

In one embodiment, the processing of the sample features by the embedding layer mainly involves encoding the original feature data into a representation (for example, encoding it as a vector), and does not involve parameterized computation on the feature data. In this case, various MPC schemes can be used to combine the sample feature parts encoded by the N holders to obtain the primary embedding vector of the sample.

In another embodiment, after encoding the feature data of the sample into a representation, the embedding layer further performs computation involving parameters, for example, applying a linear transformation to the features with a parameter vector, or further applying a non-linear function (such as a sigmoid function), and so on. In other words, the embedding layer of holder i contains the embedding parameters θ_i required for the feature computation.

In this case, in one example, homomorphic encryption can be used to combine the sample feature parts encoded by the N holders and the embedding parameter parts they maintain, to obtain the primary embedding vector of the sample. Specifically, homomorphic encryption can be used both to combine the sample features and to combine the embedding parameters maintained by the holders; the combined embedding parameters are then used to process the combined features, thereby obtaining the primary embedding vector of the sample.

In another example, secret sharing is used to obtain the primary embedding vector of the sample based on the feature parts and the embedding parameters. Figure 3 shows the flow of a method in which the first holder determines the primary embedding vectors of the samples by secret sharing.
Specifically, for a given target sample, in step 301 the first holder i performs sharing processing on the first feature part x_i of the sample to obtain a first shared feature part x′_i. The sharing processing can be implemented with an algorithm from secret sharing, by adding a random number generated in a certain manner to the original data. For example, the first shared feature part can be obtained as:

x′_i = x_i + r_i      (1)

where r_i is the sharing random number for the sample features.

Similarly, the first holder i also performs sharing processing on its embedding parameters θ_i to obtain a first shared parameter part θ′_i. Specifically, the sharing processing can be performed according to:

θ′_i = θ_i + s_i      (2)

where s_i is the sharing random number for the embedding parameters.
Then, in step 302, the first holder i sends the first shared feature part x′_i and the first shared parameter part θ′_i to the other N-1 data holders. Similarly, each of the other N-1 data holders computes its corresponding shared feature part x′_j (j ≠ i) and shared parameter part θ′_j and sends them out. Thus, the first holder i receives N-1 shared feature parts x′_j and N-1 shared parameter parts θ′_j from the other N-1 data holders.
Next, in step 303, the first holder i obtains a first combined feature X_i based on its first feature part x_i and the N-1 shared feature parts x′_j. Specifically, the first combined feature can be obtained according to:

X_i = x_i + ∑_j x′_j      (3)

In addition, the first holder i similarly obtains a first combined parameter W_i based on its own embedding parameters θ_i and the N-1 shared parameter parts θ′_j:

W_i = θ_i + ∑_j θ′_j      (4)

Then, the first combined feature X_i is processed with the first combined parameter W_i to obtain a first combined embedding result H_i.

In step 304, the first holder i sends the first combined embedding result H_i to the other N-1 data holders. The other holders similarly obtain their combined embedding results H_j. Thus, the first holder i receives the corresponding N-1 combined embedding results H_j from the other N-1 data holders.
Finally, in step 305, the first holder i determines the primary embedding vector H of each sample according to the first combined embedding result H_i and the N-1 combined embedding results H_j, for example:
H = ∑_{j=1}^{N} H_j      (5)
In this way, by means of secret sharing, the first holder i and the other data holders jointly compute the same primary embedding vector H for the target sample.
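As a concrete illustration of steps 301 to 305, the following Python sketch simulates all N holders in a single process. The function and variable names are hypothetical, and the additive masking follows formulas (1) to (5) only in the simplified form described above; it is not a production secret-sharing protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_share(value):
    """Sharing processing: mask the local value with a random number (formulas (1)/(2))."""
    return value + rng.normal(size=value.shape)

# Hypothetical local data of N = 3 holders: feature parts x_i and embedding parameters theta_i.
N, feat_dim, emb_dim = 3, 4, 8
x = [rng.normal(size=feat_dim) for _ in range(N)]
theta = [rng.normal(size=(emb_dim, feat_dim)) for _ in range(N)]

# Steps 301-302: every holder shares its masked feature part and parameter part with the others.
x_shared = [make_share(x_i) for x_i in x]
theta_shared = [make_share(t_i) for t_i in theta]

# Step 303: holder i combines its own plaintext part with the shares received from the others
# (formulas (3) and (4)), then computes its combined embedding result H_i = W_i @ X_i.
H = []
for i in range(N):
    X_i = x[i] + sum(x_shared[j] for j in range(N) if j != i)
    W_i = theta[i] + sum(theta_shared[j] for j in range(N) if j != i)
    H.append(W_i @ X_i)

# Steps 304-305: the H_i are exchanged and every holder derives the same primary embedding vector.
H_primary = sum(H)          # one possible combination, in the spirit of formula (5)
print(H_primary.shape)      # (emb_dim,)
```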
Returning to Figure 2: on the basis that the embedding layer of the first holder i has obtained the primary embedding vector of each sample using the MPC scheme, next, in step 202, at the aggregation layer, multi-level neighbor aggregation is performed on the samples based on the first graph structure stored therein and the primary embedding vectors of the samples, so as to determine the high-order embedding vector of each sample. Specifically, each sample is mapped to its node in the first graph structure, and multi-level neighbor aggregation is performed on the nodes based on the connection information between nodes in the first graph structure, where each level of aggregation includes, for each node, determining the current-level embedding vector of the node at least based on the upper-level embedding vectors of its neighbor nodes.

Specifically, for the first node v corresponding in the first graph structure to an arbitrary first sample among the samples, the k-th level of aggregation for this first node may include the following.
Using an aggregation function AGG_k, a neighbor aggregation vector h_{N(v)}^k is determined at least according to the upper-level (i.e., level k-1) embedding vectors h_u^{k-1} of the neighbor nodes u of the first node v, where N(v) denotes the set of neighbor nodes of node v, namely:

h_{N(v)}^k = AGG_k( { h_u^{k-1}, u ∈ N(v) } )      (6)
Then, the current-level (level k) embedding vector h_v^k of the first node v is determined according to the neighbor aggregation vector h_{N(v)}^k and the upper-level (level k-1) embedding vector h_v^{k-1} of the first node v, namely:

h_v^k = W_k · f( h_{N(v)}^k, h_v^{k-1} )      (7)
where f denotes a combining function applied to the neighbor aggregation vector h_{N(v)}^k and the upper-level vector h_v^{k-1} of node v, and W_k is the parameter of the k-th level of aggregation. In different embodiments, the combining operation in the function f may include concatenating h_{N(v)}^k with h_v^{k-1}, or summing them, or averaging them, and so on.
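As an illustration of formulas (6) and (7), the sketch below implements one level of aggregation for a single node, assuming mean aggregation for AGG_k and concatenation for f. The function names and the choice of aggregator are illustrative assumptions, not the only form covered by this embodiment.

```python
import numpy as np

def aggregate_one_level(h_prev, neighbors, v, W_k):
    """One level of aggregation for node v, following formulas (6) and (7).

    h_prev    : dict mapping node id -> (k-1)-level embedding vector
    neighbors : dict mapping node id -> list of neighbor node ids (the graph structure)
    W_k       : parameter matrix of the k-th aggregation level
    """
    # Formula (6): neighbor aggregation vector, here with a mean aggregator as AGG_k.
    neigh_vecs = np.stack([h_prev[u] for u in neighbors[v]])
    h_neigh = neigh_vecs.mean(axis=0)

    # Formula (7): combine with the node's own upper-level vector; here f is concatenation.
    combined = np.concatenate([h_neigh, h_prev[v]])
    return W_k @ combined

# Hypothetical toy graph and level-(k-1) embeddings.
rng = np.random.default_rng(1)
h_prev = {n: rng.normal(size=8) for n in range(4)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
W_k = rng.normal(size=(8, 16))            # maps the 16-dim concatenation back to 8 dims
h_v_k = aggregate_one_level(h_prev, neighbors, 0, W_k)
print(h_v_k.shape)                         # (8,)
```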
In different embodiments, the above aggregation function AGG_k can take different forms and use different algorithms.
In one embodiment, the aggregation function AGG_k includes a pooling operation. Correspondingly, determining the neighbor aggregation vector h_{N(v)}^k from the upper-level embedding vectors h_u^{k-1} of the neighbor nodes u in formula (6) means performing a pooling operation on the upper-level embedding vectors h_u^{k-1} of the neighbor nodes u of the first node v to obtain the neighbor aggregation vector h_{N(v)}^k. More specifically, the pooling operation may include max pooling, average pooling, and so on.
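For this pooling embodiment, a minimal sketch of AGG_k as element-wise max pooling over the neighbors' upper-level vectors might look as follows; it is an assumed illustration that reuses the toy data structures of the previous sketch.

```python
import numpy as np

def agg_max_pool(h_prev, neighbor_ids):
    """AGG_k as element-wise max pooling over the neighbors' (k-1)-level embeddings."""
    return np.stack([h_prev[u] for u in neighbor_ids]).max(axis=0)
```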
In another embodiment, the aggregation function AGG_k may consist in feeding the upper-level embedding vectors h_u^{k-1} of the neighbor nodes u into an LSTM neural network in sequence, and taking the hidden vector thus obtained as the neighbor aggregation vector h_{N(v)}^k.
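For the LSTM-based embodiment, a minimal sketch might feed the neighbors' upper-level vectors through torch.nn.LSTM and use the final hidden state as the neighbor aggregation vector; treating the neighbor order as given is an assumption of this illustration.

```python
import torch
import torch.nn as nn

class LSTMAggregator(nn.Module):
    """AGG_k that runs the neighbors' (k-1)-level vectors through an LSTM."""
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=dim, batch_first=True)

    def forward(self, neighbor_vecs):                        # (num_neighbors, dim)
        _, (h_n, _) = self.lstm(neighbor_vecs.unsqueeze(0))  # neighbors treated as one sequence
        return h_n.squeeze(0).squeeze(0)                     # final hidden state as h_{N(v)}^k
```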
In yet another embodiment, the aggregation function AGG_k includes a weighted summation operation. Correspondingly, formula (6) becomes:
h_{N(v)}^k = ∑_{u ∈ N(v)} α_uv · h_u^{k-1}      (8)
That is, the upper-level embedding vectors h_u^{k-1} of the neighbor nodes u of the first node v are weighted and summed to obtain the neighbor aggregation vector h_{N(v)}^k, where α_uv is a weighting factor.
In one example, the weighting factor α_uv is determined according to the features of the connecting edge e_uv between the neighbor node u and the first node v. For example, when the first graph structure is built from transfer relationships, the features of the connecting edge e_uv between two nodes u and v may include the total transfer amount between the two users corresponding to the two nodes. When the first graph structure is built from social relationships, the features of the connecting edge e_uv between two nodes u and v may include the interaction frequency of the two users corresponding to the two nodes. In this way, the weighting factor of the neighbor node u can be determined based on the features of the connecting edge e_uv, and the neighbor aggregation vector h_{N(v)}^k can be obtained through the aggregation function of formula (8).
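A possible sketch of this edge-weighted variant is given below, assuming that the weighting factor α_uv is obtained by normalizing a scalar edge feature (for example, a transfer amount) over the neighborhood; the normalization choice is an assumption made for illustration only.

```python
import numpy as np

def agg_edge_weighted(h_prev, neighbor_ids, edge_feat):
    """AGG_k as a weighted sum (formula (8)), with alpha_uv derived from edge features.

    edge_feat : dict mapping neighbor id u -> scalar feature of edge e_uv (e.g., transfer amount)
    """
    feats = np.array([edge_feat[u] for u in neighbor_ids], dtype=float)
    alpha = feats / feats.sum()                          # normalize edge features into weights
    neigh = np.stack([h_prev[u] for u in neighbor_ids])
    return (alpha[:, None] * neigh).sum(axis=0)          # sum_u alpha_uv * h_u^{k-1}
```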
In yet another embodiment, for the first graph structure, an edge embedding vector is determined for each connecting edge according to the edge features of the connecting edges between nodes. Correspondingly, aggregation over the edge embedding vectors is also introduced into the aggregation function AGG_k. Specifically, the neighbor aggregation vector h_{N(v)}^k is determined based on the upper-level embedding vectors h_u^{k-1} of the neighbor nodes u and the edge embedding vectors of the connecting edges e_uv between the neighbor nodes u and the first node v. More specifically, in one example, formula (6) with such an aggregation function AGG_k can be embodied as:

h_{N(v)}^k = ∑_{u ∈ N(v)} q_uv ⊙ h_u^{k-1}      (9)
where q_uv is the edge embedding vector of the connecting edge e_uv between the first node v and its neighbor node u.
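One possible concrete form of this edge-embedding variant is sketched below, assuming that each edge embedding q_uv has the same dimension as the node vectors and modulates the neighbor's vector element-wise before summation; this specific choice is an assumption made for illustration.

```python
import numpy as np

def agg_with_edge_embeddings(h_prev, neighbor_ids, edge_emb):
    """Neighbor aggregation that also uses edge embedding vectors q_uv (cf. formula (9)).

    edge_emb : dict mapping neighbor id u -> edge embedding vector q_uv (same dimension as h_u)
    """
    return sum(edge_emb[u] * h_prev[u] for u in neighbor_ids)  # element-wise product, summed over neighbors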
Above, through aggregation functions AGG_k of various forms and algorithms, the neighbor aggregation vector h_{N(v)}^k is determined based on the upper-level embedding vectors of the neighbor nodes. Then, according to formula (7), the current-level embedding vector h_v^k of the first node v is obtained.
It can be understood that the primary embedding vector of a sample determined in step 201 can serve as its level-0 embedding vector. On this basis, performing aggregation level by level for k from 1 up to a preset number of aggregation levels K yields the level-K high-order embedding vector h_v^K of node v, where the number of aggregation levels K is a preset hyperparameter corresponding to the order of the neighbor nodes considered in the aggregation.
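Putting the pieces together, the following sketch runs K levels of aggregation over a toy graph, starting from the primary embeddings as level-0 vectors; the parameter shapes and the choice of mean aggregation and concatenation follow the illustrative assumptions of the earlier sketches.

```python
import numpy as np

def graph_embedding(h0, neighbors, W, K):
    """Multi-level aggregation (step 202): h0 are the primary embeddings, W[k] the level-k parameters."""
    h = dict(h0)                                   # level-0 embedding vectors
    for k in range(1, K + 1):
        new_h = {}
        for v in h:
            neigh = np.stack([h[u] for u in neighbors[v]]).mean(axis=0)  # formula (6)
            new_h[v] = W[k] @ np.concatenate([neigh, h[v]])              # formula (7)
        h = new_h
    return h                                       # level-K high-order embedding vectors

rng = np.random.default_rng(2)
d, K = 8, 2
h0 = {n: rng.normal(size=d) for n in range(4)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
W = {k: rng.normal(size=(d, 2 * d)) for k in range(1, K + 1)}
h_high = graph_embedding(h0, neighbors, W, K)
print(h_high[0].shape)                             # (8,)
```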
Thus, according to step 202, the first holder i obtains, at the aggregation layer, the high-order embedding vector of each sample based on the first graph structure stored therein and the primary embedding vectors of the samples.

Next, in step 203, the first holder i sends the high-order embedding vectors of the samples to the server.

It can be understood that the first holder i is an arbitrary one of the N data holders. Each other data holder j performs operations similar to those of the first holder i: based on the j-th graph structure stored therein and the primary embedding vectors of the samples, it correspondingly obtains high-order embedding vectors of the samples and sends them to the server.
The server can thus receive, from each of the N data holders, the high-order embedding vectors of the samples processed by that holder. For clarity, the following description is given for an arbitrary target sample v. For this target sample v, the server receives from the N data holders N high-order embedding vectors z_v^1, …, z_v^N, where z_v^i denotes the high-order embedding vector obtained by the i-th holder for sample v.
Then, in step 204, the server uses the classification sub-network it maintains to combine the N high-order embedding vectors of the target sample v into a combined embedding vector of the target sample, and determines the classification prediction result of the target sample according to the combined embedding vector.

Specifically, the classification sub-network may include a combining layer for combining the N high-order embedding vectors of the target sample v. The combining layer can adopt a variety of different combining methods.
In one embodiment, in the combining layer, the N high-order embedding vectors z_v^1, …, z_v^N of sample v are concatenated to obtain the combined embedding vector z_v.
In another embodiment, in the combining layer, the N high-order embedding vectors are averaged to obtain the combined embedding vector z_v.
In yet another embodiment, in the combining layer, the N high-order embedding vectors are weighted and summed to obtain the combined embedding vector z_v, namely:
z_v = ∑_{i=1}^{N} β_i · z_v^i      (10)
where β_i is the weighting factor corresponding to the i-th data holder. The weighting factor β_i can be a preset hyperparameter, or it can be determined through training.
In a further embodiment, in the combining layer, the combined embedding vector z_v is obtained in the following manner:
z_v = ∑_{i=1}^{N} ω_i ⊙ z_v^i      (11)
where ω_i is the weight vector corresponding to the i-th data holder, having the same dimension as the high-order embedding vectors, and ⊙ denotes element-wise multiplication. That is, in formula (11), N weight vectors are multiplied element-wise with the N high-order embedding vectors respectively to obtain N weighted vectors, and the N weighted vectors are summed to obtain the combined embedding vector z_v. It should be understood that these N weight vectors are determined through network training.
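A minimal PyTorch-style sketch of this combining layer, assuming formula (11) with trainable weight vectors ω_i; the module name and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CombiningLayer(nn.Module):
    """Combines N high-order embedding vectors per formula (11): z_v = sum_i omega_i ⊙ z_v^i."""
    def __init__(self, n_holders, dim):
        super().__init__()
        # One trainable weight vector per data holder, updated with the classification sub-network.
        self.omega = nn.Parameter(torch.ones(n_holders, dim))

    def forward(self, z):                      # z: tensor of shape (n_holders, dim)
        return (self.omega * z).sum(dim=0)     # element-wise product, then sum over holders

layer = CombiningLayer(n_holders=3, dim=8)
z = torch.randn(3, 8)                          # the N = 3 high-order vectors received for one sample
z_combined = layer(z)
print(z_combined.shape)                        # torch.Size([8])
```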
After the combined embedding vector z_v is obtained, the classification sub-network can determine the classification prediction result of the target sample based on z_v. For example, in the classification sub-network, the combined embedding vector z_v may be processed further and then input into a classification layer for classification; alternatively, the combined embedding vector z_v may be input into the classification layer directly. Through the classification layer, the classification prediction result of the target sample is obtained.
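A self-contained sketch of how the whole classification sub-network might be assembled, folding the combining step of formula (11) and a small prediction head into one module; the two-layer head and the dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ClassificationSubNetwork(nn.Module):
    """Server-side sub-network: combine the N high-order vectors, then predict a class."""
    def __init__(self, n_holders, dim, n_classes):
        super().__init__()
        self.omega = nn.Parameter(torch.ones(n_holders, dim))   # trainable weight vectors (formula (11))
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, n_classes))

    def forward(self, z):                          # z: (n_holders, dim) for one target sample
        z_combined = (self.omega * z).sum(dim=0)   # combining layer
        return self.head(z_combined)               # unnormalized class scores (logits)
```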
Next, in step 205, the server determines a prediction loss at least based on the classification prediction result of the target sample and the corresponding sample label.

Generally, the sample labels (for example, when users are being classified, the identifiers of the user groups into which they fall) come from the data holders. In one example, one of the N data holders, referred to for example as the second holder, owns the sample labels of all training samples. In this case, before step 205, the server receives the sample labels from this second holder in advance. In another example, the sample labels are distributed among different data holders. In this case, before step 205, the server collects the sample labels from the data holders in advance.

Having obtained the sample labels, in step 205 the server can determine the prediction loss according to the definition of any of various loss functions, at least based on a comparison between the classification prediction result of the target sample and the label value of its sample label.
Then, in step 206, the server updates its classification sub-network according to the prediction loss obtained above, and determines the loss gradient corresponding to the input layer of the classification sub-network. Specifically, loss back-propagation can be used: starting from the output layer of the classification sub-network, the loss gradient is determined layer by layer, the network parameters of each layer are adjusted based on the loss gradient, and the loss gradient is passed to the layer above, until the loss gradient corresponding to the input layer is determined.
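For steps 205 and 206, the following PyTorch-style sketch shows how a server might compute the loss, update a stand-in classification sub-network, and read off the gradient at its input layer (the gradient with respect to the received high-order embedding vectors). The optimizer choice, cross-entropy loss, and the simple linear model are assumptions for illustration.

```python
import torch
import torch.nn as nn

# A stand-in classification sub-network: flatten the 3 received vectors and map to 2 classes.
model = nn.Sequential(nn.Flatten(start_dim=0), nn.Linear(3 * 8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

z = torch.randn(3, 8, requires_grad=True)     # high-order vectors received from the 3 holders
label = torch.tensor([1])                     # sample label collected from a data holder

logits = model(z).unsqueeze(0)                # shape (1, 2)
loss = nn.functional.cross_entropy(logits, label)   # step 205: prediction loss

optimizer.zero_grad()
loss.backward()                               # step 206: back-propagation through the classification sub-network
optimizer.step()                              # update the classification sub-network parameters

input_grad = z.grad                           # loss gradient at the input layer, shape (3, 8)
# Step 207: input_grad[i] would be sent back to the i-th data holder.
```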
Then, in step 207, the server sends this loss gradient to the N data holders. Correspondingly, the first holder i among the N data holders receives the loss gradient.
Next, in step 208, the first holder i updates its graph embedding sub-network part, that is, the aforementioned first network part, according to the received loss gradient.

Specifically, the first holder i continues the back-propagation of the loss according to this loss gradient, so as to update its network parameters. Back-propagation proceeds first through the aggregation layer, so the aggregation parameters of the aggregation layer are updated layer by layer in the reverse direction. Where the embedding layer involves embedding parameters that need to be trained (for example, as in the aforementioned formula (4)), back-propagation continues and the embedding parameters in the embedding layer are further updated. In this way, the graph embedding sub-network part of the first holder i is updated.
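On the holder side, if the first network part is also expressed as a differentiable module, the received gradient can be injected with backward(gradient=...); the sketch below is an assumed illustration of step 208, not the only way to propagate the gradient.

```python
import torch

# Suppose high_order is the holder's locally computed high-order embedding tensor, produced by
# its embedding layer and aggregation layer (both torch modules), and received_grad is the loss
# gradient for those vectors received from the server in step 207.
def update_first_network_part(high_order, received_grad, local_optimizer):
    local_optimizer.zero_grad()
    # Inject the server-provided gradient and continue back-propagation through the
    # aggregation parameters and, if present, the embedding parameters.
    high_order.backward(gradient=received_grad)
    local_optimizer.step()
```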
It can be understood that each of the N data holders can perform the above operations in a similar way, thereby updating the graph embedding sub-network part it maintains. The entire graph embedding sub-network is thus updated and, together with the classification sub-network in the server, the entire graph neural network is trained and updated.
Looking back at the joint training process of the graph neural network shown in Figure 2, it can be seen that the forward processing that produces the sample prediction results can be divided into three stages, which use three different processing modes and executing entities.

In the first stage, the process of determining the primary embedding vectors of the samples is executed jointly by the N data holders using an MPC scheme. In this process, the MPC scheme ensures the security of the feature data, and the feature data of the holders is thereby fully combined to obtain the primary embedding vectors.

In the second stage, the process of determining the high-order embedding vectors of the samples is executed by each data holder individually. This ensures the security of the graph structure data on the one hand, and on the other hand allows each data holder to perform multi-level aggregation based on the different graph structure it maintains.

In the third stage, the process of determining the prediction results and the prediction loss of the samples is executed in the server. This takes into account that processing the high-order embedding vectors does not involve private data, and that several operations in the neural network involve non-linear transformations with relatively high demands on computing performance. Having a neutral server maintain the classification sub-network therefore improves training and computation efficiency.

Thus, through the solution of the above embodiments, multi-party joint training of the graph neural network is realized efficiently and safely.
According to an embodiment of another aspect, an apparatus for jointly training a graph neural network by multiple parties is provided. The apparatus is deployed in an arbitrary first holder among the aforementioned N data holders, and the first holder can be implemented as any device, platform or device cluster with computing and processing capabilities. As described above, the graph neural network includes a graph embedding sub-network and a classification sub-network; the server maintains the classification sub-network, and the N data holders each maintain a part of the graph embedding sub-network. The first holder stores the first feature part of each sample and a first graph structure containing the samples as corresponding nodes, and maintains a first network part of the graph embedding sub-network, the first network part including an embedding layer and an aggregation layer.

Figure 4 shows a schematic block diagram of a training apparatus deployed in the first holder according to an embodiment. As shown in Figure 4, the training apparatus 400 includes the following units.
The primary embedding unit 41 is configured to, at the embedding layer, jointly compute the primary embedding vector of each sample with the other N-1 data holders by means of a multi-party secure computation scheme, at least based on the first feature part of each sample.

The aggregation unit 42 is configured to, at the aggregation layer, perform multi-level aggregation on the samples based on the first graph structure and the primary embedding vectors of the samples, so as to determine the high-order embedding vector of each sample, where each level of aggregation includes, for the node corresponding to each sample in the first graph structure, determining the current-level embedding vector of the node at least based on the upper-level embedding vectors of its neighbor nodes.

The sending unit 43 is configured to send the high-order embedding vectors of the samples to the server, so that the server performs classification prediction on the samples using the classification sub-network based on the high-order embedding vectors sent by the N data holders, obtaining classification prediction results.

The receiving unit 44 is configured to receive a loss gradient from the server, the loss gradient being determined at least based on the classification prediction results of the samples and the sample labels.

The update unit 45 is configured to update the first network part according to the loss gradient.
According to one implementation, the primary embedding unit 41 is configured to: based on the first feature part of each sample and the embedding parameters in the embedding layer, jointly compute the primary embedding vector of each sample with the other N-1 data holders by means of a multi-party secure computation scheme; correspondingly, the update unit 45 is configured to update the embedding parameters.

In one embodiment of this implementation, the multi-party secure computation scheme adopts a secret sharing scheme, and the primary embedding unit 41 is specifically configured to: perform sharing processing on the first feature part of each sample to obtain a first shared feature part, and perform sharing processing on the embedding parameters to obtain a first shared parameter part; send the first shared feature part and the first shared parameter part to the other N-1 data holders, and receive N-1 shared feature parts and N-1 shared parameter parts from the other N-1 data holders respectively; process a first combined feature formed from the first feature part and the N-1 shared feature parts with a first combined parameter formed from the embedding parameters and the N-1 shared parameter parts, to obtain a first combined embedding result; send the first combined embedding result to the other N-1 data holders, and receive the corresponding N-1 combined embedding results from the other N-1 data holders; and determine the primary embedding vector of each sample according to the first combined embedding result and the N-1 combined embedding results.
According to one embodiment, the aggregation unit 42 is configured to, for the first node corresponding in the first graph structure to any first sample among the respective samples: determine a neighbor aggregation vector based at least on the previous-level embedding vectors of the neighbor nodes of the first node; and determine the current-level embedding vector of the first node according to the neighbor aggregation vector and the previous-level embedding vector of the first node.
Further, in one example, the aggregation unit 42 determines the neighbor aggregation vector by performing a pooling operation on the previous-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.
In another example, the aggregation unit 42 determines the neighbor aggregation vector by computing a weighted sum of the previous-level embedding vectors of the neighbor nodes of the first node, where the weight corresponding to each neighbor node is determined according to the features of the connecting edge between that neighbor node and the first node.
In yet another example, the aggregation unit 42 determines the neighbor aggregation vector based on the previous-level embedding vectors of the respective neighbor nodes and the edge embedding vectors of the respective connecting edges between the neighbor nodes and the first node.
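For illustration only, the following sketch contrasts the three aggregation variants described above. The function neighbor_aggregate, its arguments, and the specific pooling and edge-fusion choices (max pooling, normalized weights, element-wise fusion) are assumptions; other concrete operations are equally possible embodiments.

```python
# Sketch of the three neighbour-aggregation variants; all helper names are illustrative.
import numpy as np

def neighbor_aggregate(prev_embeds, neighbor_ids, mode="pool",
                       edge_weights=None, edge_embeds=None):
    """Aggregate the previous-level embeddings of one node's neighbours."""
    H = prev_embeds[neighbor_ids]                      # (num_neighbors, dim)
    if mode == "pool":                                 # first example: pooling operation
        return H.max(axis=0)
    if mode == "edge_weight":                          # second example: weights from edge features
        w = edge_weights / edge_weights.sum()
        return (w[:, None] * H).sum(axis=0)
    if mode == "edge_embed":                           # third example: fuse with edge embedding vectors
        return (H * edge_embeds).mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")

# The node's current-level embedding can then combine this aggregation vector with the
# node's own previous-level embedding, e.g. tanh(W @ concat(own_prev, aggregated)).
```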
According to one embodiment, the updating unit 45 is specifically configured to: according to the loss gradient, use a back-propagation algorithm to update, backward and layer by layer, the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer.
According to an embodiment of yet another aspect, an apparatus for multi-party joint training of a graph neural network is provided, where the graph neural network includes a graph embedding sub-network and a classification sub-network, the multiple parties include a server and N data holders, the server maintains the classification sub-network, and the N data holders each maintain a part of the graph embedding sub-network; each of the N data holders stores partial features of the respective samples in a sample set, as well as a graph structure containing the respective samples as corresponding nodes; the apparatus is deployed in the server, which can be implemented as any device, platform, or device cluster with computing and processing capabilities.
Fig. 5 shows a schematic block diagram of a training apparatus deployed in a server according to one embodiment. As shown in Fig. 5, the training apparatus 500 includes the following units.
The vector receiving unit 51 is configured to, for any target sample, receive from the N data holders N high-order embedding vectors for the target sample, respectively, where the i-th high-order embedding vector is obtained by the i-th holder among the N data holders by inputting the graph structure stored therein and the feature part of the target sample into the graph embedding sub-network part maintained therein.
The classification prediction unit 52 is configured to combine the N high-order embedding vectors in the classification sub-network to obtain a comprehensive embedding vector of the target sample, and determine the classification prediction result of the target sample according to the comprehensive embedding vector.
The loss determining unit 53 is configured to determine a prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label.
The updating unit 54 is configured to update the classification sub-network according to the prediction loss, and determine the loss gradient corresponding to the input layer of the classification sub-network.
The sending unit 55 is configured to send the loss gradient to the N data holders, so that each holder updates the graph embedding sub-network part maintained therein.
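For illustration only, the following NumPy sketch shows one possible server-side step covering units 51-55. The softmax classifier, the concatenation-based combination, and all names (server_train_step, channel.send_gradient) are assumptions made for the sketch, not the only embodiment.

```python
# Sketch of one server-side training step (units 51-55); names and model form are illustrative.
import numpy as np

def server_train_step(holder_embeds, label_onehot, W_cls, lr, channel):
    # Units 51/52: combine the N high-order embedding vectors, here simply by concatenation.
    z = np.concatenate(holder_embeds)                  # comprehensive embedding vector
    logits = W_cls @ z
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                               # classification prediction result
    # Unit 53: prediction loss (cross entropy against the sample label).
    loss = -np.sum(label_onehot * np.log(probs + 1e-12))
    # Unit 54: gradient at the input layer, then update of the classification sub-network.
    grad_logits = probs - label_onehot
    grad_z = W_cls.T @ grad_logits                     # loss gradient at the input layer
    W_cls -= lr * np.outer(grad_logits, z)
    # Unit 55: send each holder the slice of the gradient that matches its own embedding.
    sizes = np.cumsum([len(h) for h in holder_embeds])[:-1]
    for i, g in enumerate(np.split(grad_z, sizes)):
        channel.send_gradient(holder=i, grad=g)
    return loss
```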
In one embodiment, the classification prediction unit 52 is specifically configured to: concatenate the N high-order embedding vectors to obtain the comprehensive embedding vector; or average the N high-order embedding vectors to obtain the comprehensive embedding vector.
In another embodiment, the classification prediction unit 52 is specifically configured to: multiply N weight vectors bit-wise with the N high-order embedding vectors, respectively, to obtain N weighted processing vectors; and sum the N weighted processing vectors to obtain the comprehensive embedding vector; correspondingly, the updating unit 54 is configured to update the N weight vectors.
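For illustration only, the weighted combination variant can be sketched as follows; the function name weighted_combine, the list-based shapes, and the gradient update noted in the trailing comment are assumptions made for this sketch.

```python
# Sketch of the weighted combination of N high-order embedding vectors; names are illustrative.
import numpy as np

def weighted_combine(holder_embeds, weight_vectors):
    """holder_embeds and weight_vectors: lists of N vectors of the same dimension."""
    weighted = [w * h for w, h in zip(weight_vectors, holder_embeds)]   # bit-wise products
    return np.sum(weighted, axis=0)                                     # comprehensive embedding vector

# During training, the server updates weight_vectors together with the rest of the
# classification sub-network, e.g. w_i -= lr * grad_z * h_i for each holder i.
```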
According to one implementation, the apparatus 500 further includes a label receiving unit (not shown), configured to receive the sample labels from a second holder among the N data holders.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Fig. 2.
According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor, where executable code is stored in the memory; when the processor executes the executable code, the method described in conjunction with Fig. 2 is implemented.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims (26)

  1. A method for multi-party joint training of a graph neural network, the graph neural network comprising a graph embedding sub-network and a classification sub-network, the multiple parties comprising a server and N data holders, wherein the server maintains the classification sub-network and the N data holders each maintain a part of the graph embedding sub-network; any first holder among the N data holders stores first feature parts of respective samples in a sample set, and a first graph structure containing the respective samples as corresponding nodes; the first holder maintains a first network part of the graph embedding sub-network, the first network part comprising an embedding layer and an aggregation layer; the method is performed by the first holder and comprises:
    in the embedding layer, based at least on the first feature parts of the respective samples, jointly computing with the other N-1 data holders, using a multi-party secure computation (MPC) scheme, primary embedding vectors of the respective samples;
    in the aggregation layer, based on the first graph structure and the primary embedding vectors of the respective samples, performing multi-level aggregation on the respective samples to determine high-order embedding vectors of the respective samples, wherein each level of aggregation comprises: for the node corresponding to each sample in the first graph structure, determining a current-level embedding vector of the node based at least on the previous-level embedding vectors of the neighbor nodes of the node;
    sending the high-order embedding vectors of the respective samples to the server, so that the server uses the classification sub-network to perform classification prediction on each sample based on the high-order embedding vectors sent by the N data holders, obtaining classification prediction results;
    receiving a loss gradient from the server, the loss gradient being determined based at least on the classification prediction results of the respective samples and sample labels;
    updating the first network part according to the loss gradient.
  2. The method according to claim 1, wherein jointly computing with the other N-1 data holders, using a multi-party secure computation MPC scheme, the primary embedding vectors of the respective samples based at least on the first feature parts of the respective samples comprises:
    based on the first feature parts of the respective samples and embedding parameters in the embedding layer, jointly computing with the other N-1 data holders, using the multi-party secure computation MPC scheme, the primary embedding vectors of the respective samples;
    wherein updating the first network part comprises updating the embedding parameters.
  3. The method according to claim 2, wherein the multi-party secure computation MPC scheme comprises a secret sharing scheme, and jointly computing the primary embedding vectors of the respective samples with the other N-1 data holders comprises:
    performing sharing processing on the first feature parts of the respective samples to obtain first shared feature parts, and performing sharing processing on the embedding parameters to obtain first shared parameter parts;
    sending the first shared feature parts and the first shared parameter parts to the other N-1 data holders, and receiving N-1 shared feature parts and N-1 shared parameter parts from the other N-1 data holders, respectively;
    processing a first comprehensive feature formed by the first feature parts and the N-1 shared feature parts with a first comprehensive parameter formed by the embedding parameters and the N-1 shared parameter parts, to obtain a first comprehensive embedding result;
    sending the first comprehensive embedding result to the other N-1 data holders, and receiving the corresponding N-1 comprehensive embedding results from the other N-1 data holders;
    determining the primary embedding vectors of the respective samples according to the first comprehensive embedding result and the N-1 comprehensive embedding results.
  4. The method according to claim 1, wherein each level of aggregation comprises, for a first node corresponding in the first graph structure to any first sample among the respective samples:
    determining a neighbor aggregation vector based at least on the previous-level embedding vectors of the neighbor nodes of the first node;
    determining the current-level embedding vector of the first node according to the neighbor aggregation vector and the previous-level embedding vector of the first node.
  5. The method according to claim 4, wherein determining the neighbor aggregation vector comprises:
    performing a pooling operation on the previous-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.
  6. The method according to claim 4, wherein determining the neighbor aggregation vector comprises:
    computing a weighted sum of the previous-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector, wherein the weight corresponding to each neighbor node is determined according to the features of the connecting edge between that neighbor node and the first node.
  7. The method according to claim 4, wherein determining the neighbor aggregation vector comprises:
    determining the neighbor aggregation vector based on the previous-level embedding vectors of the respective neighbor nodes and the edge embedding vectors of the respective connecting edges between the neighbor nodes and the first node.
  8. The method according to claim 1, wherein updating the first network part according to the loss gradient comprises:
    according to the loss gradient, using a back-propagation algorithm to update, backward and layer by layer, the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer.
  9. A method for multi-party joint training of a graph neural network, the graph neural network comprising a graph embedding sub-network and a classification sub-network, the multiple parties comprising a server and N data holders, wherein the server maintains the classification sub-network and the N data holders each maintain a part of the graph embedding sub-network; each of the N data holders stores partial features of respective samples in a sample set, and a graph structure containing the respective samples as corresponding nodes; the method is performed by the server and comprises:
    for any target sample in the sample set, receiving from the N data holders N high-order embedding vectors for the target sample, respectively, wherein the i-th high-order embedding vector is obtained by the i-th holder among the N data holders by inputting the graph structure stored therein and the feature part of the target sample into the graph embedding sub-network part maintained therein;
    in the classification sub-network, combining the N high-order embedding vectors to obtain a comprehensive embedding vector of the target sample, and determining a classification prediction result of the target sample according to the comprehensive embedding vector;
    determining a prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label;
    updating the classification sub-network according to the prediction loss, and determining the loss gradient corresponding to the input layer of the classification sub-network;
    sending the loss gradient to the N data holders, so that each holder updates the graph embedding sub-network part maintained therein.
  10. The method according to claim 9, wherein combining the N high-order embedding vectors to obtain the comprehensive embedding vector of the target sample comprises:
    concatenating the N high-order embedding vectors to obtain the comprehensive embedding vector; or
    averaging the N high-order embedding vectors to obtain the comprehensive embedding vector.
  11. The method according to claim 9, wherein combining the N high-order embedding vectors to obtain the comprehensive embedding vector of the target sample comprises:
    multiplying N weight vectors bit-wise with the N high-order embedding vectors, respectively, to obtain N weighted processing vectors;
    summing the N weighted processing vectors to obtain the comprehensive embedding vector;
    wherein updating the classification sub-network comprises updating the N weight vectors.
  12. The method according to claim 9, further comprising, before determining the prediction loss:
    receiving the sample label from a second holder among the N data holders.
  13. An apparatus for multi-party joint training of a graph neural network, the graph neural network comprising a graph embedding sub-network and a classification sub-network, the multiple parties comprising a server and N data holders, wherein the server maintains the classification sub-network and the N data holders each maintain a part of the graph embedding sub-network; any first holder among the N data holders stores first feature parts of respective samples in a sample set, and a first graph structure containing the respective samples as corresponding nodes; the first holder maintains a first network part of the graph embedding sub-network, the first network part comprising an embedding layer and an aggregation layer; the apparatus is deployed in the first holder and comprises:
    a primary embedding unit, configured to, in the embedding layer, based at least on the first feature parts of the respective samples, jointly compute with the other N-1 data holders, using a multi-party secure computation MPC scheme, primary embedding vectors of the respective samples;
    an aggregation unit, configured to, in the aggregation layer, based on the first graph structure and the primary embedding vectors of the respective samples, perform multi-level aggregation on the respective samples to determine high-order embedding vectors of the respective samples, wherein each level of aggregation comprises: for the node corresponding to each sample in the first graph structure, determining a current-level embedding vector of the node based at least on the previous-level embedding vectors of the neighbor nodes of the node;
    a sending unit, configured to send the high-order embedding vectors of the respective samples to the server, so that the server uses the classification sub-network to perform classification prediction on each sample based on the high-order embedding vectors sent by the N data holders, obtaining classification prediction results;
    a receiving unit, configured to receive a loss gradient from the server, the loss gradient being determined based at least on the classification prediction results of the respective samples and sample labels;
    an updating unit, configured to update the first network part according to the loss gradient.
  14. The apparatus according to claim 13, wherein the primary embedding unit is configured to:
    based on the first feature parts of the respective samples and embedding parameters in the embedding layer, jointly compute with the other N-1 data holders, using the multi-party secure computation MPC scheme, the primary embedding vectors of the respective samples;
    and the updating unit is configured to update the embedding parameters.
  15. The apparatus according to claim 14, wherein the multi-party secure computation scheme comprises a secret sharing scheme, and the primary embedding unit is configured to:
    perform sharing processing on the first feature parts of the respective samples to obtain first shared feature parts, and perform sharing processing on the embedding parameters to obtain first shared parameter parts;
    send the first shared feature parts and the first shared parameter parts to the other N-1 data holders, and receive N-1 shared feature parts and N-1 shared parameter parts from the other N-1 data holders, respectively;
    process a first comprehensive feature formed by the first feature parts and the N-1 shared feature parts with a first comprehensive parameter formed by the embedding parameters and the N-1 shared parameter parts, to obtain a first comprehensive embedding result;
    send the first comprehensive embedding result to the other N-1 data holders, and receive the corresponding N-1 comprehensive embedding results from the other N-1 data holders;
    determine the primary embedding vectors of the respective samples according to the first comprehensive embedding result and the N-1 comprehensive embedding results.
  16. The apparatus according to claim 13, wherein the aggregation unit is configured to, for a first node corresponding in the first graph structure to any first sample among the respective samples:
    determine a neighbor aggregation vector based at least on the previous-level embedding vectors of the neighbor nodes of the first node;
    determine the current-level embedding vector of the first node according to the neighbor aggregation vector and the previous-level embedding vector of the first node.
  17. The apparatus according to claim 16, wherein determining the neighbor aggregation vector comprises:
    performing a pooling operation on the previous-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector.
  18. The apparatus according to claim 16, wherein determining the neighbor aggregation vector comprises:
    computing a weighted sum of the previous-level embedding vectors of the neighbor nodes of the first node to obtain the neighbor aggregation vector, wherein the weight corresponding to each neighbor node is determined according to the features of the connecting edge between that neighbor node and the first node.
  19. The apparatus according to claim 16, wherein determining the neighbor aggregation vector comprises:
    determining the neighbor aggregation vector based on the previous-level embedding vectors of the respective neighbor nodes and the edge embedding vectors of the respective connecting edges between the neighbor nodes and the first node.
  20. The apparatus according to claim 13, wherein the updating unit is configured to:
    according to the loss gradient, use a back-propagation algorithm to update, backward and layer by layer, the aggregation parameters in the aggregation layer and the embedding parameters in the embedding layer.
  21. An apparatus for multi-party joint training of a graph neural network, the graph neural network comprising a graph embedding sub-network and a classification sub-network, the multiple parties comprising a server and N data holders, wherein the server maintains the classification sub-network and the N data holders each maintain a part of the graph embedding sub-network; each of the N data holders stores partial features of respective samples in a sample set, and a graph structure containing the respective samples as corresponding nodes; the apparatus is deployed in the server and comprises:
    a vector receiving unit, configured to, for any target sample in the sample set, receive from the N data holders N high-order embedding vectors for the target sample, respectively, wherein the i-th high-order embedding vector is obtained by the i-th holder among the N data holders by inputting the graph structure stored therein and the feature part of the target sample into the graph embedding sub-network part maintained therein;
    a classification prediction unit, configured to combine the N high-order embedding vectors in the classification sub-network to obtain a comprehensive embedding vector of the target sample, and determine a classification prediction result of the target sample according to the comprehensive embedding vector;
    a loss determining unit, configured to determine a prediction loss based at least on the classification prediction result of the target sample and the corresponding sample label;
    an updating unit, configured to update the classification sub-network according to the prediction loss, and determine the loss gradient corresponding to the input layer of the classification sub-network;
    a sending unit, configured to send the loss gradient to the N data holders, so that each holder updates the graph embedding sub-network part maintained therein.
  22. The apparatus according to claim 21, wherein the classification prediction unit is configured to:
    concatenate the N high-order embedding vectors to obtain the comprehensive embedding vector; or
    average the N high-order embedding vectors to obtain the comprehensive embedding vector.
  23. The apparatus according to claim 21, wherein the classification prediction unit is configured to:
    multiply N weight vectors bit-wise with the N high-order embedding vectors, respectively, to obtain N weighted processing vectors;
    sum the N weighted processing vectors to obtain the comprehensive embedding vector;
    wherein the updating unit is configured to update the N weight vectors.
  24. The apparatus according to claim 21, further comprising:
    a label receiving unit, configured to receive the sample label from a second holder among the N data holders.
  25. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-12.
  26. A computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method according to any one of claims 1-12 is implemented.
PCT/CN2020/111501 2019-10-29 2020-08-26 Method and device for multi-party joint training of graph neural network WO2021082681A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911040222.5A CN110782044A (en) 2019-10-29 2019-10-29 Method and device for multi-party joint training of neural network of graph
CN201911040222.5 2019-10-29

Publications (1)

Publication Number Publication Date
WO2021082681A1 true WO2021082681A1 (en) 2021-05-06

Family

ID=69387467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111501 WO2021082681A1 (en) 2019-10-29 2020-08-26 Method and device for multi-party joint training of graph neural network

Country Status (2)

Country Link
CN (1) CN110782044A (en)
WO (1) WO2021082681A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782044A (en) * 2019-10-29 2020-02-11 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of neural network of graph
CN110929870B (en) * 2020-02-17 2020-06-12 支付宝(杭州)信息技术有限公司 Method, device and system for training neural network model
GB2592076B (en) * 2020-02-17 2022-09-07 Huawei Tech Co Ltd Method of training an image classification model
CN110929887B (en) * 2020-02-17 2020-07-03 支付宝(杭州)信息技术有限公司 Logistic regression model training method, device and system
CN111275176B (en) * 2020-02-27 2023-09-26 支付宝(杭州)信息技术有限公司 Distributed computing method and distributed computing system
CN113469206A (en) * 2020-03-31 2021-10-01 华为技术有限公司 Method, device, equipment and storage medium for obtaining artificial intelligence model
CN111461215B (en) * 2020-03-31 2021-06-29 支付宝(杭州)信息技术有限公司 Multi-party combined training method, device, system and equipment of business model
CN113657617A (en) * 2020-04-23 2021-11-16 支付宝(杭州)信息技术有限公司 Method and system for model joint training
CN111538827B (en) * 2020-04-28 2023-09-05 清华大学 Case recommendation method, device and storage medium based on content and graph neural network
CN111553470B (en) * 2020-07-10 2020-10-27 成都数联铭品科技有限公司 Information interaction system and method suitable for federal learning
CN111737755B (en) * 2020-07-31 2020-11-13 支付宝(杭州)信息技术有限公司 Joint training method and device for business model
CN112104446A (en) * 2020-09-03 2020-12-18 哈尔滨工业大学 Multi-party combined machine learning method and system based on homomorphic encryption
CN112085172B (en) * 2020-09-16 2022-09-16 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
CN112101531B (en) * 2020-11-16 2021-02-09 支付宝(杭州)信息技术有限公司 Neural network model training method, device and system based on privacy protection
CN112200321B (en) * 2020-12-04 2021-04-06 同盾控股有限公司 Inference method, system, device and medium based on knowledge federation and graph network
WO2022133725A1 (en) * 2020-12-22 2022-06-30 Orange Improved distributed training of graph-embedding neural networks
CN113240505A (en) * 2021-05-10 2021-08-10 深圳前海微众银行股份有限公司 Graph data processing method, device, equipment, storage medium and program product
CN113221153B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113254996B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113626650A (en) * 2021-08-04 2021-11-09 支付宝(杭州)信息技术有限公司 Service processing method and device and electronic equipment
CN114121206B (en) * 2022-01-26 2022-05-20 中电云数智科技有限公司 Case portrait method and device based on multi-party combined K mean modeling
CN114462600B (en) * 2022-04-11 2022-07-05 支付宝(杭州)信息技术有限公司 Training method and device for graph neural network corresponding to directed graph
CN114971742A (en) * 2022-06-29 2022-08-30 支付宝(杭州)信息技术有限公司 Method and device for training user classification model and user classification processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190156214A1 (en) * 2017-11-18 2019-05-23 Neuralmagic Inc. Systems and methods for exchange of data in distributed training of machine learning algorithms
CN110245787A (en) * 2019-05-24 2019-09-17 阿里巴巴集团控股有限公司 A kind of target group's prediction technique, device and equipment
CN110348573A (en) * 2019-07-16 2019-10-18 腾讯科技(深圳)有限公司 The method of training figure neural network, figure neural network unit, medium
CN110782044A (en) * 2019-10-29 2020-02-11 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of neural network of graph

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796475B (en) * 2015-04-24 2018-10-26 苏州大学 A kind of socialization recommendation method based on homomorphic cryptography
CN109934706B (en) * 2017-12-15 2021-10-29 创新先进技术有限公司 Transaction risk control method, device and equipment based on graph structure model
CN109102393B (en) * 2018-08-15 2021-06-29 创新先进技术有限公司 Method and device for training and using relational network embedded model
CN109918454B (en) * 2019-02-22 2024-02-06 创新先进技术有限公司 Method and device for embedding nodes into relational network graph

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210192296A1 (en) * 2019-12-23 2021-06-24 Electronics And Telecommunications Research Institute Data de-identification method and apparatus
CN113254580A (en) * 2021-05-24 2021-08-13 厦门大学 Special group searching method and system
CN113254580B (en) * 2021-05-24 2023-10-03 厦门大学 Special group searching method and system
CN113222143B (en) * 2021-05-31 2023-08-01 平安科技(深圳)有限公司 Method, system, computer equipment and storage medium for training graphic neural network
CN113222143A (en) * 2021-05-31 2021-08-06 平安科技(深圳)有限公司 Graph neural network training method, system, computer device and storage medium
CN113657577B (en) * 2021-07-21 2023-08-18 阿里巴巴达摩院(杭州)科技有限公司 Model training method and computing system
CN113657577A (en) * 2021-07-21 2021-11-16 阿里巴巴达摩院(杭州)科技有限公司 Model training method and computing system
CN116527824A (en) * 2023-07-03 2023-08-01 北京数牍科技有限公司 Method, device and equipment for training graph convolution neural network
CN116527824B (en) * 2023-07-03 2023-08-25 北京数牍科技有限公司 Method, device and equipment for training graph convolution neural network
CN117218459A (en) * 2023-11-08 2023-12-12 支付宝(杭州)信息技术有限公司 Distributed node classification method and device
CN117218459B (en) * 2023-11-08 2024-01-26 支付宝(杭州)信息技术有限公司 Distributed node classification method and device
CN117273086A (en) * 2023-11-17 2023-12-22 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of graph neural network
CN117273086B (en) * 2023-11-17 2024-03-08 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of graph neural network

Also Published As

Publication number Publication date
CN110782044A (en) 2020-02-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20882384; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20882384; Country of ref document: EP; Kind code of ref document: A1)