WO2021164365A1 - Graph neural network model training method, apparatus and system - Google Patents

Graph neural network model training method, apparatus and system

Info

Publication number
WO2021164365A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
current
neural network
data
graph neural
Prior art date
Application number
PCT/CN2020/132667
Other languages
French (fr)
Chinese (zh)
Inventor
陈超超
王力
周俊
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021164365A1 publication Critical patent/WO2021164365A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the embodiments of this specification generally relate to the field of machine learning, and more particularly to methods, devices, and systems for using horizontally segmented feature data sets to collaboratively train graph neural network models through multiple data owners.
  • the graph neural network model is a machine learning model widely used in the field of machine learning.
  • in many cases, multiple model training participants (for example, e-commerce companies, express companies, and banks) each own a different part of the feature data used to train the graph neural network model.
  • the multiple model training participants usually want to jointly use each other's data to train a unified graph neural network model, but do not want to provide their own data to the other model training participants, to prevent their data from being leaked.
  • in view of this situation, a graph neural network model training method that can protect the security of private data is proposed, which can coordinate the multiple model training participants to train the graph neural network model while ensuring the security of the respective data of those participants, so that the trained graph neural network model can be used by the multiple model training participants.
  • the embodiments of this specification provide a method, device and system for collaboratively training a graph neural network model via multiple data owners, which can accomplish graph neural network model training while ensuring the security of the respective data of the multiple data owners.
  • the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner.
  • each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training.
  • the training sample subset includes a feature data subset and a true label value
  • the method is executed by a data owner and includes executing the following loop process until a loop end condition is satisfied: providing the current feature data subset to the current graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the current graph neural network sub-model; obtaining the current discriminant model from the server; providing the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node; determining the current loss function according to the current predicted label value of each node and the corresponding true label value; when the loop end condition is not satisfied, determining the gradient information of the current discriminant model based on the current loss function and updating the model parameters of the current graph neural network sub-model; and providing the gradient information of the current discriminant model to the server, the server using the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, wherein, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model at the server are used as the current models of the next loop process (a minimal illustrative sketch of this data-owner-side loop is given below).
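  • the following is a minimal sketch of one iteration of this data-owner-side loop, using plain numpy with a single linear layer standing in for both the graph neural network sub-model and the discriminant model; the `server` handle and its methods are hypothetical placeholders, not APIs defined by this specification.

```python
import numpy as np

def data_owner_training_step(features, true_labels, submodel_w, server, lr=0.1):
    """One illustrative loop iteration on the data-owner side.

    features:    (num_nodes, d) current feature data subset of this owner
    true_labels: (num_nodes,)   true label values of the nodes
    submodel_w:  (d, h)         parameters of the local "sub-model"
                                (a toy linear layer stands in for the real GNN)
    server:      hypothetical handle used to fetch the discriminant model
                 and to hand the gradient information back
    """
    # 1. Provide the current feature data subset to the local sub-model to
    #    obtain the feature vector representation of each node.
    node_repr = features @ submodel_w                                # (num_nodes, h)

    # 2. Obtain the current discriminant model from the server.
    disc_w = server.get_current_discriminant_model()                 # (h,)

    # 3. Obtain the current predicted label value of each node.
    pred = node_repr @ disc_w                                        # (num_nodes,)

    # 4. Determine the current loss function (squared error as an example).
    err = pred - true_labels
    loss = float(np.mean(err ** 2))

    # 5. Gradient w.r.t. the discriminant model parameters (sent to the
    #    server) and w.r.t. the local sub-model parameters (applied here).
    grad_disc = node_repr.T @ err * (2.0 / len(err))                 # (h,)
    grad_sub = features.T @ np.outer(err, disc_w) * (2.0 / len(err)) # (d, h)
    submodel_w -= lr * grad_sub                                      # local sub-model update

    # 6. Provide the discriminant-model gradient to the server
    #    (in practice via secure aggregation).
    server.send_gradients(grad_disc)
    return loss
```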
  • the gradient information obtained from each data owner may be provided to the server in a secure aggregation manner.
  • the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
  • in each loop iteration, the method may further include: obtaining a current training sample subset.
  • the loop end condition may include: a predetermined number of loop iterations is reached; the change in each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range.
  • the characteristic data may include characteristic data based on image data, voice data, or text data, or the characteristic data may include user characteristic data.
  • the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner; each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, and the training sample subset includes a feature data subset and true label values.
  • the method is executed by the server and includes executing the following loop process until a loop end condition is satisfied: providing the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of its current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not satisfied, determines the gradient information of the discriminant model based on its current loss function, updates the model parameters of its current graph neural network sub-model, and provides the determined gradient information to the server, the feature vector representation of each node being obtained by providing the current feature data subset to the current graph neural network sub-model; when the loop end condition is not satisfied, obtaining the corresponding gradient information of the current discriminant model from each data owner and updating the current discriminant model based on the gradient information from each data owner, wherein, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model of the server are used as the current models of the next loop process (a sketch of this server-side loop is given below).
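  • the following is a minimal sketch of this server-side loop under the same toy assumptions; the `owners` handles and their `run_local_step` method are hypothetical placeholders, not APIs defined by this specification.

```python
import numpy as np

def server_training_loop(owners, disc_w, num_rounds=10, lr=0.1):
    """Illustrative server-side loop: distribute the discriminant model,
    collect the owners' gradients, aggregate, update, repeat.

    owners: hypothetical data-owner handles exposing
            run_local_step(disc_w) -> gradient w.r.t. the discriminant model
    disc_w: (h,) current discriminant model parameters
    """
    for _ in range(num_rounds):
        # Provide the current discriminant model to each data owner; each
        # owner computes node representations, predictions and its loss,
        # returns the discriminant-model gradient, and updates its own
        # graph neural network sub-model locally.
        grads = [owner.run_local_step(disc_w.copy()) for owner in owners]

        # Aggregate the gradients (plain averaging here; in the scheme
        # described above the owners would use secure aggregation so the
        # server never sees any single owner's gradient in the clear).
        agg_grad = np.mean(grads, axis=0)

        # Update the discriminant model used in the next round.
        disc_w = disc_w - lr * agg_grad
    return disc_w
```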
  • the gradient information obtained from each data owner may be provided to the server in a secure aggregation manner.
  • the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
  • a method for model prediction using a graph neural network model is provided, the graph neural network model including a discriminant model located on the server side and a graph neural network sub-model located at each data owner; the method is executed by a data owner and includes: providing the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model; obtaining the discriminant model from the server; and providing the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node.
  • an apparatus for training a graph neural network model via multiple data owners is provided; the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner.
  • each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, and the training sample subset includes a feature data subset and true label values.
  • the apparatus is applied to a data owner and includes: a vector representation unit, which provides the current feature data subset to the current graph neural network sub-model to obtain the feature vector representation of each node of the current graph neural network sub-model; a discriminant model acquisition unit, which acquires the current discriminant model from the server; a model prediction unit, which provides the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node; a loss function determination unit, which determines the current loss function according to the current predicted label value of each node and the corresponding true label value; a gradient information determination unit, which, when the loop end condition is not satisfied, determines the gradient information of the current discriminant model based on the current loss function; a model update unit, which, when the loop end condition is not satisfied, updates the model parameters of the current graph neural network sub-model based on the current loss function; and a gradient information providing unit, which provides the gradient information of the current discriminant model to the server, the server using the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, wherein the above units operate in a loop until the loop end condition is satisfied.
  • the gradient information providing unit may use a secure aggregation method to provide the gradient information obtained from the data owner to the server.
  • the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
  • the device may further include: a training sample subset acquiring unit, which acquires a current training sample subset during each cycle operation.
  • an apparatus for training a graph neural network model via multiple data owners is provided; the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner.
  • each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, and the training sample subset includes a feature data subset and true label values.
  • the apparatus is applied to the server and includes: a discriminant model providing unit, which provides the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of its current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not satisfied, determines the gradient information of the discriminant model based on its current loss function and updates the model parameters of its current graph neural network sub-model, the feature vector representation of each node being obtained by providing the current feature data subset to the current graph neural network sub-model; a gradient information acquisition unit, which, when the loop end condition is not satisfied, acquires the corresponding gradient information of the current discriminant model from each data owner; and a discriminant model update unit, which updates the current discriminant model based on the gradient information from each data owner, wherein the discriminant model providing unit, the gradient information acquisition unit, and the discriminant model update unit operate in a loop until the loop end condition is satisfied, and, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model of the server are used as the current models of the next loop process.
  • a system for training a graph neural network model via multiple data owners is provided, including: multiple data owner devices, each data owner device including the data-owner-side apparatus described above; and a server device including the server-side apparatus described above, wherein the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner, each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, and the training sample subset includes a feature data subset and true label values.
  • an apparatus for performing model prediction using a graph neural network model is provided; the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner; the apparatus is applied to a data owner and includes: a vector representation unit, which provides the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model; a discriminant model acquisition unit, which obtains the discriminant model from the server; and a model prediction unit, which provides the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node.
  • an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, where the memory stores instructions that, when executed by the at least one processor, cause the at least one processor to execute the model training method executed on the data owner side as described above.
  • a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model training method executed on the data owner side as described above.
  • an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, where the memory stores instructions that, when executed by the at least one processor, cause the at least one processor to execute the model training method executed on the server side as described above.
  • a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model training method executed on the server side as described above.
  • an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, where the memory stores instructions that, when executed by the at least one processor, cause the at least one processor to execute the model prediction method described above.
  • a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model prediction method described above.
  • the model parameters of the graph neural network model can be obtained by training without leaking the private data of the multiple training participants.
  • Fig. 1 shows a schematic diagram of an example of a graph neural network model according to an embodiment of the present specification
  • Fig. 2 shows a schematic diagram of an example of a horizontally segmented training sample set according to an embodiment of the present specification
  • FIG. 3 shows a schematic diagram showing the architecture of a system for training graph neural network models via multiple data owners according to an embodiment of the present specification
  • Fig. 4 shows a flowchart of a method for training a graph neural network model via multiple data owners according to an embodiment of the present specification
  • FIG. 5 shows a schematic diagram of an example process for training a graph neural network model via multiple data owners according to an embodiment of the present specification
  • Fig. 6 shows a flowchart of a model prediction process based on a graph neural network model according to an embodiment of the present specification
  • Fig. 7 shows a block diagram of an apparatus for training a graph neural network model via multiple data owners according to an embodiment of the present specification
  • FIG. 8 shows a block diagram of an apparatus for training a graph neural network model via multiple data owners according to an embodiment of the present specification
  • Fig. 9 shows a block diagram of an apparatus for model prediction based on a graph neural network model according to an embodiment of the present specification
  • FIG. 10 shows a schematic diagram of an electronic device for training a graph neural network model via multiple data owners according to an embodiment of the present specification
  • FIG. 11 shows a schematic diagram of an electronic device for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
  • Fig. 12 shows a schematic diagram of an electronic device for model prediction based on a graph neural network model according to an embodiment of the present specification.
  • the term “including” and its variations mean open terms, meaning “including but not limited to”.
  • the term “based on” means “based at least in part on.”
  • the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
  • the term “another embodiment” means “at least one other embodiment.”
  • the terms “first”, “second”, etc. may refer to different or the same objects. Other definitions can be included below, whether explicit or implicit. Unless clearly indicated in the context, the definition of a term is consistent throughout the specification.
  • the training sample set used in the graph neural network model training scheme is a training sample set that has been horizontally segmented.
  • horizontal segmentation of the training sample set refers to dividing the training sample set into multiple training sample subsets according to modules/functions (or certain specified rules), and each training sample subset contains a part of training samples.
  • the training samples included in each training sample subset are complete training samples, that is, all field data and corresponding label values of the training sample are included.
  • for example, suppose there are three data owners Alice, Bob, and Charlie.
  • local samples are collected at each data owner to form a local sample set, and each sample contained in the local sample set is a complete sample.
  • the local sample sets obtained by the three data owners Alice, Bob, and Charlie together form the training sample set used for graph neural network model training, where each local sample set serves as a training sample subset of that training sample set and is used to train the graph neural network model.
  • in the embodiments of this specification, each data owner owns a different part of the training samples used for graph neural network model training. For example, with two data owners, suppose the training sample set includes 100 training samples and each training sample contains multiple feature values and an actual label value; then the data owned by the first data owner may be the first 30 training samples of the training sample set, and the data owned by the second data owner may be the last 70 training samples of the training sample set (see the sketch below).
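  • a small sketch of the horizontal split just described, using toy data; the 30/70 split simply mirrors the example above.

```python
import numpy as np

# Toy training sample set: 100 complete samples, 5 feature values plus a label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # feature values
y = rng.integers(0, 2, size=100)     # true label values

# Horizontal (sample-wise) split: every owner keeps *complete* samples,
# i.e. all feature fields and the label, but only a subset of the rows.
owner_1_data = (X[:30], y[:30])      # first data owner: first 30 samples
owner_2_data = (X[30:], y[30:])      # second data owner: remaining 70 samples

# Both owners see the same fields, just different samples.
assert owner_1_data[0].shape[1] == owner_2_data[0].shape[1] == X.shape[1]
```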
  • the feature data used in the training of the graph neural network model may include feature data based on image data, voice data, or text data.
  • the graph neural network model can be applied to business risk identification, business classification or business decision-making based on image data, voice data or text data.
  • the feature data used in the training of the graph neural network model may include user feature data.
  • the graph neural network model can be applied to business risk identification, business classification, business recommendation or business decision based on user characteristic data.
  • the data to be predicted used by the graph neural network model may include image data, voice data, or text data.
  • the data to be predicted used by the graph neural network model may include user characteristic data.
  • the terms "graph neural network model" and "graph neural network" can be used interchangeably.
  • the terms "graph neural network sub-model" and "graph neural sub-network" can be used interchangeably.
  • the terms "data owner" and "training participant" can be used interchangeably.
  • Fig. 1 shows a schematic diagram of an example of a graph neural network model according to an embodiment of the present specification.
  • the graph neural network (GNN, Graph Neural Network) model is divided into a discriminant model 10 and multiple graph neural network sub-models 20.
  • the discriminant model 10 is set on the server 110, and each graph neural network sub-model is set at the corresponding data owner, for example on the client of the corresponding data owner, so that each data owner has one graph neural network sub-model. As shown in FIG. 1, the graph neural network sub-models GNN A, GNN B and GNN C are set at data owner A 120-1, data owner B 120-2, and data owner C 120-3, respectively.
  • the graph neural network sub-model 20 is used to perform GNN calculation on the data of the data owner to obtain the feature vector representation of each node of the graph neural network sub-model. Specifically, when performing the GNN calculation, the data of the data owner is provided to the graph neural network sub-model 20, and the feature vector representation of each node corresponding to the current data is obtained from the node features and the graph neural sub-network through propagation over K-hop (K-order) neighbors.
  • the discriminant model 10 is used to perform model calculation based on the feature vector representation of each node obtained from the data owner to obtain the model prediction value of each node.
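  • below is a minimal numpy sketch of the two computations just described: K-hop neighbor propagation in the sub-model, followed by the discriminant-model calculation on the resulting node representations. The mean-aggregation update rule and the single linear discriminant layer are illustrative assumptions, not the specific architecture prescribed by this specification.

```python
import numpy as np

def gnn_submodel_embeddings(adj, node_feats, weights):
    """K-hop neighbor propagation (illustrative mean-aggregation GNN).

    adj:        (n, n) adjacency matrix of the owner's local graph
    node_feats: (n, d) node feature matrix
    weights:    list of K weight matrices, one per propagation layer
    """
    # Row-normalized adjacency with self-loops, so each step mixes a node's
    # own representation with those of its direct neighbors.
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)

    h = node_feats
    for w in weights:                  # K layers -> information from K-hop neighbors
        h = np.tanh(a_hat @ h @ w)     # propagate, transform, activate
    return h                           # feature vector representation of each node

def discriminant_predictions(embeddings, disc_w):
    """Discriminant model: maps node representations to per-node prediction values."""
    return embeddings @ disc_w
```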
  • Fig. 2 shows a schematic diagram of an example of horizontally segmented training sample data according to an embodiment of the present specification.
  • each training sample in the training sample subsets owned by the data parties Alice and Bob is complete, that is, each training sample includes complete feature data (x) and label data (y).
  • Alice has a complete training sample (x0, y0).
  • FIG. 3 shows a schematic diagram illustrating the architecture of a system for training graph neural network models via multiple data owners (hereinafter referred to as "model training system 300") according to an embodiment of the present specification.
  • the model training system 300 includes a server device 310 and at least one data owner device 320.
  • Three data owner devices 320 are shown in FIG. 3. In other embodiments of this specification, more or fewer data owner devices 320 may be included.
  • the server device 310 and the at least one data owner device 320 may communicate with each other via a network 330 such as but not limited to the Internet or a local area network.
  • in the embodiments of this specification, the graph neural network model to be trained (that is, the neural network model structure that remains after the discriminant model is removed) is divided into a first number of graph neural network sub-models.
  • the first number is equal to the number of data owner devices participating in model training.
  • the graph neural network model is decomposed into N sub-models, and each data owner device has one sub-model.
  • the feature data set used for model training is located at each data owner device 320.
  • the feature data set is horizontally divided into multiple feature data subsets in the manner described in FIG. 2, and each data owner device has one feature data subset.
  • the sub-models and corresponding feature data subsets owned by each data owner are the secrets of the data owner and cannot be learned or fully learned by other data owners.
  • multiple data owner devices 320 and server devices 310 use the training sample subsets of each data owner device 320 to collaboratively train the graph neural network model.
  • the specific training process of the model will be described in detail with reference to FIGS. 4 to 5 below.
  • the server device 310 and the data owner device 320 may be any suitable electronic devices with computing capabilities.
  • the electronic devices include, but are not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile electronic devices, smart phones, tablet computers, cellular phones, personal digital assistants (PDAs), handheld devices, messaging devices, wearable electronic devices, consumer electronic devices, and so on.
  • FIG. 4 shows a flowchart of a method 400 for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
  • the graph neural network sub-model at each data owner and the discriminant model of the server are initialized. For example, initialize the graph neural network sub-models GNN A , GNN B and GNN C at the data owners A, B, and C, and initialize the discriminant model 10 on the server side.
  • each data owner device 320 obtains its current training sample subset. For example, data owner A obtains the current training sample subset S_A, data owner B obtains the current training sample subset S_B, and data owner C obtains the current training sample subset S_C.
  • Each subset of training samples includes a subset of feature data and true label values.
  • the obtained current training sample subset is provided to the respective graph neural network sub-model for GNN calculation, to obtain the feature vector representation of each node in the graph neural network sub-model.
  • specifically, at each data owner device 320, the current training sample subset is provided to the graph neural network sub-model 20, and the feature vector representations of the nodes corresponding to the current training sample subset are obtained.
  • at 404, each data owner device 320 obtains the current discriminant model from the server 310. Subsequently, at 405, at each data owner device 320, model prediction is performed using the current discriminant model based on the feature vector representation of each node, to obtain the current predicted label value of each node.
  • the current loss function is determined according to the current predicted label value of each node and the corresponding real label value.
  • for example, the loss function can be calculated as a sum, over the nodes, of a per-node loss term determined from the true and predicted label values, where i represents the i-th node, P represents the total number of nodes in the graph neural network sub-model, t_i represents the true label value of the i-th node, and O_i represents the current predicted label value of the i-th node.
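  • purely as an illustration (assumed forms, not the specific formula of the embodiments), a mean-squared-error loss and a cross-entropy loss over the P nodes could be written as follows:

```latex
% Illustrative per-node losses summed over the P nodes (assumed forms):
L_{\mathrm{MSE}} = \frac{1}{P}\sum_{i=1}^{P} \left(t_i - O_i\right)^2,
\qquad
L_{\mathrm{CE}} = -\frac{1}{P}\sum_{i=1}^{P} \bigl[\, t_i \log O_i + (1 - t_i)\,\log(1 - O_i) \bigr].
```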
  • at 407, each data owner device 320 determines the gradient information of the current discriminant model, that is, the gradient information with respect to the model parameters of the current discriminant model, based on the current loss function, for example through backpropagation. In addition, based on the current loss function, each data owner device 320 updates the model parameters of the current graph neural network sub-model, for example through backpropagation.
  • each data owner device 320 respectively provides the gradient information of the current discriminant model determined by each to the server 310.
  • in one example, each data owner device 320 may send the gradient information of the current discriminant model that it has determined to the server 310 as-is, and the server 310 then aggregates the received gradient information.
  • in another example, each data owner may provide its gradient information to the server 310 in a secure aggregation manner.
  • the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
  • other suitable secure aggregation methods can also be used.
  • aggregating the received gradient information may include averaging the received gradient information.
  • the aggregated gradient information is used to update the discriminant model at the server 310 for subsequent training cycles or as a trained discriminant model.
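  • as an illustration of one of the listed options, the sketch below uses pairwise additive masking, a simple secret-sharing-style secure aggregation in which the masks cancel in the sum so the server only learns the aggregate; it is an assumed toy protocol, not the specific scheme mandated by this specification.

```python
import numpy as np

def mask_gradient(grad, owner_id, all_ids, pairwise_seed=42):
    """Pairwise additive masking (illustrative secret-sharing-style scheme)."""
    masked = grad.astype(float).copy()
    for other in all_ids:
        if other == owner_id:
            continue
        # Both owners of a pair derive the same mask from a shared seed;
        # the lower id adds it, the higher id subtracts it, so the masks
        # cancel exactly when the server sums all contributions.
        rng = np.random.default_rng(pairwise_seed
                                    + min(owner_id, other) * 1000
                                    + max(owner_id, other))
        mask = rng.normal(size=grad.shape)
        masked += mask if owner_id < other else -mask
    return masked

def server_aggregate(masked_grads):
    """The server only sees masked gradients; the pairwise masks cancel in
    the sum, so averaging recovers the aggregate without any individual one."""
    return np.mean(masked_grads, axis=0)

# Tiny usage example with three owners and a 4-dimensional gradient.
ids = [0, 1, 2]
true_grads = [np.arange(4.0) + i for i in ids]
masked = [mask_gradient(g, i, ids) for g, i in zip(true_grads, ids)]
agg = server_aggregate(masked)
assert np.allclose(agg, np.mean(true_grads, axis=0))
```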
  • next, it is determined whether the loop end condition is satisfied, that is, whether the predetermined number of loop iterations has been reached. If the predetermined number of iterations has been reached, the process ends. If it has not been reached, the flow returns to the operation of 402 to execute the next training cycle.
  • the graph neural network sub-model of each data owner updated in the current cycle process and the discriminant model of the server end are used as the current model of the next training cycle process.
  • the end condition of the training cycle process refers to reaching the predetermined number of cycles.
  • the end condition of the training loop process may also be that the variation of each model parameter of the discrimination model 10 is not greater than a predetermined threshold.
  • the judgment process as to whether the loop process is over is executed in the server 310.
  • the end condition of the training loop process may also be that the current total loss function is within a predetermined range, for example, the current total loss function is not greater than a predetermined threshold.
  • the process of determining whether the loop process is over is executed in the server 310.
  • each data owner device 320 needs to provide its own loss function to the server 310 for aggregation, so as to obtain the total loss function.
  • each data owner device 320 can provide their own loss function to the server 310 by means of secure aggregation.
  • the secure aggregation for the loss function may likewise include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
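  • a small helper illustrating how the loop end conditions named above could be checked; the parameter names and thresholds are illustrative assumptions, not values fixed by this specification.

```python
import numpy as np

def loop_should_end(round_idx, max_rounds,
                    prev_disc_w=None, curr_disc_w=None, param_tol=None,
                    total_loss=None, loss_threshold=None):
    """Any one of the configured conditions ends the training loop."""
    if round_idx >= max_rounds:                       # predetermined number of iterations
        return True
    if (param_tol is not None and prev_disc_w is not None
            and curr_disc_w is not None
            and np.max(np.abs(curr_disc_w - prev_disc_w)) <= param_tol):
        return True                                   # parameter change small enough
    if (loss_threshold is not None and total_loss is not None
            and total_loss <= loss_threshold):
        return True                                   # total loss within predetermined range
    return False
```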
  • FIG. 5 shows a schematic diagram of an example process for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
  • Figure 5 shows three data owners A, B, and C.
  • the data owners A, B, and C obtain their respective current feature data subsets X_A, X_B, and X_C.
  • the data owners A, B, and C respectively provide the current feature data subsets X_A, X_B, and X_C to their current graph neural network sub-models G_A, G_B, and G_C, to obtain the current feature vector representation of each node in each current graph neural network sub-model.
  • next, each data owner obtains the current discriminant model H from the server 110. Then, each data owner provides the obtained current feature vector representation of each node to the current discriminant model H to obtain the current predicted label value of each node. Subsequently, at each data owner, the current loss function is determined based on the current predicted label value of each node and the corresponding true label value, and, based on the current loss function, the gradient information G_H of the model parameters of the current discriminant model is determined through backpropagation. At the same time, at each data owner, based on the current loss function, the model parameters of each layer of the current graph neural network sub-model are updated through backpropagation.
  • the respective gradient information is provided to the server by means of secure aggregation.
  • the server updates the current discriminant model based on the obtained aggregated gradient information.
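  • a minimal end-to-end toy run of this flow, wiring three stand-in data owners to a simple server update; the class, its fields, and the plain (non-secure) averaging are illustrative assumptions only.

```python
import numpy as np

class ToyOwner:
    """Toy stand-in for one data owner in the flow of FIG. 5 (illustrative)."""
    def __init__(self, features, labels, hidden=3, seed=0):
        rng = np.random.default_rng(seed)
        self.x, self.y = features, labels
        self.w = rng.normal(scale=0.1, size=(features.shape[1], hidden))  # sub-model G

    def run_local_step(self, disc_w, lr=0.1):
        h = self.x @ self.w                      # node feature vector representations
        pred = h @ disc_w                        # current discriminant model H applied
        err = (pred - self.y) * (2.0 / len(self.y))
        grad_disc = h.T @ err                    # gradient information G_H for the server
        self.w -= lr * (self.x.T @ np.outer(err, disc_w))   # local sub-model update
        return grad_disc

rng = np.random.default_rng(1)
owners = [ToyOwner(rng.normal(size=(20, 5)), rng.normal(size=20), seed=i) for i in range(3)]
disc_w = np.zeros(3)
for _ in range(50):                              # training cycles
    grads = [o.run_local_step(disc_w) for o in owners]   # would be securely aggregated
    disc_w -= 0.1 * np.mean(grads, axis=0)       # server updates discriminant model H
```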
  • model training schemes with three data owners are shown in FIG. 3 to FIG. 5; in other examples of the embodiments of this specification, more or fewer than three data owners may be included.
  • the data available to a single data owner is limited, so if the GNN model is constructed based only on the data of a single data owner, the effect of the GNN model is also limited. Using the model training solution provided by the embodiments of this specification, the GNN model can be trained jointly while protecting the data privacy of each data owner, thereby improving the effect of the GNN model.
  • if the server is to securely aggregate the model gradient information of each data owner to update a shared model, the model structure of all data owners must be consistent, so different models cannot be customized for different clients. However, the sparsity and quality of the data (features and graph relationships) of different data owners differ, so different GNN models may be needed for learning. For example, the node feature vector representation obtained when data owner A propagates over 2-hop neighbors may be optimal, while the node feature vector representation obtained when data owner B propagates over 5-hop neighbors may be optimal.
  • in view of this, in the embodiments of this specification, the GNN model used to obtain the feature vector representations of the nodes is arranged at each data owner for local self-learning, while the discriminant model is placed on the server for global learning; by having multiple data owners learn together, the effect of the discriminant model can be improved.
  • in addition, each data owner provides the gradient information of its current discriminant model to the server through secure aggregation, which prevents the gradient information of any individual data owner from being provided to the server in full, so that the server cannot use the received gradient information to infer the private data of that data owner, thereby protecting the private data of the data owners.
  • FIG. 6 shows a flowchart of a model prediction process 600 based on a graph neural network model according to an embodiment of the present specification.
  • the graph neural network model used in the model prediction process shown in FIG. 6 is a graph neural network model trained according to the process shown in FIG. 4.
  • the data to be predicted is provided to the graph neural network sub-model of the data owner to obtain the feature vector representation of each node of the graph neural network sub-model.
  • the discriminant model is obtained from the server.
  • the feature vector representation of each node is provided to the received discriminant model to obtain the predicted label value of each node, thereby completing the model prediction process.
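  • a minimal sketch of this prediction flow; `submodel` is any callable implementing the owner's trained sub-model, and `server.get_discriminant_model()` is a hypothetical call, not an API defined by this specification.

```python
import numpy as np

def predict_at_data_owner(data_to_predict, submodel, server):
    """Illustrative model prediction on the data-owner side.

    data_to_predict: the owner's local input data (e.g. node features)
    submodel:        callable implementing the owner's trained graph neural
                     network sub-model, returning one feature vector per node
    server:          hypothetical handle exposing get_discriminant_model()
    """
    # 1. Provide the data to be predicted to the local graph neural network
    #    sub-model to obtain the feature vector representation of each node.
    node_repr = submodel(data_to_predict)

    # 2. Obtain the (trained) discriminant model from the server.
    disc_w = server.get_discriminant_model()

    # 3. Provide the node representations to the discriminant model to obtain
    #    the predicted label value of each node.
    return np.asarray(node_repr) @ disc_w
```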
  • FIG. 7 shows a schematic diagram of an apparatus (hereinafter referred to as a model training apparatus) 700 for training a graph neural network model via a plurality of data owners according to an embodiment of the present specification.
  • the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner.
  • each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, where the training sample subset includes a feature data subset and true label values.
  • the model training device 700 is located on the side of the data owner.
  • the model training device 700 includes a vector representation unit 710, a discriminant model acquisition unit 720, a model prediction unit 730, a loss function determination unit 740, a gradient information determination unit 750, a model update unit 760, and a gradient information providing unit 770.
  • the vector representation unit 710, the discriminant model acquisition unit 720, the model prediction unit 730, the loss function determination unit 740, the gradient information determination unit 750, the model update unit 760, and the gradient information providing unit 770 operate in a loop until the loop end condition is satisfied.
  • the loop end condition may include, for example: a predetermined number of loop iterations is reached; the change in each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range.
  • the vector representation unit 710 is configured to provide the current feature data subset to the current graph neural network sub-model to obtain the feature vector representation of each node of the current graph neural network sub-model.
  • the operation of the vector representation unit 710 may refer to the operation of 403 described above with reference to FIG. 4.
  • the discriminant model obtaining unit 720 is configured to obtain the current discriminant model from the server.
  • the operation of the discriminant model acquisition unit 720 may refer to the operation of 404 described above with reference to FIG. 4.
  • the model prediction unit 730 is configured to provide the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node.
  • the operation of the model prediction unit 730 may refer to the operation of 405 described above with reference to FIG. 4.
  • the loss function determining unit 740 is configured to determine the current loss function according to the current predicted label value of each node and the corresponding real label value.
  • the operation of the loss function determining unit 740 may refer to the operation of 406 described above with reference to FIG. 4.
  • the gradient information determining unit 750 is configured to determine the gradient information of the current discriminant model based on the current loss function when the loop end condition is not satisfied.
  • the operation of the gradient information determining unit 750 may refer to the operation of 407 described above with reference to FIG. 4.
  • the model updating unit 760 is configured to update the model parameters of the current graph neural network sub-model based on the current loss function when the loop end condition is not satisfied.
  • the operation of the model update unit 760 may refer to the operation of 407 described above with reference to FIG. 4.
  • the gradient information providing unit 770 is configured to provide gradient information of the current discriminant model to the server, and the server uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server.
  • the operation of the gradient information providing unit 770 may refer to the operation of 408 described above with reference to FIG. 4.
  • the gradient information providing unit 770 can provide the gradient information of the current discriminant model to the server in a secure aggregation manner.
  • the model training device 700 may further include a training sample subset acquisition unit (not shown).
  • the training sample subset acquiring unit is configured to acquire the current training sample subset.
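  • the following class skeleton is an illustrative software counterpart of the units listed above, reusing the toy linear stand-ins from the earlier sketches; it only shows a possible unit-to-method mapping and is not an implementation prescribed by this specification.

```python
import numpy as np

class OwnerModelTrainingDevice:
    """Illustrative counterpart of model training device 700: each method
    corresponds to one of the units described above."""

    def __init__(self, submodel_w, server, lr=0.1):
        self.submodel_w = submodel_w   # local graph neural network sub-model (toy linear layer)
        self.server = server           # hypothetical server handle
        self.lr = lr

    def vector_representation(self, features):                 # unit 710
        return features @ self.submodel_w

    def acquire_discriminant_model(self):                      # unit 720
        return self.server.get_current_discriminant_model()

    def model_prediction(self, node_repr, disc_w):              # unit 730
        return node_repr @ disc_w

    def determine_loss(self, pred, true_labels):                # unit 740
        return float(np.mean((pred - true_labels) ** 2))

    def determine_gradient(self, node_repr, pred, true_labels):  # unit 750
        err = (pred - true_labels) * (2.0 / len(true_labels))
        return node_repr.T @ err        # gradient w.r.t. the discriminant model

    def update_submodel(self, features, pred, true_labels, disc_w):  # unit 760
        err = (pred - true_labels) * (2.0 / len(true_labels))
        self.submodel_w -= self.lr * (features.T @ np.outer(err, disc_w))

    def provide_gradient(self, grad_disc):                       # unit 770
        self.server.send_gradients(grad_disc)   # ideally via secure aggregation
```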
  • FIG. 8 shows a block diagram of an apparatus for cooperatively training a graph neural network model via a plurality of data owners (hereinafter referred to as a model training apparatus 800) according to an embodiment of the present specification.
  • the graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner.
  • each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, where the training sample subset includes a feature data subset and true label values.
  • the model training device 800 is located on the server side.
  • the model training device 800 includes a discriminant model providing unit 810, a gradient information acquiring unit 820, and a model updating unit 830.
  • the discriminant model providing unit 810, the gradient information acquiring unit 820, and the model updating unit 830 operate in a loop until the loop end condition is satisfied.
  • the loop end condition may include, for example: a predetermined number of loop iterations is reached; the change in each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range.
  • the discriminant model providing unit 810 is configured to provide the current discriminant model to each data owner, for each data owner to use to obtain the predicted label value of each node.
  • the operation of the discriminant model providing unit 810 may refer to the operation of 404 described above with reference to FIG. 4.
  • the gradient information acquiring unit 820 is configured to acquire the corresponding gradient information of the current discriminant model from each data owner when the loop end condition is not met.
  • the operation of the gradient information acquisition unit 820 may refer to the operation of 408 described above with reference to FIG. 4.
  • the discriminant model update unit 830 is configured to update the current discriminant model based on gradient information from each data owner.
  • the operation of the discriminant model update unit 830 can refer to the operation of 409 described above with reference to FIG. 4.
  • Fig. 9 shows a block diagram of an apparatus for model prediction based on a graph neural network model (hereinafter referred to as a model prediction apparatus 900) according to an embodiment of the present specification.
  • the model prediction device 900 is applied to the data owner.
  • the model prediction device 900 includes a vector representation unit 910, a discriminant model acquisition unit 920, and a model prediction unit 930.
  • the vector representation unit 910 is configured to provide the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model.
  • the discriminant model obtaining unit 920 is configured to obtain the discriminant model from the server.
  • the model prediction unit 930 is configured to provide the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node, thereby completing the model prediction process.
  • as above, the model training and prediction methods, devices, and systems according to the embodiments of this specification have been described with reference to FIGS. 1 to 9.
  • the above model training device and model prediction device can be implemented by hardware, or by software or a combination of hardware and software.
  • FIG. 10 shows a hardware structure diagram of an electronic device 1000 for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
  • as shown in FIG. 10, the electronic device 1000 may include at least one processor 1010, a storage 1020 (for example, a non-volatile memory), a memory 1030, and a communication interface 1040, and the at least one processor 1010, the storage 1020, the memory 1030, and the communication interface 1040 are connected together via a bus 1060.
  • At least one processor 1010 executes at least one computer-readable instruction (ie, the above-mentioned element implemented in the form of software) stored or encoded in the memory.
  • computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1010 to execute the following loop process until the loop end condition is satisfied: provide the current feature data subset to the current graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the current graph neural network sub-model; obtain the current discriminant model from the server; provide the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node; determine the current loss function according to the current predicted label value of each node and the corresponding true label value; when the loop end condition is not satisfied, determine the gradient information of the current discriminant model through backpropagation based on the current loss function and update the model parameters of the current graph neural network sub-model; and provide the gradient information of the current discriminant model to the server, which uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, where, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model at the server are used as the current models of the next loop process.
  • FIG. 11 shows a hardware structure diagram of an electronic device 1100 for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
  • as shown in FIG. 11, the electronic device 1100 may include at least one processor 1110, a storage 1120 (for example, a non-volatile memory), a memory 1130, and a communication interface 1140, and the at least one processor 1110, the storage 1120, the memory 1130, and the communication interface 1140 are connected together via a bus 1160.
  • At least one processor 1110 executes at least one computer-readable instruction (ie, the above-mentioned element implemented in the form of software) stored or encoded in the memory.
  • computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1110 to execute the following loop process until the loop end condition is satisfied: provide the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of its current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not satisfied, determines the gradient information of the discriminant model through backpropagation based on its current loss function, updates the model parameters of its current graph neural network sub-model, and provides the determined gradient information to the server, the feature vector representation of each node being obtained by providing the current feature data subset to the current graph neural network sub-model; when the loop end condition is not satisfied, obtain the corresponding gradient information of the current discriminant model from each data owner, and update the current discriminant model based on the gradient information from each data owner, where, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model of the server are used as the current models of the next loop process.
  • FIG. 12 shows a hardware structure diagram of an electronic device 1200 for model prediction based on a graph neural network model according to an embodiment of the present specification.
  • as shown in FIG. 12, the electronic device 1200 may include at least one processor 1210, a storage 1220 (for example, a non-volatile memory), a memory 1230, and a communication interface 1240, and the at least one processor 1210, the storage 1220, the memory 1230, and the communication interface 1240 are connected together via a bus 1260.
  • At least one processor 1210 executes at least one computer-readable instruction (i.e., the above-mentioned element implemented in the form of software) stored or encoded in the memory.
  • computer-executable instructions are stored in the memory which, when executed, cause the at least one processor 1210 to: provide the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model; obtain the discriminant model from the server; and provide the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node.
  • a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided.
  • the machine-readable medium may have instructions (ie, the above-mentioned elements implemented in the form of software), which, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-9 in the various embodiments of this specification.
  • in addition, a system or device equipped with a readable storage medium may be provided, on which software program code realizing the functions of any one of the above embodiments is stored, and a computer or processor of the system or device reads and executes the instructions stored in the readable storage medium.
  • in this case, the program code itself read from the readable medium can realize the functions of any one of the above embodiments, so the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
  • Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, non-volatile memory cards, and ROM.
  • the program code can be downloaded from the server computer or the cloud via the communication network.
  • the device structures described in the foregoing embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented separately by multiple physical entities, or some units may be implemented jointly by certain components of multiple independent devices.
  • the hardware unit or module can be implemented mechanically or electrically.
  • a hardware unit, module, or processor may include a permanent dedicated circuit or logic (such as a dedicated processor, FPGA or ASIC) to complete the corresponding operation.
  • the hardware unit or processor may also include programmable logic or circuits (such as general-purpose processors or other programmable processors), which may be temporarily set by software to complete corresponding operations.
  • whether a specific implementation uses a mechanical manner, a dedicated permanent circuit, or a temporarily configured circuit can be determined as appropriate.

Abstract

A method and apparatus for training a graph neural network model by means of multiple data owners. In the method, a graph neural network model is divided into a discrimination model and multiple graph neural network sub-models. During model training, each data owner provides its own feature data subset to its own graph neural network sub-model to obtain the feature vector representation of each node. Each data owner receives the discrimination model from a server and obtains a current predicted label value of each node using the feature vector representation of each node, whereby a current loss function is calculated at each data owner; gradient information of the discrimination model is determined on the basis of the current loss function, and each data owner updates its own graph neural network sub-model. Each data owner provides its gradient information to the server, so that the server updates the discrimination model. By using this method, the security of private data at each data owner can be ensured.

Description

Graph neural network model training method, device and system

Technical Field
The embodiments of this specification generally relate to the field of machine learning, and more particularly to methods, devices, and systems for using horizontally segmented feature data sets to collaboratively train graph neural network models via multiple data owners.
Background
The graph neural network model is a machine learning model widely used in the field of machine learning. In many cases, multiple model training participants (for example, e-commerce companies, express companies, and banks) each own different parts of the feature data used for training a graph neural network model. The multiple model training participants usually want to jointly use each other's data to train a unified graph neural network model, but do not want to provide their own data to the other model training participants, so as to prevent their data from being leaked.
In view of this situation, a graph neural network model training method that can protect the security of private data is proposed, which can coordinate the multiple model training participants to train the graph neural network model while ensuring the security of the respective data of those participants, so that the trained graph neural network model can be used by the multiple model training participants.
Summary of the Invention
In view of the above problems, the embodiments of this specification provide a method, device and system for collaboratively training a graph neural network model via multiple data owners, which can accomplish graph neural network model training while ensuring the security of the respective data of the multiple data owners.
According to one aspect of the embodiments of this specification, a method for training a graph neural network model via multiple data owners is provided. The graph neural network model includes a discriminant model located on the server side and a graph neural network sub-model located at each data owner. Each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, and the training sample subset includes a feature data subset and true label values. The method is executed by a data owner and includes: executing the following loop process until a loop end condition is satisfied: providing the current feature data subset to the current graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the current graph neural network sub-model; obtaining the current discriminant model from the server; providing the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node; determining the current loss function according to the current predicted label value of each node and the corresponding true label value; when the loop end condition is not satisfied, determining the gradient information of the current discriminant model based on the current loss function and updating the model parameters of the current graph neural network sub-model; and providing the gradient information of the current discriminant model to the server, the server using the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, wherein, when the loop end condition is not satisfied, the updated graph neural network sub-model of each data owner and the updated discriminant model at the server are used as the current models of the next loop process.
Optionally, in an example of the above aspect, the gradient information obtained at each data owner may be provided to the server in a secure aggregation manner.

Optionally, in an example of the above aspect, the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.

Optionally, in an example of the above aspect, in each loop iteration, the method may further include: obtaining a current training sample subset.

Optionally, in an example of the above aspect, the loop end condition may include: a predetermined number of loop iterations is reached; the change in each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range.

Optionally, in an example of the above aspect, the feature data may include feature data based on image data, voice data, or text data, or the feature data may include user feature data.
According to another aspect of the embodiments of this specification, a method for training a graph neural network model via multiple data owners is provided. The graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values. The method is executed by the server and includes executing the following loop process until a loop end condition is met: providing the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of its current graph neural network sub-model to the current discriminant model to obtain a predicted label value of each node, determines its own current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not met, determines gradient information of the discriminant model based on its own current loss function, updates the model parameters of its current graph neural network sub-model, and provides the determined gradient information to the server, the feature vector representation of each node being obtained by providing the current feature data subset to the current graph neural network sub-model; when the loop end condition is not met, obtaining the corresponding gradient information of the current discriminant model from each data owner, and updating the current discriminant model based on the gradient information from the data owners, wherein, when the loop end condition is not met, the updated graph neural network sub-models of the data owners and the updated discriminant model at the server are used as the current models of the next loop process.
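Correspondingly, the server-side loop described above can be pictured with the following Python sketch, in which the data owners are simulated in-process; the fixed node representations, the squared-error loss and the plain gradient averaging are illustrative assumptions rather than requirements of the embodiments.

```python
import numpy as np

rng = np.random.default_rng(1)
H, P = 3, 5                                     # embedding dim, nodes per owner
w_server = 0.1 * rng.normal(size=H)             # discriminant model at the server

class Owner:
    """Stand-in for one data owner and its (frozen) graph neural network sub-model."""
    def __init__(self, seed):
        r = np.random.default_rng(seed)
        self.Z = r.normal(size=(P, H))          # node feature vector representations
        self.t = r.integers(0, 2, size=P).astype(float)  # true label values

    def local_round(self, w):
        O = self.Z @ w                          # predicted label value of each node
        # gradient information of the current discriminant model (the owner would
        # also update its own graph neural network sub-model in the same round)
        return 2.0 / P * self.Z.T @ (O - self.t)

owners = [Owner(seed) for seed in (10, 11, 12)]
for _ in range(30):                             # loop until the end condition is met
    grads = [o.local_round(w_server.copy()) for o in owners]  # provide current model
    w_server -= 0.1 * np.mean(grads, axis=0)    # aggregate (average) and update

print("discriminant model after training:", np.round(w_server, 3))
```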
Optionally, in an example of the above aspect, the gradient information obtained at each data owner may be provided to the server by means of secure aggregation.
Optionally, in an example of the above aspect, the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
According to another aspect of the embodiments of this specification, a method for performing model prediction using a graph neural network model is provided. The graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at the respective data owners. The method is executed by a data owner and includes: providing data to be predicted to the graph neural network sub-model at the data owner to obtain a feature vector representation of each node of the graph neural network sub-model; obtaining the discriminant model from the server; and providing the feature vector representation of each node to the discriminant model to obtain a predicted label value of each node.
According to another aspect of the embodiments of this specification, an apparatus for training a graph neural network model via multiple data owners is provided. The graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values. The apparatus is applied at a data owner and includes: a vector representation unit that provides the current feature data subset to the current graph neural network sub-model to obtain a feature vector representation of each node of the current graph neural network sub-model; a discriminant model acquisition unit that obtains the current discriminant model from the server; a model prediction unit that provides the feature vector representation of each node to the current discriminant model to obtain a current predicted label value of each node; a loss function determination unit that determines a current loss function according to the current predicted label value of each node and the corresponding true label value; a gradient information determination unit that, when the loop end condition is not met, determines gradient information of the current discriminant model based on the current loss function; a model update unit that, when the loop end condition is not met, updates the model parameters of the current graph neural network sub-model based on the current loss function; and a gradient information providing unit that provides the gradient information of the current discriminant model to the server, the server using the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, wherein the vector representation unit, the discriminant model acquisition unit, the model prediction unit, the loss function determination unit, the gradient information determination unit, the model update unit and the gradient information providing unit operate cyclically until the loop end condition is met, and when the loop end condition is not met, the updated graph neural network sub-models of the data owners and the updated discriminant model at the server are used as the current models of the next loop process.
Optionally, in an example of the above aspect, the gradient information providing unit may provide the gradient information obtained at the data owner to the server by means of secure aggregation.
Optionally, in an example of the above aspect, the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment.
Optionally, in an example of the above aspect, the apparatus may further include: a training sample subset acquisition unit that obtains a current training sample subset in each loop operation.
According to another aspect of the embodiments of this specification, an apparatus for training a graph neural network model via multiple data owners is provided. The graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values. The apparatus is applied at the server and includes: a discriminant model providing unit that provides the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of its current graph neural network sub-model to the current discriminant model to obtain a predicted label value of each node, determines its own current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not met, determines gradient information of the discriminant model based on its own current loss function, updates the model parameters of its current graph neural network sub-model, and provides the determined gradient information to the server, the feature vector representation of each node being obtained by providing the current feature data subset to the current graph neural network sub-model; a gradient information acquisition unit that, when the loop end condition is not met, obtains the corresponding gradient information of the current discriminant model from each data owner; and a discriminant model update unit that updates the current discriminant model based on the gradient information from the data owners, wherein the discriminant model providing unit, the gradient information acquisition unit and the discriminant model update unit operate cyclically until the loop end condition is met, and when the loop end condition is not met, the updated graph neural network sub-models of the data owners and the updated discriminant model at the server are used as the current models of the next loop process.
According to another aspect of the embodiments of this specification, a system for training a graph neural network model via multiple data owners is provided, including: multiple data owner devices, each data owner device including the apparatus described above; and a server device including the apparatus described above, wherein the graph neural network model includes a discriminant model located at the server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values.
According to another aspect of the embodiments of this specification, an apparatus for performing model prediction using a graph neural network model is provided. The graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at the respective data owners. The apparatus is applied at a data owner and includes: a vector representation unit that provides data to be predicted to the graph neural network sub-model at the data owner to obtain a feature vector representation of each node of the graph neural network sub-model; a discriminant model acquisition unit that obtains the discriminant model from the server; and a model prediction unit that provides the feature vector representation of each node to the discriminant model to obtain a predicted label value of each node.
According to another aspect of the embodiments of this specification, an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the model training method executed on the data owner side as described above.
According to another aspect of the embodiments of this specification, a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model training method executed on the data owner side as described above.
According to another aspect of the embodiments of this specification, an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the model training method executed on the server side as described above.
According to another aspect of the embodiments of this specification, a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model training method executed on the server side as described above.
According to another aspect of the embodiments of this specification, an electronic device is provided, including: at least one processor, and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the model prediction method as described above.
According to another aspect of the embodiments of this specification, a machine-readable storage medium is provided, which stores executable instructions that, when executed, cause at least one processor to execute the model prediction method as described above.
With the solutions of the embodiments of this specification, the model parameters of the graph neural network model can be obtained by training without leaking the private data of the multiple training participants.
Description of the drawings
A further understanding of the nature and advantages of the contents of this specification may be achieved by referring to the following drawings. In the drawings, similar components or features may have the same reference signs.
Fig. 1 shows a schematic diagram of an example of a graph neural network model according to an embodiment of this specification;
Fig. 2 shows a schematic diagram of an example of a horizontally split training sample set according to an embodiment of this specification;
Fig. 3 shows a schematic architecture diagram of a system for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 4 shows a flowchart of a method for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 5 shows a schematic diagram of an example process for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 6 shows a flowchart of a model prediction process based on a graph neural network model according to an embodiment of this specification;
Fig. 7 shows a block diagram of an apparatus, applied at a data owner, for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 8 shows a block diagram of an apparatus, applied at a server, for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 9 shows a block diagram of an apparatus for model prediction based on a graph neural network model according to an embodiment of this specification;
Fig. 10 shows a schematic diagram of an electronic device for training a graph neural network model via multiple data owners according to an embodiment of this specification;
Fig. 11 shows a schematic diagram of an electronic device for training a graph neural network model via multiple data owners according to an embodiment of this specification; and
Fig. 12 shows a schematic diagram of an electronic device for model prediction based on a graph neural network model according to an embodiment of this specification.
Detailed description
现在将参考示例实施方式讨论本文描述的主题。应该理解,讨论这些实施方式只是为了使得本领域技术人员能够更好地理解从而实现本文描述的主题,并非是对权利要求书中所阐述的保护范围、适用性或者示例的限制。可以在不脱离本说明书内容的保护 范围的情况下,对所讨论的元素的功能和排列进行改变。各个示例可以根据需要,省略、替代或者添加各种过程或组件。例如,所描述的方法可以按照与所描述的顺序不同的顺序来执行,以及各个步骤可以被添加、省略或者组合。另外,相对一些示例所描述的特征在其它例子中也可以进行组合。The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that the discussion of these embodiments is only to enable those skilled in the art to better understand and realize the subject described herein, and is not to limit the scope of protection, applicability, or examples set forth in the claims. The function and arrangement of the discussed elements can be changed without departing from the scope of protection of the contents of this specification. Various examples can omit, substitute, or add various procedures or components as needed. For example, the described method may be executed in a different order from the described order, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples can also be combined in other examples.
如本文中使用的,术语“包括”及其变型表示开放的术语,含义是“包括但不限于”。术语“基于”表示“至少部分地基于”。术语“一个实施例”和“一实施例”表示“至少一个实施例”。术语“另一个实施例”表示“至少一个其他实施例”。术语“第一”、“第二”等可以指代不同的或相同的对象。下面可以包括其他的定义,无论是明确的还是隐含的。除非上下文中明确地指明,否则一个术语的定义在整个说明书中是一致的。As used herein, the term "including" and its variations mean open terms, meaning "including but not limited to". The term "based on" means "based at least in part on." The terms "one embodiment" and "an embodiment" mean "at least one embodiment." The term "another embodiment" means "at least one other embodiment." The terms "first", "second", etc. may refer to different or the same objects. Other definitions can be included below, whether explicit or implicit. Unless clearly indicated in the context, the definition of a term is consistent throughout the specification.
In this specification, the training sample set used in the graph neural network model training scheme is a horizontally split training sample set. The term "horizontally splitting the training sample set" means dividing the training sample set into multiple training sample subsets according to modules/functions (or some specified rule), where each training sample subset contains a portion of the training samples, and each training sample included in a training sample subset is a complete training sample, that is, it includes all the field data and the corresponding label value of that training sample. In this disclosure, assuming there are three data owners Alice, Bob and Charlie, local samples are collected at each data owner to form a local sample set, and every sample contained in a local sample set is a complete sample; the local sample sets collected by the three data owners Alice, Bob and Charlie then constitute the training sample set used for graph neural network model training, where each local sample set serves as a training sample subset of that training sample set for training the graph neural network model.
In this specification, the data owners each own a different portion of the training samples used for graph neural network model training. For example, taking two data owners as an example, assume that the training sample set includes 100 training samples, each of which contains multiple feature values and an actual label value; then the data owned by the first data owner may be the first 30 training samples in the training sample set, and the data owned by the second data owner may be the last 70 training samples in the training sample set.
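The following short Python sketch illustrates this kind of horizontal split: every owner receives complete samples (all feature fields together with the label value), and only the number of samples per owner differs. The 30/70 split sizes follow the example above; the feature dimension and the data themselves are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 8))        # 100 training samples, 8 feature fields
labels = rng.integers(0, 2, size=100)       # one label value per sample

split_sizes = {"owner_A": 30, "owner_B": 70}
subsets, start = {}, 0
for owner, n in split_sizes.items():
    subsets[owner] = (features[start:start + n], labels[start:start + n])
    start += n

for owner, (x, y) in subsets.items():
    # each row of x together with its y is a complete training sample
    print(owner, "holds", x.shape[0], "complete samples")
```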
在本说明书的实施例中,图神经网络模型训练时所使用的特征数据可以包括基于图像数据、语音数据或文本数据的特征数据。相应地,图神经网络模型可以应用于基于图像数据、语音数据或者文本数据的业务风险识别、业务分类或者业务决策等等。或者,图神经网络模型训练时所使用的特征数据可以包括用户特征数据。相应地,图神经网络模型可以应用于基于用户特征数据的业务风险识别、业务分类、业务推荐或者业务决策等等。In the embodiment of this specification, the feature data used in the training of the graph neural network model may include feature data based on image data, voice data, or text data. Correspondingly, the graph neural network model can be applied to business risk identification, business classification or business decision-making based on image data, voice data or text data. Alternatively, the feature data used in the training of the graph neural network model may include user feature data. Correspondingly, the graph neural network model can be applied to business risk identification, business classification, business recommendation or business decision based on user characteristic data.
此外,在本说明书的实施例中,图神经网络模型所使用的待预测数据可以包括图像数据、语音数据或文本数据。或者,图神经网络模型所使用的待预测数据可以包括用户特征数据。In addition, in the embodiments of this specification, the data to be predicted used by the graph neural network model may include image data, voice data, or text data. Alternatively, the data to be predicted used by the graph neural network model may include user characteristic data.
在本说明书中,术语“图神经网络模型”和“图神经网络”可以互换使用。术语“图神经网络子模型”和“图神经子网络”可以互换使用。此外,术语“数据拥有方”和“训练参与方”可以互换使用。In this specification, the terms "graph neural network model" and "graph neural network" can be used interchangeably. The terms "graph neural network sub-model" and "graph neural sub-network" can be used interchangeably. In addition, the terms "data owner" and "training participant" can be used interchangeably.
下面将结合附图来详细描述根据本说明书实施例的用于经由多个数据拥有方来协同训练图神经网络模型的方法、装置以及系统。The method, device, and system for collaborative training of graph neural network models through multiple data owners according to embodiments of the present specification will be described in detail below with reference to the accompanying drawings.
图1示出了根据本说明书的实施例的图神经网络模型的示例的示意图。Fig. 1 shows a schematic diagram of an example of a graph neural network model according to an embodiment of the present specification.
As shown in Fig. 1, the graph neural network (GNN) model is divided into a discriminant model 10 and multiple graph neural network sub-models 20, for example the graph neural network sub-models GNN_A, GNN_B and GNN_C in Fig. 1. The discriminant model 10 is deployed at the server 110, and each graph neural network sub-model is deployed at the corresponding data owner, for example on a client at the corresponding data owner, with each data owner having one graph neural network sub-model. As shown in Fig. 1, GNN_A is deployed at data owner A 120-1, GNN_B is deployed at data owner B 120-2, and GNN_C is deployed at data owner C 120-3.
The graph neural network sub-model 20 is used to perform GNN computation on the data owner's data to obtain the feature vector representation of each node of that graph neural network sub-model. Specifically, when performing the GNN computation, the data owner's data is provided to the graph neural network sub-model 20, and, according to the node features and the graph neural sub-network, the feature vector representation of each node corresponding to the current data is obtained through propagation over K-hop neighbors.
判别模型10被使用来基于数据拥有方处得到的每个节点的特征向量表示进行模型计算,以得到每个节点的模型预测值。The discriminant model 10 is used to perform model calculation based on the feature vector representation of each node obtained from the data owner to obtain the model prediction value of each node.
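The division of labor between the two parts can be sketched as follows in Python. The mean-aggregation propagation rule, the tanh non-linearity and the logistic discriminant head are assumptions chosen for brevity; the embodiments do not prescribe a particular GNN variant or discriminant model structure.

```python
import numpy as np

def gnn_sub_model(X, A, W_list):
    """K-hop propagation: one weight matrix per propagation round."""
    A_hat = A / A.sum(axis=1, keepdims=True)         # row-normalized adjacency
    H = X
    for W in W_list:                                  # K = len(W_list) rounds
        H = np.tanh(A_hat @ H @ W)                    # aggregate neighbors, transform
    return H                                          # per-node feature vectors

def discriminant_model(H, w, b):
    """Maps each node's feature vector to a predicted label value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(H @ w + b)))

rng = np.random.default_rng(0)
P, D = 6, 4
A = np.maximum((rng.random((P, P)) < 0.4).astype(float), np.eye(P))
X = rng.normal(size=(P, D))
W_list = [rng.normal(scale=0.3, size=(D, D)) for _ in range(2)]   # K = 2
H = gnn_sub_model(X, A, W_list)
preds = discriminant_model(H, rng.normal(size=D), 0.0)
print("per-node predicted label values:", np.round(preds, 3))
```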
在本说明书中,各个数据拥有方处所具有的数据是经过水平切分的数据。图2示出了根据本说明书的实施例的经过水平切分的训练样本数据的示例的示意图。在图2中,示出了2个数据方Alice和Bob,多个数据方也类似。每个数据方Alice和Bob拥有的训练样本子集中的每条训练样本是完整的,即,每条训练样本包括完整的特征数据(x)和标记数据(y)。比如,Alice拥有完整的训练样本(x0,y0)。In this specification, the data possessed by each data owner is horizontally segmented data. Fig. 2 shows a schematic diagram of an example of horizontally segmented training sample data according to an embodiment of the present specification. In Figure 2, two data parties Alice and Bob are shown, and multiple data parties are similar. Each training sample in the training sample subset owned by each data party Alice and Bob is complete, that is, each training sample includes complete feature data (x) and labeled data (y). For example, Alice has a complete training sample (x0, y0).
图3示出了示出了根据本说明书的实施例的用于经由多个数据拥有方训练图神经网络模型的系统(在下文中称为“模型训练系统300”)的架构示意图。FIG. 3 shows a schematic diagram illustrating the architecture of a system for training graph neural network models via multiple data owners (hereinafter referred to as "model training system 300") according to an embodiment of the present specification.
如图3所示,模型训练系统300包括服务端设备310以及至少一个数据拥有方设备320。在图3中示出了3个数据拥有方设备320。在本说明书的其它实施例中,可以包括更多或者更少的数据拥有方设备320。服务端设备310以及至少一个数据拥有方设备320可以通过例如但不局限于互联网或局域网等的网络330相互通信。As shown in FIG. 3, the model training system 300 includes a server device 310 and at least one data owner device 320. Three data owner devices 320 are shown in FIG. 3. In other embodiments of this specification, more or fewer data owner devices 320 may be included. The server device 310 and the at least one data owner device 320 may communicate with each other via a network 330 such as but not limited to the Internet or a local area network.
In this specification, the graph neural network model to be trained (i.e., the neural network model structure excluding the discriminant model) is divided into a first number of graph neural network sub-models. Here, the first number is equal to the number of data owner devices participating in model training; assume the number of data owner devices is N. Accordingly, the graph neural network model is decomposed into N sub-models, and each data owner device has one sub-model. The feature data used for model training is located at the respective data owner devices 320; the feature data set is horizontally split into multiple feature data subsets in the manner described in Fig. 2, and each data owner device holds one feature data subset. Here, the sub-model and the corresponding feature data subset owned by each data owner are that data owner's secret and cannot be learned, or at least not completely learned, by other data owners.
在本说明书中,多个数据拥有方设备320和服务端设备310一起使用各个数据拥有方设备320的训练样本子集来协同训练图神经网络模型。关于模型的具体训练过程将在下面参照图4到图5进行详细描述。In this specification, multiple data owner devices 320 and server devices 310 use the training sample subsets of each data owner device 320 to collaboratively train the graph neural network model. The specific training process of the model will be described in detail with reference to FIGS. 4 to 5 below.
In this specification, the server device 310 and the data owner devices 320 may be any suitable electronic devices with computing capabilities, including but not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile electronic devices, smart phones, tablet computers, cellular phones, personal digital assistants (PDAs), handheld devices, messaging devices, wearable electronic devices, consumer electronic devices, and so on.
图4示出了根据本说明书的实施例的用于经由多个数据拥有方训练图神经网络模型的方法400的流程图。FIG. 4 shows a flowchart of a method 400 for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
如图4所示,在401,初始化各个数据拥有方处的图神经网络子模型以及服务端的判别模型。例如,初始化数据拥有方A、B和C处的图神经网络子模型GNN A、GNN B和GNN C,以及初始化服务端的判别模型10。 As shown in Fig. 4, at 401, the graph neural network sub-model at each data owner and the discriminant model of the server are initialized. For example, initialize the graph neural network sub-models GNN A , GNN B and GNN C at the data owners A, B, and C, and initialize the discriminant model 10 on the server side.
接着,循环执行402到410的操作,直到满足循环结束条件。Then, the operations from 402 to 410 are executed in a loop until the loop end condition is satisfied.
Specifically, at 402, each data owner device 320 obtains its own current training sample subset. For example, data owner A obtains a current training sample subset S_A, data owner B obtains a current training sample subset S_B, and data owner C obtains a current training sample subset S_C. Each training sample subset includes a feature data subset and true label values.
At 403, at each data owner device 320, the obtained current training sample subset is provided to the data owner's own graph neural network sub-model for GNN computation, so as to obtain the feature vector representation of each node in that graph neural network sub-model. Specifically, when performing the GNN computation, the current training sample subset is provided to the graph neural network sub-model 20, and, according to the node features and the graph neural sub-network, the feature vector representation of each node corresponding to the current training sample subset is obtained through propagation over K-hop neighbors.
在404,各个数据拥有方设备320从服务端310获取当前判别模型。随后,在405,在各个数据拥有方设备320处,使用当前判别模型来基于各个节点的特征向量表示来进行模型预测,以得到各个节点的当前预测标签值。At 404, each data owner device 320 obtains the current discrimination model from the server 310. Subsequently, at 405, at each data owner device 320, the current discriminant model is used to perform model prediction based on the feature vector representation of each node to obtain the current predicted label value of each node.
Then, at 406, at each data owner device 320, the current loss function is determined according to the current predicted label value of each node and the corresponding true label value. For example, in one example, the current loss function may be computed by accumulating, over all nodes, a per-node loss term determined from t_i and O_i, where i denotes the i-th node, P denotes the total number of nodes in the graph neural network sub-model, t_i denotes the true label value of the i-th node, and O_i denotes the current predicted label value of the i-th node.
At 407, at each data owner device 320, the gradient information of the received current discriminant model, i.e., the gradient information of the model parameters of the current discriminant model, is determined based on the current loss function, for example through back propagation. In addition, at each data owner device 320, the model parameters of the current graph neural network sub-model are updated based on the current loss function, for example through back propagation.
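As an illustration of how a single back-propagation pass can yield both pieces of information at a data owner, the following sketch uses PyTorch autograd; the framework choice, the squared-error loss and the single propagation step are assumptions made only for this example.

```python
import torch

torch.manual_seed(0)
P, D, H = 5, 4, 3
A_hat = torch.rand(P, P)
A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)          # normalized adjacency
X = torch.randn(P, D)                                   # current feature subset
t = torch.randint(0, 2, (P,)).float()                   # true label values

W_local = (0.1 * torch.randn(D, H)).requires_grad_()    # GNN sub-model parameters
w_disc = (0.1 * torch.randn(H)).requires_grad_()        # current discriminant model

Z = torch.tanh(A_hat @ X @ W_local)                     # node representations (403)
O = Z @ w_disc                                          # predicted label values (405)
loss = torch.mean((O - t) ** 2)                         # current loss function (406)
loss.backward()                                         # back propagation (407)

grad_for_server = w_disc.grad.clone()   # gradient information of the discriminant model
with torch.no_grad():                   # local update of the GNN sub-model parameters
    W_local -= 0.1 * W_local.grad
print("gradient information sent to the server:", grad_for_server)
```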
At 408, each data owner device 320 provides its determined gradient information of the current discriminant model to the server 310. In one example, each data owner device 320 may send its determined gradient information of the current discriminant model (for example, as it is) to the server 310, and the server 310 then aggregates the received gradient information. In another example, each data owner may provide the gradient information to the server 310 by means of secure aggregation. In this specification, the secure aggregation may include: secure aggregation based on secret sharing; secure aggregation based on homomorphic encryption; or secure aggregation based on a trusted execution environment. In addition, other suitable secure aggregation methods may also be used in this specification.
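One common way to realize secret-sharing-based secure aggregation is pairwise additive masking, sketched below in Python; the masks cancel when the server sums the contributions, so only the aggregate gradient is revealed. This toy omits modular arithmetic, key agreement and dropout handling, and is not intended to describe the exact protocol used in any particular deployment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_owners, dim = 3, 4
local_grads = [rng.normal(size=dim) for _ in range(n_owners)]   # private gradients

# pairwise masks: for each pair (i, j) with i < j, owner i adds masks[i][j]
# and owner j subtracts the same mask, so the masks vanish in the sum
masks = [[rng.normal(size=dim) for _ in range(n_owners)] for _ in range(n_owners)]
masked = []
for i in range(n_owners):
    g = local_grads[i].copy()
    for j in range(n_owners):
        if i < j:
            g += masks[i][j]
        elif i > j:
            g -= masks[j][i]
    masked.append(g)                       # this is all the server ever sees

aggregate = np.sum(masked, axis=0)         # masks cancel in the sum
assert np.allclose(aggregate, np.sum(local_grads, axis=0))
print("aggregated gradient:", aggregate / n_owners)
```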
此外,要说明的是,在本说明书中,对所接收的各个梯度信息进行聚合可以包括对所接收的各个梯度信息进行求平均。In addition, it should be noted that in this specification, aggregating the received gradient information may include averaging the received gradient information.
在409,在服务端310处,使用经过聚合后的梯度信息来更新服务端310处的判别 模型,以用于后续训练循环过程,或者作为训练好的判别模型。In 409, at the server 310, the aggregated gradient information is used to update the discriminant model at the server 310 for subsequent training cycles or as a trained discriminant model.
在410,判断是否满足循环结束条件,即,是否达到预定循环次数。如果达到预定循环次数,则流程结束。如果未达到预定循环次数,则返回到402的操作,执行下一训练循环过程。这里,当前循环过程中更新的各个数据拥有方处的图神经网络子模型以及服务端的判别模型用作下一训练循环过程的当前模型。At 410, it is determined whether the loop end condition is satisfied, that is, whether the predetermined number of loops is reached. If the predetermined number of cycles is reached, the process ends. If the predetermined number of cycles has not been reached, return to the operation of 402 and execute the next training cycle process. Here, the graph neural network sub-model of each data owner updated in the current cycle process and the discriminant model of the server end are used as the current model of the next training cycle process.
这里要说明的是,在上述的示例中,训练循环过程的结束条件是指达到预定循环次数。在本说明书的另一示例中,训练循环过程的结束条件也可以是判别模型10的各个模型参数的变化量不大于预定阈值。在这种情况下,关于循环过程是否结束的判断过程在服务端310中执行。此外,在本说明书的另一示例中,训练循环过程的结束条件也可以是当前总损失函数位于预定范围内,例如,当前总损失函数不大于预定阈值。同样,关于循环过程是否结束的判断过程在服务端310中执行。此外,在这种情况下,各个数据拥有方设备320需要将各自的损失函数提供给服务端310来进行聚合,以得到总损失函数。另外,为了保证各个数据拥有方设备320的损失函数的隐私安全,在本说明书的另一示例中,各个数据拥有方设备320可以通过安全聚合的方式将各自的损失函数提供给服务端310来得到总损失函数。同样,针对损失函数的安全聚合也可以包括:基于秘密共享的安全聚合;基于同态加密的安全聚合;或者基于可信执行环境的安全聚合。It should be explained here that in the above example, the end condition of the training cycle process refers to reaching the predetermined number of cycles. In another example of this specification, the end condition of the training loop process may also be that the variation of each model parameter of the discrimination model 10 is not greater than a predetermined threshold. In this case, the judgment process as to whether the loop process is over is executed in the server 310. In addition, in another example of this specification, the end condition of the training loop process may also be that the current total loss function is within a predetermined range, for example, the current total loss function is not greater than a predetermined threshold. Similarly, the process of determining whether the loop process is over is executed in the server 310. In addition, in this case, each data owner device 320 needs to provide its own loss function to the server 310 for aggregation, so as to obtain the total loss function. In addition, in order to ensure the privacy and security of the loss function of each data owner device 320, in another example of this specification, each data owner device 320 can provide their own loss function to the server 310 by means of secure aggregation. Total loss function. Similarly, the security aggregation for the loss function may also include: the security aggregation based on secret sharing; the security aggregation based on homomorphic encryption; or the security aggregation based on the trusted execution environment.
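The three alternative loop end conditions mentioned above can be checked, for example, as in the following sketch; the thresholds and the choice of the maximum-absolute-change criterion are illustrative assumptions.

```python
import numpy as np

def loop_should_end(round_idx, max_rounds, prev_params, curr_params, total_loss,
                    param_tol=1e-4, loss_bound=1e-2):
    if round_idx >= max_rounds:                                  # predetermined number of loops
        return True
    if np.max(np.abs(curr_params - prev_params)) <= param_tol:   # parameter change small enough
        return True
    if total_loss <= loss_bound:                                 # total loss within predetermined range
        return True
    return False

print(loop_should_end(3, 10, np.array([0.5, 0.2]), np.array([0.5001, 0.2]), 0.3))
```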
图5示出了根据本说明书的实施例的用于经由多个数据拥有方训练图神经网络模型的一个示例过程的示意图。FIG. 5 shows a schematic diagram of an example process for training a graph neural network model via multiple data owners according to an embodiment of the present specification.
图5中示出了三个数据拥有方A、B和C。在进行模型训练时,在每轮循环过程中,数据拥有方A、B和C分别获取各自的当前特征数据子集X A、X B和X C。数据拥有方A、B和C分别将当前特征数据子集X A、X B和X C提供给各自的当前图神经网络子模型G A、G B和G C,以得到各个当前图神经网络子模型中的各个节点的当前特征向量表示。 Figure 5 shows three data owners A, B, and C. During model training, in each round of the cycle, the data owners A, B, and C obtain their respective current feature data subsets X A , X B, and X C. The data owners A, B, and C respectively provide the current feature data subsets X A , X B, and X C to their current graph neural network sub-models G A , G B, and G C to obtain each current graph neural network sub-model. The current feature vector representation of each node in the model.
Subsequently, each data owner obtains the current discriminant model H from the server 110. Each data owner then provides the obtained current feature vector representation of each node to the current discriminant model H to obtain the current predicted label value of each node. Then, at each data owner, the current loss function is determined based on the current predicted label value of each node and the corresponding true label value, and, based on the current loss function, the gradient information GH of the model parameters of the current discriminant model is determined through back propagation. Meanwhile, at each data owner, the model parameters of each network layer of the current graph neural network sub-model are also updated through back propagation based on the current loss function.
After each data owner obtains the gradient information GH of the model parameters of the current discriminant model, it provides its gradient information to the server by means of secure aggregation. The server updates the current discriminant model based on the resulting aggregated gradient information.
按照上述方式循环操作,直到满足循环结束条件,由此完成图神经网络模型训练过程。Loop operations in the above manner until the loop end condition is met, thereby completing the graph neural network model training process.
In addition, it should be noted that Figs. 3 to 5 show a model training scheme with three data owners; in other examples of the embodiments of this specification, more or fewer than three data owners may also be included.
在传统的GNN模型中,由于多个数据拥有方的数据不能彼此分享,所以都是只基于单个数据拥有方的数据来构建GNN模型。此外,由于单个数据拥有方的数据有限,所以GNN模型的效果也有限。利用本说明书的实施例提供模型训练方案,可以在保护各个数据拥有方的数据隐私的基础上,共同训练GNN模型,由此提升GNN模型效果。In the traditional GNN model, since the data of multiple data owners cannot be shared with each other, the GNN model is constructed only based on the data of a single data owner. In addition, due to the limited data of a single data owner, the effect of the GNN model is also limited. Using the embodiment of this specification to provide a model training solution can jointly train the GNN model on the basis of protecting the data privacy of each data owner, thereby improving the effect of the GNN model.
在现有的联邦学习方案中,GNN模型的所有模型部分都布置在服务端,各个数据拥有方(客户端)通过使用各自的隐私数据来学习模型梯度信息,然后将所得到的模型梯度信息提供给服务端进行安全聚合,然后进行全局模型更新。按照这种方式,所有数据拥有方的模型结构都必须一致,这样服务端才能对各个数据拥有方的模型梯度信息进行安全聚合来更新模型,从而不能针对不同的客户端来定制化不同的模型。然而,不同的数据拥有方的数据(特征及图关系)的稀疏质量不同,所以可以需要不同的GNN模型来学习。比如,数据拥有方A传播2度邻居时得到的节点特征向量表示是最优的,而数据拥有方B传播5度邻居时得到的节点特征向量表示才是最优的。In the existing federated learning scheme, all the model parts of the GNN model are arranged on the server, and each data owner (client) learns the model gradient information by using their own private data, and then provides the obtained model gradient information Perform security aggregation on the server, and then update the global model. In this way, the model structure of all data owners must be consistent, so that the server can safely aggregate the model gradient information of each data owner to update the model, so that different models cannot be customized for different clients. However, the sparse quality of the data (features and graph relationships) of different data owners is different, so different GNN models may be needed for learning. For example, the node feature vector representation obtained when the data owner A propagates a 2-degree neighbor is optimal, while the node feature vector representation obtained when the data owner B propagates a 5-degree neighbor is the optimal.
利用本说明书的实施例提供的模型训练方法,通过将用于得到节点特征向量表示的GNN模型部分布置在各个数据拥有方处自己学习(局部),而将判别模型放在服务端(全局)来经由多个数据拥有方来共同学习,从而可以提高判别模型的效果。Using the model training method provided by the embodiments of this specification, the GNN model used to obtain the feature vector representation of the node is arranged at each data owner for self-learning (local), and the discriminant model is placed on the server (global). Through multiple data owners to learn together, which can improve the effect of the discriminant model.
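The following sketch illustrates why this arrangement allows each data owner to pick its own propagation depth: as long as every sub-model outputs node representations of the same dimension, one shared discriminant model can serve all owners. The parameter-free propagation rule and the depths of 2 and 5 (echoing the example above) are purely illustrative.

```python
import numpy as np

def propagate(X, A, K):
    A_hat = A / A.sum(axis=1, keepdims=True)
    H = X
    for _ in range(K):
        H = np.tanh(A_hat @ H)          # parameter-free propagation, for brevity
    return H                            # shape (num_nodes, dim) regardless of K

rng = np.random.default_rng(0)
dim = 4
w_disc = rng.normal(size=dim)           # single discriminant model shared via the server

for name, num_nodes, K in [("owner_A", 5, 2), ("owner_B", 7, 5)]:
    A = np.maximum((rng.random((num_nodes, num_nodes)) < 0.4).astype(float),
                   np.eye(num_nodes))
    X = rng.normal(size=(num_nodes, dim))
    Z = propagate(X, A, K)              # locally chosen propagation depth
    preds = Z @ w_disc                  # the same discriminant works for both owners
    print(name, "uses K =", K, "-> predictions shape", preds.shape)
```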
此外,利用图3-图5中公开的图神经网络模型训练方法,各个数据拥有方通过安全聚合的方式来将各自的当前判别模型的梯度信息提供给服务端,由此可以防止各个数据拥有方的梯度信息被完整地提供给服务端,从而避免服务端能够使用所接收的梯度信息来推导出数据拥有方的隐私数据,进而实现针对数据拥有方的隐私数据保护。In addition, using the graph neural network model training method disclosed in Figures 3 to 5, each data owner provides the gradient information of their current discriminant model to the server through secure aggregation, thereby preventing each data owner The gradient information of the data is completely provided to the server, so as to prevent the server from using the received gradient information to derive the privacy data of the data owner, thereby realizing the privacy data protection for the data owner.
图6示出了根据本说明书的实施例的基于图神经网络模型的模型预测过程600的流程图。图6中示出的模型预测过程中使用的图神经网络模型是按照图4所示的过程训练的图神经网络模型。FIG. 6 shows a flowchart of a model prediction process 600 based on a graph neural network model according to an embodiment of the present specification. The graph neural network model used in the model prediction process shown in FIG. 6 is a graph neural network model trained according to the process shown in FIG. 4.
在进行模型预测时,在610,将待预测数据提供给数据拥有方处的图神经网络子模 型,以得到图神经网络子模型的各个节点的特征向量表示。接着,在620,从服务端获取判别模型。然后,在630,将各个节点的特征向量表示提供给所接收的判别模型,以得到各个节点的预测标签值,由此完成模型预测过程。When performing model prediction, at 610, the data to be predicted is provided to the graph neural network sub-model of the data owner to obtain the feature vector representation of each node of the graph neural network sub-model. Next, at 620, the discriminant model is obtained from the server. Then, at 630, the feature vector representation of each node is provided to the received discriminant model to obtain the predicted label value of each node, thereby completing the model prediction process.
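A minimal sketch of this prediction flow, with the server access reduced to a callback, might look as follows in Python; all names, shapes and the logistic discriminant are assumptions for illustration.

```python
import numpy as np

def predict_at_owner(X_new, A_new, W_local, fetch_discriminant):
    A_hat = A_new / A_new.sum(axis=1, keepdims=True)
    Z = np.tanh(A_hat @ X_new @ W_local)        # 610: node feature vector representations
    w, b = fetch_discriminant()                 # 620: discriminant model from the server
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))   # 630: predicted label value per node

rng = np.random.default_rng(0)
P, D, H = 4, 3, 2
A_new = np.maximum((rng.random((P, P)) < 0.5).astype(float), np.eye(P))
X_new = rng.normal(size=(P, D))                 # data to be predicted
W_local = rng.normal(scale=0.3, size=(D, H))    # trained GNN sub-model at the owner
preds = predict_at_owner(X_new, A_new, W_local,
                         lambda: (rng.normal(size=H), 0.0))
print("predicted label values:", np.round(preds, 3))
```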
Fig. 7 shows a schematic diagram of an apparatus (hereinafter referred to as a model training apparatus) 700 for training a graph neural network model via multiple data owners according to an embodiment of this specification. In this embodiment, the graph neural network model includes a discriminant model located at the server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values. The model training apparatus 700 is located on the data owner side.
如图7所示,模型训练装置700包括向量表示单元710、判别模型获取单元720、模型预测单元730、损失函数确定单元740、梯度信息确定单元750、模型更新单元760和梯度信息提供单元770。As shown in FIG. 7, the model training device 700 includes a vector representation unit 710, a discriminant model acquisition unit 720, a model prediction unit 730, a loss function determination unit 740, a gradient information determination unit 750, a model update unit 760, and a gradient information providing unit 770.
During model training, the vector representation unit 710, the discriminant model acquisition unit 720, the model prediction unit 730, the loss function determination unit 740, the gradient information determination unit 750, the model update unit 760 and the gradient information providing unit 770 operate cyclically until the loop end condition is met. The loop end condition may include, for example: a predetermined number of loops is reached; the variation of each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range. When the loop process has not ended, the updated graph neural network sub-models of the data owners and the updated discriminant model at the server are used as the current models of the next loop process.
具体地,向量表示单元710被配置为将当前特征数据子集提供给当前图神经网络子模型,以得到当前图神经网络子模型的各个节点的特征向量表示。向量表示单元710的操作可以参考上面参照图4描述的403的操作。Specifically, the vector representation unit 710 is configured to provide the current feature data subset to the current graph neural network sub-model to obtain the feature vector representation of each node of the current graph neural network sub-model. The operation of the vector representation unit 710 may refer to the operation of 403 described above with reference to FIG. 4.
判别模型获取单元720被配置为从服务端获取当前判别模型。判别模型获取单元720的操作可以参考上面参照图4描述的404的操作。The discriminant model obtaining unit 720 is configured to obtain the current discriminant model from the server. The operation of the discriminant model acquisition unit 720 may refer to the operation of 404 described above with reference to FIG. 4.
模型预测单元730被配置为将各个节点的特征向量表示提供给当前判别模型,以得到各个节点的当前预测标签值。模型预测单元730的操作可以参考上面参照图4描述的405的操作。The model prediction unit 730 is configured to provide the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node. The operation of the model prediction unit 730 may refer to the operation of 405 described above with reference to FIG. 4.
损失函数确定单元740被配置为根据各个节点的当前预测标签值以及对应的真实标签值,确定当前损失函数。损失函数确定单元740的操作可以参考上面参照图4描述的406的操作。The loss function determining unit 740 is configured to determine the current loss function according to the current predicted label value of each node and the corresponding real label value. The operation of the loss function determining unit 740 may refer to the operation of 406 described above with reference to FIG. 4.
梯度信息确定单元750被配置为在不满足循环结束条件时,基于当前损失函数,确定当前判别模型的梯度信息。梯度信息确定单元750的操作可以参考上面参照图4描述的407的操作。The gradient information determining unit 750 is configured to determine the gradient information of the current discriminant model based on the current loss function when the loop end condition is not satisfied. The operation of the gradient information determining unit 750 may refer to the operation of 407 described above with reference to FIG. 4.
模型更新单元760被配置为在不满足循环结束条件时,基于当前损失函数,更新当前图神经网络子模型的模型参数。模型更新单元760的操作可以参考上面参照图4描述的407的操作。The model updating unit 760 is configured to update the model parameters of the neural network sub-model of the current graph based on the current loss function when the loop end condition is not satisfied. The operation of the model update unit 760 may refer to the operation of 407 described above with reference to FIG. 4.
梯度信息提供单元770被配置为将当前判别模型的梯度信息提供给服务端,所述服务端使用来自各个数据拥有方的所述当前判别模型的梯度信息来更新服务端处的判别模型。梯度信息提供单元770的操作可以参考上参照图4描述的408的操作。The gradient information providing unit 770 is configured to provide gradient information of the current discriminant model to the server, and the server uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server. The operation of the gradient information providing unit 770 may refer to the operation of 408 described above with reference to FIG. 4.
在本说明书的一个示例中,梯度信息提供单元770可以采用安全聚合的方式来将当前判别模型的梯度信息提供给服务端。In an example of this specification, the gradient information providing unit 770 can provide the gradient information of the current discriminant model to the server in a safe aggregation manner.
此外,可选地,模型训练装置700还可以包括训练样本子集获取单元(未示出)。在每次循环操作时,训练样本子集获取单元被配置为获取当前训练样本子集。In addition, optionally, the model training device 700 may further include a training sample subset acquisition unit (not shown). In each cycle operation, the training sample subset acquiring unit is configured to acquire the current training sample subset.
Fig. 8 shows a block diagram of an apparatus (hereinafter referred to as a model training apparatus 800) for collaboratively training a graph neural network model via multiple data owners according to an embodiment of this specification. In this embodiment, the graph neural network model includes a discriminant model located at the server and graph neural network sub-models located at the respective data owners, and each data owner has a training sample subset obtained by horizontally splitting the training sample set used for model training, the training sample subset including a feature data subset and true label values. The model training apparatus 800 is located on the server side.
如图8所示,模型训练装置800包括判别模型提供单元810、梯度信息获取单元820和模型更新单元830。As shown in FIG. 8, the model training device 800 includes a discriminant model providing unit 810, a gradient information acquiring unit 820, and a model updating unit 830.
在进行模型训练时,判别模型提供单元810、梯度信息获取单元820和模型更新单元830循环操作,直到满足循环结束条件。所述循环结束条件例如可以包括:达到预定循环次数,所述判别模型的各个模型参数的变化量不大于预定阈值;或者当前总损失函数位于预定范围内。在循环过程未结束时,更新后的各个数据拥有方的图神经网络子模型以及服务端的判别模型用作下一循环过程的当前模型。During model training, the discriminant model providing unit 810, the gradient information acquiring unit 820, and the model updating unit 830 operate in a loop until the loop end condition is satisfied. The loop ending condition may include, for example, that a predetermined number of loops is reached, and the variation of each model parameter of the discriminant model is not greater than a predetermined threshold; or the current total loss function is within a predetermined range. When the cycle process is not over, the updated graph neural network sub-model of each data owner and the discriminant model of the server are used as the current model of the next cycle process.
具体地,判别模型提供单元810被配置为将当前判别模型提供给各个数据拥有方,以供各个数据拥有方使用来预测各个节点的预测标签值。判别模型提供单元810的操作可以参考上面参照图4描述的404的操作。Specifically, the discriminant model providing unit 810 is configured to provide the current discriminant model to each data owner for use by each data owner to predict the predicted label value of each node. The operation of the discriminant model providing unit 810 may refer to the operation of 404 described above with reference to FIG. 4.
梯度信息获取单元820被配置为在未满足循环结束条件时,从各个数据拥有方获取当前判别模型的对应梯度信息。梯度信息获取单元820的操作可以参考上面参照图4描述的408的操作。The gradient information acquiring unit 820 is configured to acquire the corresponding gradient information of the current discriminant model from each data owner when the loop end condition is not met. The operation of the gradient information acquisition unit 820 may refer to the operation of 408 described above with reference to FIG. 4.
判别模型更新单元830被配置基于来自各个数据拥有方的梯度信息来更新当前判别模型。判别模型更新单元830的操作可以参考上面参照图4描述的409的操作。The discriminant model update unit 830 is configured to update the current discriminant model based on gradient information from each data owner. The operation of the discriminant model update unit 830 can refer to the operation of 409 described above with reference to FIG. 4.
图9示出了根据本说明书的实施例的用于基于图神经网络模型来进行模型预测的装置(下文中简称为模型预测装置900)的方框图。模型预测装置900应用于数据拥有方。Fig. 9 shows a block diagram of an apparatus for model prediction based on a graph neural network model (hereinafter referred to as a model prediction apparatus 900) according to an embodiment of the present specification. The model prediction device 900 is applied to the data owner.
如图9所述,模型预测装置900包括向量表示单元910、判别模型获取单元920和模型预测单元930。As shown in FIG. 9, the model prediction device 900 includes a vector representation unit 910, a discriminant model acquisition unit 920, and a model prediction unit 930.
向量表示单元910被配置为将待预测数据提供给数据拥有方处的图神经网络子模型,以得到图神经网络子模型的各个节点的特征向量表示。判别模型获取单元920被配置为从服务端获取判别模型。模型预测单元930被配置为将各个节点的特征向量表示提供给判别模型,以得到各个节点的预测标签值,由此完成模型预测过程。The vector representation unit 910 is configured to provide the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model. The discriminant model obtaining unit 920 is configured to obtain the discriminant model from the server. The model prediction unit 930 is configured to provide the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node, thereby completing the model prediction process.
如上参照图1到图9，对根据本说明书实施例的模型训练和预测方法、装置及系统进行了描述。上面的模型训练装置、模型预测装置可以采用硬件实现，也可以采用软件或者硬件和软件的组合来实现。The model training and prediction methods, devices, and systems according to the embodiments of this specification have been described above with reference to FIGS. 1 to 9. The above model training devices and model prediction devices may be implemented by hardware, by software, or by a combination of hardware and software.
图10示出了根据本说明书实施例的用于实现经由多个数据拥有方训练图神经网络模型的电子设备1000的硬件结构图。如图10所示，电子设备1000可以包括至少一个处理器1010、存储器（例如，非易失性存储器）1020、内存1030和通信接口1040，并且至少一个处理器1010、存储器1020、内存1030和通信接口1040经由总线1060连接在一起。至少一个处理器1010执行在存储器中存储或编码的至少一个计算机可读指令（即，上述以软件形式实现的元素）。FIG. 10 shows a hardware structure diagram of an electronic device 1000 for training a graph neural network model via multiple data owners according to an embodiment of this specification. As shown in FIG. 10, the electronic device 1000 may include at least one processor 1010, a memory (for example, a non-volatile memory) 1020, an internal memory 1030, and a communication interface 1040, and the at least one processor 1010, the memory 1020, the internal memory 1030, and the communication interface 1040 are connected together via a bus 1060. The at least one processor 1010 executes at least one computer-readable instruction (that is, the above-mentioned elements implemented in the form of software) stored or encoded in the memory.
在一个实施例中，在存储器中存储计算机可执行指令，其当执行时使得至少一个处理器1010：执行下述循环过程，直到满足循环结束条件：将当前特征数据子集提供给数据拥有方处的当前图神经网络子模型，以得到当前图神经网络子模型的各个节点的特征向量表示；从服务端获取当前判别模型；将各个节点的特征向量表示提供给当前判别模型，以得到各个节点的当前预测标签值；根据各个节点的当前预测标签值以及对应的真实标签值，确定当前损失函数；在不满足循环结束条件时，基于当前损失函数，通过反向传播来确定当前判别模型的梯度信息和更新当前图神经网络子模型的模型参数；以及将当前判别模型的梯度信息提供给服务端，所述服务端使用来自于各个数据拥有方的当前判别模型的梯度信息来更新服务端处的判别模型，其中，在未满足循环结束条件时，更新后的各个数据拥有方的图神经网络子模型和服务端处的判别模型用作下一循环过程的当前模型。In one embodiment, computer-executable instructions are stored in the memory, and when executed, the instructions cause the at least one processor 1010 to: execute the following loop process until a loop end condition is satisfied: provide the current feature data subset to the current graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the current graph neural network sub-model; obtain the current discriminant model from the server; provide the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node; determine the current loss function according to the current predicted label value of each node and the corresponding true label value; when the loop end condition is not satisfied, determine the gradient information of the current discriminant model and update the model parameters of the current graph neural network sub-model through back propagation based on the current loss function; and provide the gradient information of the current discriminant model to the server, where the server uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, and when the loop end condition is not satisfied, the updated graph neural network sub-models of the respective data owners and the updated discriminant model at the server are used as the current models of the next loop process.
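A rough, self-contained sketch of the loop body these instructions describe is given below; it assumes, purely for illustration, a single tanh GCN-style layer as the graph neural network sub-model, a linear discriminant model, and a mean-squared-error loss, so that the back-propagation steps can be written out explicitly.

```python
import numpy as np

def local_round(A_hat, X, y, W, w, lr=0.1):
    """One data owner's pass through the loop body described above.

    A_hat: normalized adjacency matrix of the local graph; X: local feature data
    subset; y: true label values; W: local sub-model parameters (a NumPy array,
    updated in place); w: current discriminant model received from the server.
    Returns the gradient of the discriminant model and the current loss.
    """
    H = np.tanh(A_hat @ X @ W)              # feature vector representation of each node
    y_hat = H @ w                           # current predicted label value of each node
    err = y_hat - y
    loss = 0.5 * np.mean(err ** 2)          # current loss function
    grad_w = H.T @ err / len(y)             # gradient of the discriminant model (for the server)
    dH = np.outer(err, w) / len(y)          # back propagation into the local sub-model
    grad_W = (A_hat @ X).T @ (dH * (1.0 - H ** 2))
    W -= lr * grad_W                        # update local sub-model parameters in place
    return grad_w, loss
```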
应该理解,在存储器中存储的计算机可执行指令当执行时使得至少一个处理器1010进行本说明书的各个实施例中以上结合图1-9描述的各种操作和功能。It should be understood that the computer-executable instructions stored in the memory, when executed, cause at least one processor 1010 to perform the various operations and functions described above in conjunction with FIGS. 1-9 in the various embodiments of this specification.
图11示出了根据本说明书实施例的用于实现经由多个数据拥有方来训练图神经网络模型的电子设备1100的硬件结构图。如图11所示，电子设备1100可以包括至少一个处理器1110、存储器（例如，非易失性存储器）1120、内存1130和通信接口1140，并且至少一个处理器1110、存储器1120、内存1130和通信接口1140经由总线1160连接在一起。至少一个处理器1110执行在存储器中存储或编码的至少一个计算机可读指令（即，上述以软件形式实现的元素）。FIG. 11 shows a hardware structure diagram of an electronic device 1100 for training a graph neural network model via multiple data owners according to an embodiment of this specification. As shown in FIG. 11, the electronic device 1100 may include at least one processor 1110, a memory (for example, a non-volatile memory) 1120, an internal memory 1130, and a communication interface 1140, and the at least one processor 1110, the memory 1120, the internal memory 1130, and the communication interface 1140 are connected together via a bus 1160. The at least one processor 1110 executes at least one computer-readable instruction (that is, the above-mentioned elements implemented in the form of software) stored or encoded in the memory.
在一个实施例中，在存储器中存储计算机可执行指令，其当执行时使得至少一个处理器1110：执行下述循环过程，直到满足循环结束条件：将当前判别模型提供给各个数据拥有方，各个数据拥有方将当前图神经网络子模型的各个节点的特征向量表示提供给当前判别模型以得到各个节点的预测标签值，基于各个节点的预测标签值以及对应的真实标签值确定各自的当前损失函数，以及在不满足循环结束条件时，基于各自的当前损失函数，通过反向传播确定判别模型的梯度信息以及更新当前图神经网络子模型的模型参数，并且将所确定的梯度信息提供给服务端，各个节点的特征向量表示通过将当前特征数据子集提供给当前图神经网络子模型而得到；在未满足循环结束条件时，从各个数据拥有方获取当前判别模型的对应梯度信息，并且基于来自各个数据拥有方的梯度信息更新当前判别模型，其中，在未满足所述循环结束条件时，更新后的各个数据拥有方的图神经网络子模型和服务端的判别模型用作下一循环过程的当前模型。In one embodiment, computer-executable instructions are stored in the memory, and when executed, the instructions cause the at least one processor 1110 to: execute the following loop process until a loop end condition is satisfied: provide the current discriminant model to each data owner, where each data owner provides the feature vector representation of each node of the current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its own current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not satisfied, determines the gradient information of the discriminant model and updates the model parameters of the current graph neural network sub-model through back propagation based on its own current loss function and provides the determined gradient information to the server, and the feature vector representation of each node is obtained by providing the current feature data subset to the current graph neural network sub-model; when the loop end condition is not satisfied, acquire the corresponding gradient information of the current discriminant model from each data owner, and update the current discriminant model based on the gradient information from each data owner, where, when the loop end condition is not satisfied, the updated graph neural network sub-models of the respective data owners and the updated discriminant model at the server are used as the current models of the next loop process.
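Combining the sketches above, the overall loop driven by these server-side instructions might look as follows; in this toy version the owners' data sits in one process and only the discriminant-model gradients are passed to the server, whereas a real deployment would keep each owner's data local and could additionally protect the gradients with secure aggregation.

```python
def federated_training(server, owners, rounds=50):
    """Drive the loop from the server side, reusing ServerTrainer and local_round above.

    `owners` is assumed to be a list of (A_hat, X, y, W) tuples, one per data owner,
    where W is that owner's locally initialized sub-model parameter array.
    """
    for _ in range(rounds):
        w = server.provide_discriminant_model()                       # hand out current model
        gradients = [local_round(A_hat, X, y, W, w)[0]                # one gradient per owner
                     for (A_hat, X, y, W) in owners]
        server.update_discriminant_model(gradients)                   # aggregate and update
    return server
```

In this sketch the stopping rule is a fixed round count; the `should_stop` check shown earlier could be substituted for it.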
应该理解,在存储器中存储的计算机可执行指令当执行时使得至少一个处理器1110进行本说明书的各个实施例中以上结合图1-9描述的各种操作和功能。It should be understood that the computer-executable instructions stored in the memory, when executed, cause at least one processor 1110 to perform various operations and functions described above in conjunction with FIGS. 1-9 in the various embodiments of this specification.
图12示出了根据本说明书实施例的用于基于图神经网络模型进行模型预测的电子设备1200的硬件结构图。如图12所示，电子设备1200可以包括至少一个处理器1210、存储器（例如，非易失性存储器）1220、内存1230和通信接口1240，并且至少一个处理器1210、存储器1220、内存1230和通信接口1240经由总线1260连接在一起。至少一个处理器1210执行在存储器中存储或编码的至少一个计算机可读指令（即，上述以软件形式实现的元素）。FIG. 12 shows a hardware structure diagram of an electronic device 1200 for performing model prediction based on a graph neural network model according to an embodiment of this specification. As shown in FIG. 12, the electronic device 1200 may include at least one processor 1210, a memory (for example, a non-volatile memory) 1220, an internal memory 1230, and a communication interface 1240, and the at least one processor 1210, the memory 1220, the internal memory 1230, and the communication interface 1240 are connected together via a bus 1260. The at least one processor 1210 executes at least one computer-readable instruction (that is, the above-mentioned elements implemented in the form of software) stored or encoded in the memory.
在一个实施例中，在存储器中存储计算机可执行指令，其当执行时使得至少一个处理器1210：将待预测数据提供给数据拥有方处的图神经网络子模型，以得到所述图神经网络子模型的各个节点的特征向量表示；从服务端获取判别模型；以及将各个节点的特征向量表示提供给判别模型，以得到各个节点的预测标签值。In one embodiment, computer-executable instructions are stored in the memory, and when executed, the instructions cause the at least one processor 1210 to: provide the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model; obtain the discriminant model from the server; and provide the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node.
应该理解,在存储器中存储的计算机可执行指令当执行时使得至少一个处理器1210进行本说明书的各个实施例中以上结合图1-9描述的各种操作和功能。It should be understood that the computer-executable instructions stored in the memory, when executed, cause at least one processor 1210 to perform the various operations and functions described above in conjunction with FIGS. 1-9 in the various embodiments of this specification.
根据一个实施例，提供了一种比如机器可读介质（例如，非暂时性机器可读介质）的程序产品。机器可读介质可以具有指令（即，上述以软件形式实现的元素），该指令当被机器执行时，使得机器执行本说明书的各个实施例中以上结合图1-9描述的各种操作和功能。具体地，可以提供配有可读存储介质的系统或者装置，在该可读存储介质上存储着实现上述实施例中任一实施例的功能的软件程序代码，且使该系统或者装置的计算机或处理器读出并执行存储在该可读存储介质中的指令。According to one embodiment, a program product such as a machine-readable medium (for example, a non-transitory machine-readable medium) is provided. The machine-readable medium may have instructions (that is, the above-mentioned elements implemented in the form of software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in conjunction with FIGS. 1-9 in the various embodiments of this specification. Specifically, a system or device equipped with a readable storage medium may be provided, where software program code implementing the functions of any one of the above embodiments is stored on the readable storage medium, and a computer or a processor of the system or device reads out and executes the instructions stored in the readable storage medium.
在这种情况下，从可读介质读取的程序代码本身可实现上述实施例中任何一项实施例的功能，因此机器可读代码和存储机器可读代码的可读存储介质构成了本发明的一部分。In this case, the program code itself read from the readable medium can realize the functions of any one of the above embodiments, and therefore the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the present invention.
可读存储介质的实施例包括软盘、硬盘、磁光盘、光盘（如CD-ROM、CD-R、CD-RW、DVD-ROM、DVD-RAM、DVD-RW、DVD-RW）、磁带、非易失性存储卡和ROM。可选择地，可以由通信网络从服务器计算机上或云上下载程序代码。Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD-RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code can be downloaded from a server computer or the cloud via a communication network.
本领域技术人员应当理解，上面公开的各个实施例可以在不偏离发明实质的情况下做出各种变形和修改。因此，本发明的保护范围应当由所附的权利要求书来限定。Those skilled in the art should understand that various changes and modifications can be made to the embodiments disclosed above without departing from the essence of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims.
需要说明的是，上述各流程和各系统结构图中不是所有的步骤和单元都是必须的，可以根据实际的需要忽略某些步骤或单元。各步骤的执行顺序不是固定的，可以根据需要进行确定。上述各实施例中描述的装置结构可以是物理结构，也可以是逻辑结构，即，有些单元可能由同一物理实体实现，或者，有些单元可能分别由多个物理实体实现，或者，可以由多个独立设备中的某些部件共同实现。It should be noted that not all steps and units in the above processes and system structure diagrams are necessary, and some steps or units can be omitted according to actual needs. The order in which the steps are executed is not fixed and can be determined as needed. The device structures described in the foregoing embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented separately by multiple physical entities, or some units may be implemented jointly by certain components in multiple independent devices.
以上各实施例中，硬件单元或模块可以通过机械方式或电气方式实现。例如，一个硬件单元、模块或处理器可以包括永久性专用的电路或逻辑（如专门的处理器，FPGA或ASIC）来完成相应操作。硬件单元或处理器还可以包括可编程逻辑或电路（如通用处理器或其它可编程处理器），可以由软件进行临时的设置以完成相应操作。具体的实现方式（机械方式、或专用的永久性电路、或者临时设置的电路）可以基于成本和时间上的考虑来确定。In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module, or processor may include permanently dedicated circuits or logic (such as a dedicated processor, an FPGA, or an ASIC) to complete the corresponding operations. A hardware unit or processor may also include programmable logic or circuits (such as a general-purpose processor or another programmable processor), which may be temporarily configured by software to complete the corresponding operations. The specific implementation (mechanical, dedicated permanent circuits, or temporarily configured circuits) can be determined based on cost and time considerations.
上面结合附图阐述的具体实施方式描述了示例性实施例，但并不表示可以实现的或者落入权利要求书的保护范围的所有实施例。在整个本说明书中使用的术语“示例性”意味着“用作示例、实例或例示”，并不意味着比其它实施例“优选”或“具有优势”。出于提供对所描述技术的理解的目的，具体实施方式包括具体细节。然而，可以在没有这些具体细节的情况下实施这些技术。在一些实例中，为了避免对所描述的实施例的概念造成难以理解，公知的结构和装置以框图形式示出。The specific implementations set forth above in conjunction with the drawings describe exemplary embodiments, but do not represent all embodiments that can be implemented or that fall within the protection scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration", and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, these techniques can be implemented without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
本公开内容的上述描述被提供来使得本领域任何普通技术人员能够实现或者使用本公开内容。对于本领域普通技术人员来说，对本公开内容进行的各种修改是显而易见的，并且，也可以在不脱离本公开内容的保护范围的情况下，将本文所定义的一般性原理应用于其它变型。因此，本公开内容并不限于本文所描述的示例和设计，而是与符合本文公开的原理和新颖性特征的最广范围相一致。The foregoing description of the present disclosure is provided to enable any person of ordinary skill in the art to implement or use the present disclosure. Various modifications to the present disclosure will be obvious to those of ordinary skill in the art, and the general principles defined herein may also be applied to other variations without departing from the protection scope of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (23)

  1. 一种用于经由多个数据拥有方来训练图神经网络模型的方法，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，每个数据拥有方具有通过对用于模型训练的训练样本集进行水平切分而获得的训练样本子集，所述训练样本子集包括特征数据子集以及真实标签值，所述方法由数据拥有方执行，所述方法包括：A method for training a graph neural network model via multiple data owners, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, the training sample subset includes a feature data subset and true label values, and the method is executed by a data owner, the method comprising:
    执行下述循环过程,直到满足循环结束条件:Perform the following loop process until the loop end condition is met:
    将当前特征数据子集提供给所述数据拥有方处的当前图神经网络子模型,以得到所述当前图神经网络子模型的各个节点的特征向量表示;Providing the current feature data subset to the current graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the current graph neural network sub-model;
    从服务端获取当前判别模型;Obtain the current discriminant model from the server;
    将各个节点的特征向量表示提供给所述当前判别模型,以得到各个节点的当前预测标签值;Providing the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node;
    根据各个节点的当前预测标签值以及对应的真实标签值,确定当前损失函数;Determine the current loss function according to the current predicted label value of each node and the corresponding real label value;
    在不满足循环结束条件时,When the loop end condition is not met,
    基于当前损失函数,确定所述当前判别模型的梯度信息并且更新当前图神经网络子模型的模型参数;以及Based on the current loss function, determine the gradient information of the current discriminant model and update the model parameters of the neural network sub-model of the current graph; and
    将所述当前判别模型的梯度信息提供给所述服务端，所述服务端使用来自于各个数据拥有方的所述当前判别模型的梯度信息来更新所述服务端处的判别模型，其中，在未满足所述循环结束条件时，所述更新后的各个数据拥有方的图神经网络子模型和所述服务端处的判别模型用作下一循环过程的当前模型。provide the gradient information of the current discriminant model to the server, where the server uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server, and wherein, when the loop end condition is not met, the updated graph neural network sub-models of the respective data owners and the discriminant model at the server are used as the current models of the next loop process.
  2. 如权利要求1所述的方法,其中,各个数据拥有方处得到的梯度信息通过安全聚合的方式提供给所述服务端。The method of claim 1, wherein the gradient information obtained from each data owner is provided to the server in a secure aggregation manner.
  3. 如权利要求2所述的方法,其中,所述安全聚合包括:The method of claim 2, wherein the secure aggregation includes:
    基于秘密共享的安全聚合;Secure aggregation based on secret sharing;
    基于同态加密的安全聚合;或者Secure aggregation based on homomorphic encryption; or
    基于可信执行环境的安全聚合。Secure aggregation based on trusted execution environment.
  4. 如权利要求1所述的方法,其中,在每次循环过程中,所述方法还包括:The method of claim 1, wherein, during each cycle, the method further comprises:
    获取当前训练样本子集。Get the current training sample subset.
  5. 如权利要求1到4中任一所述的方法,其中,所述循环结束条件包括:The method according to any one of claims 1 to 4, wherein the loop end condition comprises:
    预定循环次数;Predetermined number of cycles;
    所述判别模型的各个模型参数的变化量不大于预定阈值;或者The variation of each model parameter of the discriminant model is not greater than a predetermined threshold; or
    当前总损失函数位于预定范围内。The current total loss function is within a predetermined range.
  6. 如权利要求1到4中任一所述的方法，其中，所述特征数据包括基于图像数据、语音数据或文本数据的特征数据，或者所述特征数据包括用户特征数据。The method according to any one of claims 1 to 4, wherein the feature data includes feature data based on image data, voice data, or text data, or the feature data includes user feature data.
  7. 一种用于经由多个数据拥有方来训练图神经网络模型的方法，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，每个数据拥有方具有通过对用于模型训练的训练样本集进行水平切分而获得的训练样本子集，所述训练样本子集包括特征数据子集以及真实标签值，所述方法由服务端执行，所述方法包括：A method for training a graph neural network model via multiple data owners, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, the training sample subset includes a feature data subset and true label values, and the method is executed by the server, the method comprising:
    执行下述循环过程,直到满足循环结束条件:Perform the following loop process until the loop end condition is met:
    将当前判别模型提供给各个数据拥有方，各个数据拥有方将当前子图神经网络模型的各个节点的特征向量表示提供给所述当前判别模型以得到各个节点的预测标签值，基于各个节点的预测标签值以及对应的真实标签值确定各自的当前损失函数，以及在不满足循环结束条件时，各个数据拥有方基于各自的当前损失函数，确定判别模型的梯度信息以及更新当前图神经网络子模型的模型参数，并且将所确定的梯度信息提供给所述服务端，所述各个节点的特征向量表示通过将当前特征数据子集提供给所述当前图神经网络子模型而得到；providing the current discriminant model to each data owner, wherein each data owner provides the feature vector representation of each node of the current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its own current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not met, determines the gradient information of the discriminant model and updates the model parameters of the current graph neural network sub-model based on its own current loss function and provides the determined gradient information to the server, and wherein the feature vector representation of each node is obtained by providing the current feature data subset to the current graph neural network sub-model;
    在未满足所述循环结束条件时,从各个数据拥有方获取所述当前判别模型的对应梯度信息,并且基于来自各个数据拥有方的梯度信息更新所述当前判别模型,When the loop end condition is not met, obtain the corresponding gradient information of the current discriminant model from each data owner, and update the current discriminant model based on the gradient information from each data owner,
    其中,在未满足所述循环结束条件时,所述更新后的各个数据拥有方的图神经网络子模型和所述服务端的判别模型用作下一循环过程的当前模型。Wherein, when the loop ending condition is not met, the updated graph neural network sub-model of each data owner and the discriminant model of the server are used as the current model of the next loop process.
  8. 如权利要求7所述的方法,其中,各个数据拥有方处得到的梯度信息通过安全聚合的方式提供给所述服务端。The method according to claim 7, wherein the gradient information obtained from each data owner is provided to the server in a secure aggregation manner.
  9. 如权利要求8所述的方法,其中,所述安全聚合包括:The method of claim 8, wherein the secure aggregation comprises:
    基于秘密共享的安全聚合;Secure aggregation based on secret sharing;
    基于同态加密的安全聚合;或者Secure aggregation based on homomorphic encryption; or
    基于可信执行环境的安全聚合。Secure aggregation based on trusted execution environment.
  10. 一种用于使用图神经网络模型来进行模型预测的方法，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，所述方法由数据拥有方执行，所述方法包括：A method for performing model prediction using a graph neural network model, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, and the method is executed by a data owner, the method comprising:
    将待预测特征数据提供给所述数据拥有方处的图神经网络子模型,以得到所述图神经网络子模型的各个节点的特征向量表示;Providing the feature data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model;
    从服务端获取判别模型;以及Obtain the discriminant model from the server; and
    将各个节点的特征向量表示提供给所述判别模型,以得到各个节点的预测标签值。The feature vector representation of each node is provided to the discriminant model to obtain the predicted label value of each node.
  11. 一种用于经由多个数据拥有方来训练图神经网络模型的装置，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，每个数据拥有方具有通过对用于模型训练的训练样本集进行水平切分而获得的训练样本子集，所述训练样本子集包括特征数据子集以及真实标签值，所述装置应用于数据拥有方，所述装置包括：A device for training a graph neural network model via multiple data owners, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, the training sample subset includes a feature data subset and true label values, and the device is applied to a data owner, the device comprising:
    向量表示单元,将当前特征数据子集提供给当前图神经网络子模型,以得到所述当前图神经网络子模型的各个节点的特征向量表示;The vector representation unit provides the current feature data subset to the current graph neural network sub-model to obtain the feature vector representation of each node of the current graph neural network sub-model;
    判别模型获取单元,从服务端获取当前判别模型;The discriminant model acquisition unit, which obtains the current discriminant model from the server;
    模型预测单元,将各个节点的特征向量表示提供给所述当前判别模型,以得到各个节点的当前预测标签值;The model prediction unit provides the feature vector representation of each node to the current discriminant model to obtain the current predicted label value of each node;
    损失函数确定单元,根据各个节点的当前预测标签值以及对应的真实标签值,确定当前损失函数;The loss function determining unit determines the current loss function according to the current predicted label value of each node and the corresponding real label value;
    梯度信息确定单元,在不满足循环结束条件时,基于当前损失函数,确定所述当前判别模型的梯度信息;A gradient information determining unit, when the loop end condition is not met, determine the gradient information of the current discriminant model based on the current loss function;
    模型更新单元,在不满足循环结束条件时,基于当前损失函数,更新当前图神经网络子模型的模型参数;以及The model update unit updates the model parameters of the neural network sub-model of the current graph based on the current loss function when the loop end condition is not met; and
    梯度信息提供单元，将所述当前判别模型的梯度信息提供给所述服务端，所述服务端使用来自各个数据拥有方的所述当前判别模型的梯度信息来更新所述服务端处的判别模型，The gradient information providing unit provides the gradient information of the current discriminant model to the server, and the server uses the gradient information of the current discriminant model from each data owner to update the discriminant model at the server,
    其中，所述向量表示单元、所述判别模型获取单元、所述模型预测单元、所述损失函数确定单元、所述梯度信息确定单元、所述模型更新单元和所述梯度信息提供单元循环操作，直到满足所述循环结束条件，在未满足所述循环结束条件时，所述更新后的各个数据拥有方的图神经网络子模型和所述服务端的判别模型用作下一循环过程的当前模型。wherein the vector representation unit, the discriminant model acquisition unit, the model prediction unit, the loss function determination unit, the gradient information determination unit, the model update unit, and the gradient information providing unit operate in a loop until the loop end condition is met, and when the loop end condition is not met, the updated graph neural network sub-models of the respective data owners and the discriminant model of the server are used as the current models of the next loop process.
  12. 如权利要求11所述的装置,其中,所述梯度信息提供单元使用安全聚合的方式来将所述数据拥有方处得到的梯度信息提供给所述服务端。The apparatus of claim 11, wherein the gradient information providing unit uses a secure aggregation method to provide the gradient information obtained from the data owner to the server.
  13. 如权利要求12所述的装置,其中,所述安全聚合包括:The apparatus of claim 12, wherein the secure aggregation comprises:
    基于秘密共享的安全聚合;Secure aggregation based on secret sharing;
    基于同态加密的安全聚合;或者Secure aggregation based on homomorphic encryption; or
    基于可信执行环境的安全聚合。Secure aggregation based on trusted execution environment.
  14. 如权利要求11所述的装置,还包括:The apparatus of claim 11, further comprising:
    训练样本子集获取单元,在每次循环操作时,获取当前训练样本子集。The training sample subset acquisition unit acquires the current training sample subset during each cycle operation.
  15. 一种用于经由多个数据拥有方来训练图神经网络模型的装置，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，每个数据拥有方具有通过对用于模型训练的训练样本集进行水平切分而获得的训练样本子集，所述训练样本子集包括特征数据子集以及真实标签值，所述装置应用于服务端，所述装置包括：A device for training a graph neural network model via multiple data owners, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, the training sample subset includes a feature data subset and true label values, and the device is applied to the server, the device comprising:
    判别模型提供单元，将当前判别模型提供给各个数据拥有方，各个数据拥有方将当前图神经网络子模型的各个节点的特征向量表示提供给所述当前判别模型来得到各个节点的预测标签值，基于各个节点的预测标签值以及对应的真实标签值来确定出各自的当前损失函数，以及在不满足循环结束条件时，各个数据拥有方基于各自的当前损失函数，确定判别模型的梯度信息以及更新当前图神经网络子模型的模型参数，并且将所确定的梯度信息提供给所述服务端，所述各个节点的特征向量表示通过将当前特征数据子集提供给所述当前图神经网络子模型而得到；The discriminant model providing unit provides the current discriminant model to each data owner, wherein each data owner provides the feature vector representation of each node of the current graph neural network sub-model to the current discriminant model to obtain the predicted label value of each node, determines its own current loss function based on the predicted label value of each node and the corresponding true label value, and, when the loop end condition is not met, determines the gradient information of the discriminant model and updates the model parameters of the current graph neural network sub-model based on its own current loss function and provides the determined gradient information to the server, and wherein the feature vector representation of each node is obtained by providing the current feature data subset to the current graph neural network sub-model;
    梯度信息获取单元,在未满足循环结束条件时,从各个数据拥有方获取所述当前判别模型的对应梯度信息;以及The gradient information acquiring unit, when the loop end condition is not met, acquires the corresponding gradient information of the current discriminant model from each data owner; and
    判别模型更新单元,基于来自各个数据拥有方的梯度信息来更新所述当前判别模型,The discriminant model update unit updates the current discriminant model based on gradient information from each data owner,
    其中，所述判别模型提供单元、所述梯度信息获取单元和所述判别模型更新单元循环操作，直到满足所述循环结束条件，在未满足所述循环结束条件时，所述更新后的各个数据拥有方的图神经网络子模型和所述服务端的判别模型用作下一循环过程的当前模型。wherein the discriminant model providing unit, the gradient information acquiring unit, and the discriminant model updating unit operate in a loop until the loop end condition is met, and when the loop end condition is not met, the updated graph neural network sub-models of the respective data owners and the discriminant model of the server are used as the current models of the next loop process.
  16. 一种用于经由多个数据拥有方来训练图神经网络模型的系统,包括:A system for training graph neural network models through multiple data owners, including:
    多个数据拥有方设备,每个数据拥有方设备包括如权利要求11到14中任一所述的装置;以及A plurality of data owner devices, each data owner device comprising the device according to any one of claims 11 to 14; and
    服务端设备,包括如权利要求15所述的装置,Server equipment, including the device as claimed in claim 15,
    其中，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，每个数据拥有方具有通过对用于模型训练的训练样本集进行水平切分而获得的训练样本子集，所述训练样本子集包括特征数据子集以及真实标签值。wherein the graph neural network model includes a discriminant model located at the server and graph neural network sub-models located at respective data owners, and each data owner has a training sample subset obtained by horizontally splitting a training sample set used for model training, the training sample subset including a feature data subset and true label values.
  17. 一种用于使用图神经网络模型来进行模型预测的装置，所述图神经网络模型包括位于服务端的判别模型以及位于各个数据拥有方处的图神经网络子模型，所述装置应用于数据拥有方，所述装置包括：A device for performing model prediction using a graph neural network model, wherein the graph neural network model includes a discriminant model located at a server and graph neural network sub-models located at respective data owners, and the device is applied to a data owner, the device comprising:
    向量表示单元,将待预测数据提供给所述数据拥有方处的图神经网络子模型,以得到所述图神经网络子模型的各个节点的特征向量表示;The vector representation unit provides the data to be predicted to the graph neural network sub-model at the data owner to obtain the feature vector representation of each node of the graph neural network sub-model;
    判别模型获取单元,从服务端获取判别模型;以及The discriminant model acquisition unit obtains the discriminant model from the server; and
    模型预测单元,将各个节点的特征向量表示提供给所述判别模型,以得到各个节点的预测标签值。The model prediction unit provides the feature vector representation of each node to the discriminant model to obtain the predicted label value of each node.
  18. 一种电子设备,包括:An electronic device including:
    至少一个处理器,以及At least one processor, and
    与所述至少一个处理器耦合的存储器，所述存储器存储指令，当所述指令被所述至少一个处理器执行时，使得所述至少一个处理器执行如权利要求1到6中任一所述的方法。a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of claims 1 to 6.
  19. 一种机器可读存储介质,其存储有可执行指令,所述指令当被执行时使得所述机器执行如权利要求1到6中任一所述的方法。A machine-readable storage medium storing executable instructions, which when executed, cause the machine to execute the method according to any one of claims 1 to 6.
  20. 一种电子设备,包括:An electronic device including:
    至少一个处理器,以及At least one processor, and
    与所述至少一个处理器耦合的存储器，所述存储器存储指令，当所述指令被所述至少一个处理器执行时，使得所述至少一个处理器执行如权利要求7到9中任一所述的方法。a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of claims 7 to 9.
  21. 一种机器可读存储介质,其存储有可执行指令,所述指令当被执行时使得所述机器执行如权利要求7到9中任一所述的方法。A machine-readable storage medium storing executable instructions, which when executed, cause the machine to execute the method according to any one of claims 7 to 9.
  22. 一种电子设备,包括:An electronic device including:
    至少一个处理器,以及At least one processor, and
    与所述至少一个处理器耦合的存储器，所述存储器存储指令，当所述指令被所述至少一个处理器执行时，使得所述至少一个处理器执行如权利要求10所述的方法。a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method according to claim 10.
  23. 一种机器可读存储介质,其存储有可执行指令,所述指令当被执行时使得所述机器执行如权利要求10所述的方法。A machine-readable storage medium storing executable instructions, which when executed, cause the machine to execute the method according to claim 10.
PCT/CN2020/132667 2020-02-17 2020-11-30 Graph neural network model training method, apparatus and system WO2021164365A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010096248.8 2020-02-17
CN202010096248.8A CN110929870B (en) 2020-02-17 2020-02-17 Method, device and system for training neural network model

Publications (1)

Publication Number Publication Date
WO2021164365A1 true WO2021164365A1 (en) 2021-08-26

Family

ID=69854815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132667 WO2021164365A1 (en) 2020-02-17 2020-11-30 Graph neural network model training method, apparatus and system

Country Status (2)

Country Link
CN (1) CN110929870B (en)
WO (1) WO2021164365A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929870B (en) * 2020-02-17 2020-06-12 支付宝(杭州)信息技术有限公司 Method, device and system for training neural network model
CN111581648B (en) * 2020-04-06 2022-06-03 电子科技大学 Method of federal learning to preserve privacy in irregular users
CN111612126A (en) * 2020-04-18 2020-09-01 华为技术有限公司 Method and device for reinforcement learning
CN111665861A (en) * 2020-05-19 2020-09-15 中国农业大学 Trajectory tracking control method, apparatus, device and storage medium
CN111553470B (en) * 2020-07-10 2020-10-27 成都数联铭品科技有限公司 Information interaction system and method suitable for federal learning
CN111738438B (en) * 2020-07-17 2021-04-30 支付宝(杭州)信息技术有限公司 Method, device and system for training neural network model
CN111737474B (en) * 2020-07-17 2021-01-12 支付宝(杭州)信息技术有限公司 Method and device for training business model and determining text classification category
CN111783143B (en) * 2020-07-24 2023-05-09 支付宝(杭州)信息技术有限公司 Method, device and system for determining service model use of user data
CN112052942B (en) * 2020-09-18 2022-04-12 支付宝(杭州)信息技术有限公司 Neural network model training method, device and system
CN112131303A (en) * 2020-09-18 2020-12-25 天津大学 Large-scale data lineage method based on neural network model
CN112364819A (en) * 2020-11-27 2021-02-12 支付宝(杭州)信息技术有限公司 Method and device for joint training and recognition of model
CN112766500B (en) * 2021-02-07 2022-05-17 支付宝(杭州)信息技术有限公司 Method and device for training graph neural network
CN113052333A (en) * 2021-04-02 2021-06-29 中国科学院计算技术研究所 Method and system for data analysis based on federal learning
CN113254996B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113221153B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113222143B (en) * 2021-05-31 2023-08-01 平安科技(深圳)有限公司 Method, system, computer equipment and storage medium for training graphic neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684855A (en) * 2018-12-17 2019-04-26 电子科技大学 A kind of combined depth learning training method based on secret protection technology
US20190286972A1 (en) * 2018-03-14 2019-09-19 Microsoft Technology Licensing, Llc Hardware accelerated neural network subgraphs
CN110782044A (en) * 2019-10-29 2020-02-11 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of neural network of graph
CN110929870A (en) * 2020-02-17 2020-03-27 支付宝(杭州)信息技术有限公司 Method, device and system for training neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4202782A1 (en) * 2015-11-09 2023-06-28 Google LLC Training neural networks represented as computational graphs
CN110751275B (en) * 2019-08-03 2022-09-02 北京达佳互联信息技术有限公司 Graph training system, data access method and device, electronic device and storage medium
CN110751269B (en) * 2019-10-18 2022-08-05 网易(杭州)网络有限公司 Graph neural network training method, client device and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849665A (en) * 2021-09-02 2021-12-28 中科创达软件股份有限公司 Multimedia data identification method, device, equipment and storage medium
CN113571133A (en) * 2021-09-14 2021-10-29 内蒙古农业大学 Lactic acid bacteria antibacterial peptide prediction method based on graph neural network
CN113571133B (en) * 2021-09-14 2022-06-17 内蒙古农业大学 Lactic acid bacteria antibacterial peptide prediction method based on graph neural network
CN113771289A (en) * 2021-09-16 2021-12-10 健大电业制品(昆山)有限公司 Method and system for optimizing injection molding process parameters
CN113771289B (en) * 2021-09-16 2022-06-24 健大电业制品(昆山)有限公司 Method and system for optimizing injection molding process parameters
CN114117926A (en) * 2021-12-01 2022-03-01 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federal learning
CN114117926B (en) * 2021-12-01 2024-05-14 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federal learning
CN114819139A (en) * 2022-03-28 2022-07-29 支付宝(杭州)信息技术有限公司 Pre-training method and device for graph neural network
CN114819182A (en) * 2022-04-15 2022-07-29 支付宝(杭州)信息技术有限公司 Method, apparatus and system for training a model via multiple data owners

Also Published As

Publication number Publication date
CN110929870B (en) 2020-06-12
CN110929870A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
WO2021164365A1 (en) Graph neural network model training method, apparatus and system
WO2021103901A1 (en) Multi-party security calculation-based neural network model training and prediction methods and device
WO2020156004A1 (en) Model training method, apparatus and system
CN111061963B (en) Machine learning model training and predicting method and device based on multi-party safety calculation
CN110782044A (en) Method and device for multi-party joint training of neural network of graph
US11715044B2 (en) Methods and systems for horizontal federated learning using non-IID data
US11341411B2 (en) Method, apparatus, and system for training neural network model
CN111738438B (en) Method, device and system for training neural network model
CN111523556B (en) Model training method, device and system
CN111368983A (en) Business model training method and device and business model training system
CN111523134B (en) Homomorphic encryption-based model training method, device and system
CN110929887B (en) Logistic regression model training method, device and system
CN111523674B (en) Model training method, device and system
CN111737756B (en) XGB model prediction method, device and system performed through two data owners
Lei et al. Federated learning over coupled graphs
CN110175283B (en) Recommendation model generation method and device
CN111523675B (en) Model training method, device and system
CN112183759B (en) Model training method, device and system
CN111738453B (en) Business model training method, device and system based on sample weighting
Xu et al. FedG2L: a privacy-preserving federated learning scheme base on “G2L” against poisoning attack
CN112183566B (en) Model training method, device and system
CN112183564B (en) Model training method, device and system
US20230084507A1 (en) Servers, methods and systems for fair and secure vertical federated learning
Razeghi et al. Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
Azogagh et al. Crypto'Graph: Leveraging Privacy-Preserving Distributed Link Prediction for Robust Graph Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20920246

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20920246

Country of ref document: EP

Kind code of ref document: A1