CN112149808A

CN112149808A - Method, system and medium for expanding stand-alone graph neural network training to distributed training

Info

Publication number: CN112149808A
Application number: CN202011043369.2A
Authority: CN
Inventors: 陈榕; 杨健邦; 陈海波; 臧斌宇
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2020-12-29
Anticipated expiration: 2040-09-28
Also published as: CN112149808B

Abstract

The invention discloses a method for expanding stand-alone image neural network training to distributed training, which utilizes a stand-alone image neural network framework with automatic back propagation by providing the functions of image division and data synchronous expansion, only needs to add a small amount of data synchronous codes in the original stand-alone image neural network model codes, divides a large image into a plurality of servers, and can realize distributed large image training equivalent to stand-alone image neural network training by using the plurality of servers under the conditions of not modifying the stand-alone image neural network framework and not modifying the calculation logic of the original stand-alone image neural network model. In addition, the invention also discloses a system for expanding the training of the single-diagram neural network to the distributed training. In addition, the present invention discloses a computer-readable storage medium storing a computer program.

Description

Method, system and medium for expanding stand-alone graph neural network training to distributed training

Technical Field

The invention relates to the field of deep learning and the field of graph neural networks, in particular to a method for expanding training of a single graph neural network to distributed training.

Background

The graph structure data can represent the relevance among data and can be used for describing a plurality of problems in real life. The deep learning method based on the graph, such as the graph neural network (GCN, GAT, GraphSage, etc.), can be used for predicting the types of nodes on the graph, predicting the possibility of edges between the nodes, and the like, and has a very good effect in many fields.

The stand-alone neural network framework (e.g., DGL and PyG) provides a flexible and convenient programming interface and possesses good stand-alone training performance. In practical production applications, however, the scale of the drawings is already huge, and both points and edges reach hundreds of millions and even more than billions. Typically, a server is not sufficient to store and compute such large-scale data, and therefore, a single-machine neural network cannot train such large-scale data.

At present, most methods for realizing large-scale graph neural network training adopt a mode of sampling first and then training, a sub-graph which can be stored and calculated by a single server is obtained by sampling a large graph, and then training is carried out. Although the problems of insufficient computing and storage resources of large-graph training and low efficiency of single-machine large-graph training can be well solved, the problems that the final accuracy of model training may be insufficient and the parameter convergence period is increased still exist.

In addition, a method combining a deep learning framework and distributed graph calculation is adopted to realize distributed large graph training for large-scale graph data. This approach can also be used to train large-scale graph data using the same computational logic as a single server, thereby ensuring the accuracy of model training and the convergence speed of parameters, but it requires the user to write forward-propagating and backward-propagating computation logic of graph computation operations simultaneously, however, most of the deep learning frameworks and single-machine neural network frameworks adopt an automatic back propagation method, do not need a user to write back propagation steps, because the counter-propagating computation logic is relatively complex compared to the forward-propagating computation logic, in a distributed scenario, the logic becomes more complex, the back propagation logic implemented by the user is prone to error, the user is difficult to verify the correctness of the back propagation logic, therefore, the scheme lacks flexibility and cannot conveniently and rapidly develop and realize the neural network model of the graph.

Therefore, a scheme is desired, which can implement distributed large-scale graph training of large-scale graph data, provide a flexible and convenient interface like a single-graph neural network framework, and simultaneously achieve sufficiently high training accuracy and guarantee efficient training performance.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a method for a distributed training graph neural network of a large graph, which can utilize a plurality of servers for cooperative calculation and fully utilize a single graph neural network framework, respectively execute a training process of the single graph neural network on each server under the conditions of not modifying the single graph neural network framework and hardly modifying a single graph neural network model code, and perform proper data synchronization in the calculation process of single graph training through the expansion of a data synchronization operator to cooperate with the plurality of servers to realize distributed large graph training, thereby solving the problem of large-scale graph data training.

The purpose of the invention can be realized by the following technical scheme:

in a first aspect, the present invention provides a method for expanding a stand-alone graph neural network training to a distributed training, which includes the following steps:

the method comprises the following steps: registering data synchronization operation into an operator of a single-machine graph neural network framework;

step two: modifying the code of the single-diagram neural network model, and adding the calling code of the data synchronization operator defined in the step one before traversing the calculation operators of all the diagrams in the single-diagram neural network model;

step three: dividing the graph so that each server obtains a part of nodes and corresponding edges of the whole graph;

step four: initializing model parameters;

step five: single machine model logic forward propagation: the single-machine graph neural network framework on each server carries out calculation logic as a single machine according to seen sub-graphs, and when a data synchronization operator is met, the step six is executed; otherwise, executing the seventh step until the forward propagation is finished;

step six: forward propagation logic of the data synchronization operator: each server synchronizes the latest values of the nodes from other servers and returns to the step five;

step seven: performing loss calculation on nodes on the single-machine graph neural network on each server after the single-machine model logic training is completed;

step eight: single machine model logical automatic gradient backpropagation: the single-machine neural network framework on each server carries out the same back propagation gradient solving logic as a single machine, the back propagation is automatically executed by the single-machine neural network framework, and when the back propagation of the data synchronization operator is executed, the single-machine neural network framework automatically calls the logic of the execution step nine; otherwise, executing step ten until the back propagation is finished;

step nine: automatic gradient synchronization logic in data synchronization operator backpropagation: sending the gradient of the node on each server to the server to which the node belongs, and summing the local gradient of each node and the gradients sent by other servers by each server to serve as the gradient of the node;

step ten: and (5) synchronizing the parameter gradients and updating the parameters by the stand-alone image neural network frameworks on all the servers, returning to the step five to perform iteration of the next round of training until the model parameters are converged, and finishing the training.

It should be noted that, in the technical solution of the present invention, the method for expanding the training of the single-diagram neural network to the distributed training is applicable to various single-diagram neural network frameworks, and the application range is wide. The single-machine graph neural network framework refers to various generalized frameworks, systems and libraries which can complete the training of the complete graph neural network model by using a single server, and is not limited to frameworks, systems and libraries which are specially designed for the training of the graph neural network model, for example, the graph neural network frameworks which are mainly designed for the training of the single-machine graph neural network, such as DGL, PyG, PGL and the like, are all applicable to the technical scheme of the invention; although deep learning frameworks (libraries) such as TensorFlow, PyTorch, MXNet, Paddle and the like are not specially designed for single-machine neural network training, a small number of sparse matrix operators are provided, a small number of graph neural network models can be realized, and the deep learning frameworks (libraries) are also suitable for the technical scheme provided by the invention.

In the technical solution of the present invention, it relates to the development and improvement of graph partitioning and data synchronization of a stand-alone graph neural network trained in a multi-server environment, and does not relate to the improvement of training computational logic of the stand-alone graph neural network itself, so it can be understood that the method for developing stand-alone graph neural network training to distributed training is applicable to various stand-alone graph neural networks, and technicians in the field can perform distributed training on any stand-alone graph neural network according to the specific situation of each implementation mode.

It should be further noted that the technical solution of the present invention can be integrated into a stand-alone diagram neural network framework, and performs distributed expansion by becoming a built-in operator of the stand-alone diagram neural network framework, and can also perform distributed expansion on the stand-alone diagram neural network framework by using a custom operator provided by the stand-alone diagram neural network framework on the basis of not modifying the stand-alone diagram neural network framework.

It should be further noted that the technical solution of the present invention provides an automatic back-propagation solution for data synchronization expansion for a stand-alone neural network framework with an automatic back-propagation function, but this does not require that the stand-alone neural network framework must have an automatic back-propagation function; the data synchronization logic and method of the present invention are also applicable to a stand-alone neural network framework without automatic back-propagation, such as: the Eigen, Numpy and other matrix operation libraries do not have the function of automatic back propagation, and a user needs to manually write corresponding back propagation logic of each forward calculation step of the graph neural network model, but the same data synchronization logic as the scheme can also be adopted and realized.

Preferably, in the second step, all data synchronization operations are performed by calling a data synchronization operator, and a row of codes calling the data synchronization operator is added before all codes in the original computation logic for traversing the computation operator with respect to the graph.

Preferably, in the third step, the graph partitioning partitions each node and all edges in a certain direction corresponding to the node onto one of the servers, and the nodes are called dedicated nodes; after the division is finished, each server also includes nodes which are connected with the dedicated nodes to which the server belongs in an edge mode but do not belong into the local node set of the server, the nodes are called external nodes, and the dedicated nodes and the external nodes jointly form the local node set of the server.

Preferably, in the fourth step, each server copies a complete graph neural network model, the parameters are initialized by the stand-alone graph neural network framework of one of the servers, and the parameters on all the servers are synchronized to the parameter values of the one of the servers, so that the parameter values on all the servers are equal.

Preferably, in the fifth step, the stand-alone graph neural network framework on each server takes a graph formed by the dedicated node and the relevant edge of the server and the external node of the server to which the stand-alone graph neural network framework belongs as the graph to be calculated by the server.

Preferably, in the sixth step, each server synchronizes, at each external node local to the server, the latest value of the external node from the server to which the external node belongs. .

Preferably, in the seventh step, each server selects a part of nodes in its dedicated node set for loss calculation, so that the loss of the same node is not calculated repeatedly by different servers.

Preferably, in the ninth step, each server sends the gradient on each local external node back to the server to which the external node belongs, and then sets the gradients of the local external nodes to 0; after each server receives the gradient of the exclusive node sent by other servers, the gradient received by each exclusive node is added with the gradient obtained by the server through single-machine logic calculation to obtain the final gradient of each exclusive node.

In a second aspect, the present invention further provides a system for expanding a stand-alone neural network training to a distributed training, where the system includes: the system for training the extended single-computer neural network to the distributed training executes the method for training the extended single-computer neural network to the distributed training.

In a third aspect, the present invention further provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the above method for extending a stand-alone neural network to distributed training.

Compared with the prior art, the invention has the following advantages:

1. the storage resources and the computing resources of a plurality of servers are used for collaborative training, and the problem that the computing resources and the storage resources of a single server are insufficient is solved.

2. The distributed training of large-scale graph data can be realized, the calculation logic same as that of a single server is possessed, and the training accuracy and the parameter convergence speed are guaranteed.

3. The existing pattern neural network model code can be fully utilized, the model code is hardly required to be modified, the learning cost of a user is low, and the model code is easy to use and use.

4. The method has universality, is suitable for various single-diagram neural network frameworks, fully utilizes the existing single-diagram neural network framework, and supports the expansion of most single-diagram neural network models into distributed-diagram neural network models.

5. The method of the invention has flexibility, and the user can implement automatic backward propagation only by compiling forward propagation without compiling the backward propagation process.

Drawings

FIG. 1 is a flow chart of a method of expanding a single-machine neural network training to a distributed training of the present invention;

fig. 2 is a schematic diagram of an embodiment scenario of an embodiment of the present description.

Detailed Description

The invention will be further explained with reference to the drawings.

As shown in FIG. 1, a specific process for expanding the training of the single-graph neural network to the distributed training method is provided. The following detailed description is made in conjunction with the embodiment disclosed in fig. 2:

in fig. 2, there are 5 nodes, A, B, C, D and E respectively, and the connection relationship of the edges between the nodes is shown, each node contains a multidimensional initial feature tensor and possibly a label. In the embodiment, a total of two servers are used for training.

In step one, the data synchronization operation is registered as an operator of the single-graph neural network framework.

In the second step, modifying the code of the stand-alone graph neural network model, and adding a row of codes for calling the data synchronization operator defined in the first step before the codes for the graph traversal calculation operator in the definition code of the original stand-alone graph neural network model.

In step three, each server copies a complete graph neural network model, initializes all parameters of the model by the stand-alone graph neural network framework of the server 1, and synchronizes the parameters on all servers to the parameter values on the server 1, so that the parameter values on all servers are equal.

In the fourth step, graph division is carried out, wherein a node and the incoming edge of the node are selected to be divided into the same machines, nodes A and D and the incoming edges of the nodes are obtained by division on the server 1, the nodes A and the incoming edges are connected with the edges of the nodes C, the nodes C are added into a local node set of the server 1, the nodes A and the D are exclusive nodes of the server 1, and the nodes C are external nodes of the server 1; the server 2 is divided into nodes B, C and E and their incoming edges, since B and A are connected by edges, A will be the node set of the server 2, B, C and E are the exclusive nodes of the server 2, and A is the external node of the server 2.

In step five, forward propagation of the single machine model logic: the standalone graph neural network framework on each server performs the same computational logic as a standalone for the sub-graph it sees, the sub-graph seen by server 1 contains the incoming edges of nodes A, D, C and A, D, and the sub-graph seen by server 2 contains the incoming edges of nodes B, C, E, A and B, C, E. When the calculation logic of the single computer is executed to the data synchronization operator, executing a step six; otherwise, until the forward propagation is finished, executing step seven.

In step six, the forward propagation logic of the data synchronization operator: the server 1 synchronizes the latest data of the external node C from the server 2 to cover the old local value; server 2 synchronizes the latest data of node a from server 1, overwriting the old values of the locality. And returning to the step five.

In step seven, each server selects a part of the dedicated nodes to perform loss calculation and summation, the server 1 selects to perform loss calculation on the dedicated node a, and the server 2 selects to perform loss calculation on the dedicated nodes B and E.

In step eight, the automatic gradient backpropagation of the single machine model logic: the single-machine neural network framework on each server carries out the same back propagation gradient solving logic as a single machine, the back propagation is automatically executed by the single-machine neural network framework, and when the back propagation of the data synchronization operator is executed, the single-machine neural network framework automatically calls the logic of the execution step nine; otherwise, until the back propagation is finished, step ten is executed.

In step nine, the data synchronization operator counter-propagates the automatic gradient synchronization logic: the server 1 sends the gradient of the external node C to the server 2, the local gradient of the node C is set to be 0, and after the server 2 receives the gradient of the node C, the received gradient and the local gradient are added to be used as the gradient of the node C; the server 2 sends the gradient of the external node a to the server 1, sets the local gradient of the node a to 0, and after receiving the gradient of the node a, the server 1 adds the received gradient and the local gradient to obtain the gradient of the node a.

In the tenth step, the stand-alone image neural network frameworks on all the servers perform parameter gradient synchronization, update parameters, return to the fifth step, perform iteration of the next round of training until the model parameters are converged, and finish the training.

It will be appreciated by those skilled in the art that, in addition to implementing the system and its various devices, modules, units provided by the present invention in purely computer readable program code means, the method steps can be fully programmed to enable the system and its various devices, modules, units provided by the present invention to perform the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

It should be noted that the prior art in the protection scope of the present invention is not limited to the examples given in the present application, and all the prior art which is not inconsistent with the technical scheme of the present invention, including but not limited to the prior patent documents, the prior publications and the like, can be included in the protection scope of the present invention.

In addition, the combination of the features in the present application is not limited to the combination described in the claims of the present application or the combination described in the embodiments, and all the features described in the present application may be freely combined or combined in any manner unless contradictory to each other.

It should also be noted that the above-mentioned embodiments are only specific embodiments of the present invention. It is apparent that the present invention is not limited to the above embodiments and similar changes or modifications can be easily made by those skilled in the art from the disclosure of the present invention and shall fall within the scope of the present invention.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A method for expanding training of a stand-alone graph neural network to distributed training is characterized by comprising the following steps:

step four: initializing model parameters;

2. The method of claim 1, wherein in step two, all data synchronization operations are performed by calling a data synchronization operator, and a row of codes for calling the data synchronization operator is added before all codes for traversing the computation operator with respect to the graph in the original computation logic.

3. The method for expanding training of the stand-alone graph neural network to the distributed training according to claim 1, wherein in the third step, the graph partitioning partitions each node and all edges of a certain direction corresponding to the node onto one of the servers, and the nodes are called dedicated nodes; after the division is finished, each server also includes nodes which are connected with the dedicated nodes to which the server belongs in an edge mode but do not belong into the local node set of the server, the nodes are called external nodes, and the dedicated nodes and the external nodes jointly form the local node set of the server.

4. The method according to claim 1, wherein in the fourth step, each server copies a complete graph neural network model, initializes the parameters of the model by the single graph neural network framework of one of the servers, and synchronizes the parameters of all the servers to the values of the parameters of the one of the servers, so that the values of the parameters of all the servers are equal.

5. The method as claimed in claim 1, wherein in the fifth step, the stand-alone neural network framework on each server uses a graph formed by dedicated nodes and relevant edges of the server and external nodes of the server as the graph to be calculated by the server.

6. The method according to claim 1, wherein in step six, each server synchronizes, at each external node local to the server, the latest value of the external node from the server to which the external node belongs.

7. The method as claimed in claim 1, wherein in step seven, each server selects a part of nodes in its dedicated node set for loss calculation, so that the loss of the same node is not calculated repeatedly by different servers.

8. The method according to claim 1, wherein in the ninth step, each server sends the gradient at each local external node back to the server to which the external node belongs, and sets the gradient at the local external node to 0; after each server receives the gradient of the exclusive node sent by other servers, the gradient received by each exclusive node is added with the gradient obtained by the server through single-machine logic calculation to obtain the final gradient of each exclusive node.

9. A system for extending a stand-alone neural network training to a distributed training, the system comprising: a plurality of computers and data, each computer having one or more computing nodes, the system for extended standalone graph neural network training to distributed training performing the method for extended standalone graph neural network training to distributed training of any of claims 1-8.

10. A computer-readable storage medium storing a computer program, wherein the computer program when executed is configured to implement the method of extending the training of a standalone graph neural network to distributed training of any of claims 1-8.