CN112149808B - Method, system and medium for expanding stand-alone graph neural network training to distributed training - Google Patents

Method, system and medium for expanding stand-alone graph neural network training to distributed training

Info

Publication number
CN112149808B
CN112149808B
Authority
CN
China
Prior art keywords
neural network
server
training
nodes
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011043369.2A
Other languages
Chinese (zh)
Other versions
CN112149808A (en)
Inventor
陈榕
杨健邦
陈海波
臧斌宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202011043369.2A priority Critical patent/CN112149808B/en
Publication of CN112149808A publication Critical patent/CN112149808A/en
Application granted granted Critical
Publication of CN112149808B publication Critical patent/CN112149808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a method for expanding stand-alone graph neural network training to distributed training. By providing graph partitioning and data synchronization extensions, the method makes use of a stand-alone graph neural network framework with automatic back propagation: only a small amount of data synchronization code needs to be added to the original stand-alone graph neural network model code, the large graph is partitioned across a plurality of servers, and distributed large-graph training equivalent to stand-alone graph neural network training can be realized on those servers, without modifying the stand-alone graph neural network framework or the computation logic of the original stand-alone model. In addition, the invention also discloses a system for expanding stand-alone graph neural network training to distributed training, and a computer-readable storage medium storing a computer program.

Description

Method, system and medium for expanding stand-alone graph neural network training to distributed training
Technical Field
The invention relates to the field of deep learning and the field of graph neural networks, and in particular to a method for expanding stand-alone graph neural network training to distributed training.
Background
Graph-structured data can represent relationships among data and can describe many real-world problems. Graph-based deep learning methods, such as graph neural networks (GCN, GAT, GraphSAGE, etc.), can be used to predict the categories of nodes in a graph, predict the likelihood of edges between nodes, and so on, and have achieved very good results in many fields.
Stand-alone graph neural network frameworks (e.g., DGL and PyG) provide flexible and convenient programming interfaces and offer good single-machine training performance. In practical production applications, however, graphs are already huge, with nodes and edges numbering in the hundreds of millions or even billions. A single server is typically insufficient to store and compute on data at this scale, so a stand-alone graph neural network framework cannot train such large-scale graph data.
At present, most approaches to large-scale graph neural network training sample first and train afterwards: a subgraph that a single server can store and compute on is sampled from the large graph, and training is then performed on that subgraph. Although this alleviates the problems of insufficient computing and storage resources for large-graph training and the low efficiency of single-machine large-graph training, the final accuracy of the trained model may still be insufficient and the parameter convergence period may still lengthen.
In addition, some methods combine a deep learning framework with distributed graph computation to realize distributed large-graph training on large-scale graph data. Such methods can train large-scale graph data with the same computation logic as a single server, thereby preserving model accuracy and parameter convergence speed, but they require the user to write both the forward propagation and the backward propagation logic of the graph computation operations. Most deep learning frameworks and stand-alone graph neural network frameworks now provide automatic back propagation, so the user does not need to write the backward propagation steps. Because backward propagation logic is more complex than forward propagation logic, and becomes even more complex in a distributed setting, user-implemented backward propagation is error-prone and its correctness is difficult to verify; such schemes therefore lack flexibility and do not allow graph neural network models to be developed conveniently and quickly.
Therefore, a scheme is desired that can realize distributed large-graph training on large-scale graph data, provide an interface as flexible and convenient as a stand-alone graph neural network framework, achieve sufficiently high training accuracy, and ensure efficient training performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for distributed training of graph neural networks on large graphs. The method uses a plurality of servers for cooperative computation and makes full use of a stand-alone graph neural network framework: without modifying the stand-alone graph neural network framework and with almost no modification of the stand-alone graph neural network model code, each server executes the training process of the stand-alone graph neural network, and a data synchronization operator extension performs the appropriate data synchronization during the single-machine training computation so that the servers cooperate to realize distributed large-graph training, thereby solving the problem of training large-scale graph data.
The purpose of the invention can be realized by the following technical scheme:
In a first aspect, the present invention provides a method for expanding stand-alone graph neural network training to distributed training, which comprises the following steps:
Step one: registering the data synchronization operation as an operator of the stand-alone graph neural network framework;
Step two: modifying the code of the stand-alone graph neural network model by adding a call to the data synchronization operator defined in step one before every graph traversal computation operator in the stand-alone graph neural network model;
Step three: partitioning the graph so that each server obtains a part of the nodes of the whole graph and the corresponding edges;
Step four: initializing the model parameters;
Step five: forward propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same computation logic as a single machine on the subgraph it sees; when a data synchronization operator is encountered, step six is executed; otherwise, step seven is executed once forward propagation finishes;
Step six: forward propagation logic of the data synchronization operator: each server synchronizes the latest values of nodes from the other servers, then returns to step five;
Step seven: after the single-machine model logic completes, performing loss calculation on nodes of the stand-alone graph neural network on each server;
Step eight: automatic gradient back propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same back propagation gradient computation as a single machine; back propagation is executed automatically by the framework, and when the back propagation of the data synchronization operator is reached, the framework automatically invokes the logic of step nine; otherwise, step ten is executed once back propagation finishes;
Step nine: automatic gradient synchronization logic in the back propagation of the data synchronization operator: each server sends the gradients of nodes it holds to the servers that own those nodes, and each server sums the local gradient of each of its nodes with the gradients sent by other servers to obtain the gradient of that node;
Step ten: the stand-alone graph neural network frameworks on all servers synchronize the parameter gradients and update the parameters, then return to step five for the next training iteration, until the model parameters converge and training finishes.
It should be noted that, in the technical solution of the present invention, the method for expanding stand-alone graph neural network training to distributed training is applicable to various stand-alone graph neural network frameworks and has a wide application range. A stand-alone graph neural network framework here refers broadly to any framework, system or library that can complete graph neural network model training on a single server, and is not limited to frameworks, systems and libraries designed specifically for graph neural network model training. For example, graph neural network frameworks such as DGL, PyG and PGL, which are designed mainly for stand-alone graph neural network training, are all applicable to the technical solution of the invention; deep learning frameworks (libraries) such as TensorFlow, PyTorch, MXNet and Paddle are not designed specifically for graph neural network training, but they provide a small number of sparse matrix operators with which a number of graph neural network models can be implemented, and they are likewise applicable to the technical solution provided by the invention.
The technical solution of the present invention concerns the graph partitioning and data synchronization extensions needed to train a stand-alone graph neural network in a multi-server environment; it does not change the training computation logic of the stand-alone graph neural network itself. It can therefore be understood that the method for expanding stand-alone graph neural network training to distributed training is applicable to various types of stand-alone graph neural networks, and a person skilled in the art can apply distributed training to any stand-alone graph neural network according to the specific circumstances of each implementation.
It should be further noted that the technical solution of the present invention can be integrated into a stand-alone graph neural network framework and perform the distributed extension as a built-in operator of that framework, or it can perform the distributed extension by using the custom-operator mechanism provided by the stand-alone graph neural network framework, without modifying the framework itself.
It should be further noted that the technical solution of the present invention provides an automatic back propagation extension of the data synchronization for stand-alone graph neural network frameworks that support automatic back propagation, but this does not require the framework to have automatic back propagation. The data synchronization logic and method of the invention also apply to stand-alone frameworks without automatic back propagation, for example matrix operation libraries such as Eigen and NumPy: there the user must manually write the backward propagation logic corresponding to each forward computation step of the graph neural network model, but the same data synchronization logic as in this scheme can still be adopted and implemented.
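For concreteness, the following is a minimal sketch of how such a data synchronization operation could be registered as a custom operator of a PyTorch-style framework with automatic back propagation (step one). It is an illustration under simplifying assumptions rather than the patented implementation: the name SyncNodeData, the owner_mask tensor marking this server's dedicated nodes, and the use of a full-graph-sized, zero-padded feature matrix with torch.distributed.all_reduce are choices made only for this sketch. Its forward pass overwrites external-node rows with the owners' latest values (step six), and its backward pass returns external-node gradients to the owners and sums them there (step nine).

    import torch
    import torch.distributed as dist

    class SyncNodeData(torch.autograd.Function):
        # Forward (step six): each server contributes only the rows of the nodes it owns;
        # summing across servers fills every external-node row with its owner's latest value.
        @staticmethod
        def forward(ctx, feats, owner_mask):
            ctx.save_for_backward(owner_mask)
            buf = feats * owner_mask.unsqueeze(-1)        # keep only dedicated-node rows
            dist.all_reduce(buf, op=dist.ReduceOp.SUM)    # each node is owned by exactly one server
            return buf

        # Backward (step nine): gradients accumulated on external-node rows are summed
        # onto the owning server; rows this server does not own are then zeroed locally.
        @staticmethod
        def backward(ctx, grad_out):
            (owner_mask,) = ctx.saved_tensors
            grad = grad_out.clone()
            dist.all_reduce(grad, op=dist.ReduceOp.SUM)
            return grad * owner_mask.unsqueeze(-1), None

Because the operation is registered this way, the framework's automatic back propagation (step eight) invokes the backward method on its own, which is exactly the behaviour required of the data synchronization operator in step nine.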
Preferably, in step two, all data synchronization operations are performed by calling the data synchronization operator, and one line of code calling the data synchronization operator is added before every piece of code in the original computation logic that traverses the graph with a computation operator.
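As an illustration of how little the model code changes in step two, a hypothetical graph convolution layer might look as follows; graph_spmm stands in for whatever graph traversal operator the stand-alone framework provides, SyncNodeData is the sketch operator above, and only the single line calling it is added to the original single-machine layer.

    import torch
    import torch.nn as nn

    class DistGCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim, owner_mask):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)
            self.owner_mask = owner_mask

        def forward(self, graph, feats):
            feats = SyncNodeData.apply(feats, self.owner_mask)  # the one added line (step two)
            h = graph_spmm(graph, feats)                        # unchanged graph traversal operator
            return torch.relu(self.linear(h))                   # unchanged dense computation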
Preferably, in step three, the graph partitioning assigns each node, together with all of its edges in a chosen direction, to one of the servers; these nodes are called the dedicated nodes of that server. After partitioning, each server also adds to its local node set the nodes that share an edge with its dedicated nodes but do not belong to it; these are called external nodes. The dedicated nodes and external nodes together form the local node set of the server.
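The sketch below illustrates one possible partitioning by incoming edges; assigning each destination node by taking its id modulo the number of servers is an assumption for illustration only, and the patent does not prescribe a particular assignment rule.

    def partition_graph(edges, num_servers):
        # edges: iterable of (src, dst) pairs with integer node ids; each destination node
        # and all of its incoming edges go to one server (its dedicated nodes), and edge
        # sources stored on other servers become that server's external nodes.
        parts = [{"edges": [], "dedicated": set(), "external": set()} for _ in range(num_servers)]
        for src, dst in edges:
            s = dst % num_servers
            parts[s]["edges"].append((src, dst))
            parts[s]["dedicated"].add(dst)
        for p in parts:
            p["external"] = {src for src, _ in p["edges"]} - p["dedicated"]
        return parts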
Preferably, in step four, each server holds a complete copy of the graph neural network model; the parameters are initialized by the stand-alone graph neural network framework of one of the servers, and the parameters on all servers are synchronized to the parameter values of that server, so that the parameter values on all servers are equal.
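A minimal way to realise this parameter synchronization, assuming a torch.distributed process group has already been initialised and the server whose framework initialises the parameters is rank 0, is to broadcast every parameter tensor:

    import torch.distributed as dist

    def broadcast_initial_parameters(model, src_rank=0):
        # every server holds a full copy of the model; rank src_rank supplies the initial values
        for p in model.parameters():
            dist.broadcast(p.data, src=src_rank)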
Preferably, in step five, the stand-alone graph neural network framework on each server uses the graph formed by the dedicated nodes and related edges of that server together with its external nodes as the graph on which that server computes.
Preferably, in step six, each server synchronizes, for each of its local external nodes, the latest value of that external node from the server to which it belongs.
Preferably, in step seven, each server selects a subset of its dedicated nodes for loss calculation, so that the loss of the same node is not computed repeatedly by different servers.
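In code, step seven might then reduce to the following fragment, where train_idx is a hypothetical index tensor selecting the dedicated training nodes chosen by this server:

    import torch.nn.functional as F

    loss = F.cross_entropy(logits[train_idx], labels[train_idx])  # loss only over this server's dedicated nodes
    loss.backward()  # triggers steps eight and nine through automatic back propagation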
Preferably, in step nine, each server sends the gradient of each of its local external nodes back to the server to which that node belongs, and then sets the gradients of the local external nodes to 0; after receiving the gradients of its dedicated nodes from other servers, each server adds the received gradients of each dedicated node to the gradient obtained by its own single-machine computation, yielding the final gradient of that dedicated node.
In a second aspect, the present invention further provides a system for expanding stand-alone graph neural network training to distributed training, the system comprising a plurality of computers and data, each computer having one or more computing nodes; the system executes the above method for expanding stand-alone graph neural network training to distributed training.
In a third aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for expanding stand-alone graph neural network training to distributed training.
Compared with the prior art, the invention has the following advantages:
1. The storage resources and computing resources of a plurality of servers are used for collaborative training, solving the problem of insufficient computing and storage resources on a single server.
2. Distributed training of large-scale graph data is realized with the same computation logic as a single server, guaranteeing training accuracy and parameter convergence speed.
3. Existing graph neural network model code can be fully reused with almost no modification, so the learning cost for users is low and the method is easy to use.
4. The method is general: it is applicable to various stand-alone graph neural network frameworks, makes full use of existing stand-alone graph neural network frameworks, and supports expanding most stand-alone graph neural network models into distributed graph neural network models.
5. The method is flexible: the user only needs to write the forward propagation, and back propagation is carried out automatically without the user writing the backward propagation process.
Drawings
FIG. 1 is a flow chart of the method of the present invention for expanding stand-alone graph neural network training to distributed training;
FIG. 2 is a schematic diagram of an example scenario of an embodiment of the present description.
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, a specific flow of the method for expanding stand-alone graph neural network training to distributed training is provided. The following detailed description is given in conjunction with the embodiment shown in FIG. 2:
In FIG. 2, the original graph has 5 nodes, namely A, B, C, D and E; the connection relationship of the edges between the nodes is shown in the figure, and each node carries a multidimensional initial feature tensor and may carry a label. In this embodiment, two servers in total are used for training.
In step one, the data synchronization operation is registered as an operator of the stand-alone graph neural network framework.
In step two, the code of the stand-alone graph neural network model is modified: one line of code calling the data synchronization operator defined in step one is added before the code for the graph traversal computation operator in the definition of the original stand-alone graph neural network model.
In step three, each server copies a complete graph neural network model; all parameters of the model are initialized by the stand-alone graph neural network framework of server 1, and the parameters on all servers are synchronized to the parameter values on server 1, so that the parameter values on all servers are equal.
In step four, the graph is partitioned, choosing to place each node and its incoming edges on the same machine. Server 1 obtains nodes A and D and their incoming edges; because node A is connected to node C by an edge, node C is added to the local node set of server 1, so A and D are dedicated nodes of server 1 and C is an external node of server 1. Server 2 obtains nodes B, C and E and their incoming edges; because B and A are connected by an edge, node A is added to the local node set of server 2, so B, C and E are dedicated nodes of server 2 and A is an external node of server 2.
In step five, forward propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same computation logic as a single machine on the subgraph it sees. The subgraph seen by server 1 consists of nodes A, D and C and the incoming edges of A and D; the subgraph seen by server 2 consists of nodes B, C, E and A and the incoming edges of B, C and E. When the single-machine computation logic reaches the data synchronization operator, step six is executed; otherwise, step seven is executed once forward propagation finishes.
In step six, the forward propagation logic of the data synchronization operator: server 1 synchronizes the latest data of external node C from server 2, overwriting the old local value; server 2 synchronizes the latest data of node A from server 1, overwriting the old local value. Then return to step five.
In step seven, each server selects a subset of its dedicated nodes for loss calculation and summation: server 1 computes the loss on dedicated node A, and server 2 computes the loss on dedicated nodes B and E.
In step eight, automatic gradient back propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same back propagation gradient computation as a single machine; back propagation is executed automatically by the framework, and when the back propagation of the data synchronization operator is reached, the framework automatically invokes the logic of step nine; otherwise, step ten is executed once back propagation finishes.
In step nine, the automatic gradient synchronization logic in the back propagation of the data synchronization operator: server 1 sends the gradient of external node C to server 2 and sets the local gradient of node C to 0; after receiving the gradient of node C, server 2 adds the received gradient to its local gradient to obtain the gradient of node C. Likewise, server 2 sends the gradient of external node A to server 1 and sets the local gradient of node A to 0; after receiving the gradient of node A, server 1 adds the received gradient to its local gradient to obtain the gradient of node A.
In step ten, the stand-alone graph neural network frameworks on all servers synchronize the parameter gradients and update the parameters, then return to step five for the next training iteration, until the model parameters converge and training finishes.
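A sketch of this final synchronization, under the assumption that the parameter gradients are summed with torch.distributed.all_reduce and then averaged over the world_size servers before a standard optimizer step (the method only requires that all servers apply a consistent update):

    import torch.distributed as dist

    def sync_and_update(model, optimizer, world_size):
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size   # averaging is an illustrative choice, not mandated by the method
        optimizer.step()
        optimizer.zero_grad()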
It will be appreciated by those skilled in the art that, besides implementing the system and its various devices, modules and units provided by the present invention as pure computer-readable program code, the method steps can be programmed by logic so that the system and its various devices, modules and units are realized in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its various devices, modules and units provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing the various functions can also be regarded as structures within the hardware component; devices, modules and units for performing the various functions can likewise be regarded both as software modules implementing the method and as structures within the hardware component.
It should be noted that the prior art within the protection scope of the present invention is not limited to the examples given in this application; any prior art that is not inconsistent with the technical solution of the invention, including but not limited to prior patent documents and prior publications, may be included within the protection scope of the invention.
In addition, the combinations of features in this application are not limited to the combinations described in the claims or in the embodiments; all features described in this application may be freely combined in any manner unless contradictions arise between them.
It should also be noted that the embodiments listed above are only specific embodiments of the present invention. Obviously, the invention is not limited to the above embodiments, and similar changes or modifications that can be directly or easily derived from the disclosure of the invention by those skilled in the art are intended to fall within the protection scope of the invention.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above, and various changes or modifications may be made by those skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily provided there is no conflict.

Claims (9)

1. A method for expanding stand-alone graph neural network training to distributed training, characterized by comprising the following steps:
Step one: registering the data synchronization operation as an operator of the stand-alone graph neural network framework;
Step two: modifying the code of the stand-alone graph neural network model by adding a call to the data synchronization operator defined in step one before every graph traversal computation operator in the stand-alone graph neural network model;
Step three: partitioning the graph so that each server obtains a part of the nodes of the whole graph and the corresponding edges;
Step four: initializing the model parameters;
Step five: forward propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same computation logic as a single machine on the subgraph it sees; when a data synchronization operator is encountered, step six is executed; otherwise, step seven is executed once forward propagation finishes;
Step six: forward propagation logic of the data synchronization operator: each server synchronizes the latest values of nodes from the other servers, then returns to step five;
Step seven: after the single-machine model logic completes, performing loss calculation on nodes of the stand-alone graph neural network on each server;
Step eight: automatic gradient back propagation of the single-machine model logic: the stand-alone graph neural network framework on each server performs the same back propagation gradient computation as a single machine; back propagation is executed automatically by the framework, and when the back propagation of the data synchronization operator is reached, the framework automatically invokes the logic of step nine; otherwise, step ten is executed once back propagation finishes;
Step nine: automatic gradient synchronization logic in the back propagation of the data synchronization operator: each server sends the gradients of nodes it holds to the servers that own those nodes, and each server sums the local gradient of each of its nodes with the gradients sent by other servers to obtain the gradient of that node;
Step ten: the stand-alone graph neural network frameworks on all servers synchronize the parameter gradients and update the parameters, then return to step five for the next training iteration, until the model parameters converge and training finishes;
in step three, the graph partitioning assigns each node, together with all of its edges in a chosen direction, to one of the servers, and these nodes are called the dedicated nodes of that server; after partitioning, each server also adds to its local node set the nodes that share an edge with its dedicated nodes but do not belong to it, and these are called external nodes; the dedicated nodes and external nodes together form the local node set of the server.
2. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step two, all data synchronization operations are performed by calling the data synchronization operator, and one line of code calling the data synchronization operator is added before every piece of code in the original computation logic that traverses the graph with a computation operator.
3. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step four, each server holds a complete copy of the graph neural network model, the parameters are initialized by the stand-alone graph neural network framework of one of the servers, and the parameters on all servers are synchronized to the parameter values of that server, so that the parameter values on all servers are equal.
4. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step five, the stand-alone graph neural network framework on each server uses the graph formed by the dedicated nodes and related edges of that server together with its external nodes as the graph on which that server computes.
5. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step six, each server synchronizes, for each of its local external nodes, the latest value of that external node from the server to which it belongs.
6. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step seven, each server selects a subset of its dedicated nodes for loss calculation, so that the loss of the same node is not computed repeatedly by different servers.
7. The method for expanding stand-alone graph neural network training to distributed training according to claim 1, wherein in step nine, each server sends the gradient of each of its local external nodes back to the server to which that node belongs, and then sets the gradients of the local external nodes to 0; after receiving the gradients of its dedicated nodes from other servers, each server adds the received gradients of each dedicated node to the gradient obtained by its own single-machine computation, yielding the final gradient of that dedicated node.
8. A system for expanding stand-alone graph neural network training to distributed training, the system comprising a plurality of computers and data, each computer having one or more computing nodes, wherein the system performs the method for expanding stand-alone graph neural network training to distributed training of any of claims 1-7.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, implements the method for expanding stand-alone graph neural network training to distributed training of any of claims 1-7.
CN202011043369.2A 2020-09-28 2020-09-28 Method, system and medium for expanding stand-alone graph neural network training to distributed training Active CN112149808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011043369.2A CN112149808B (en) 2020-09-28 2020-09-28 Method, system and medium for expanding stand-alone graph neural network training to distributed training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011043369.2A CN112149808B (en) 2020-09-28 2020-09-28 Method, system and medium for expanding stand-alone graph neural network training to distributed training

Publications (2)

Publication Number Publication Date
CN112149808A CN112149808A (en) 2020-12-29
CN112149808B true CN112149808B (en) 2022-10-14

Family

ID=73894449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011043369.2A Active CN112149808B (en) 2020-09-28 2020-09-28 Method, system and medium for expanding stand-alone graph neural network training to distributed training

Country Status (1)

Country Link
CN (1) CN112149808B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254996B (en) * 2021-05-31 2022-12-27 平安科技(深圳)有限公司 Graph neural network training method and device, computing equipment and storage medium
CN113672215B (en) * 2021-07-30 2023-10-24 阿里巴巴新加坡控股有限公司 Deep learning distributed training adaptation method and device
CN113626650A (en) * 2021-08-04 2021-11-09 支付宝(杭州)信息技术有限公司 Service processing method and device and electronic equipment
CN114120057A (en) * 2021-11-09 2022-03-01 华侨大学 Confusion matrix generation method based on Paddledetection
CN114579183B (en) * 2022-04-29 2022-10-18 之江实验室 Job decomposition processing method for distributed computation
US11907693B2 (en) 2022-04-29 2024-02-20 Zhejiang Lab Job decomposition processing method for distributed computing
CN116561229B (en) * 2023-07-03 2023-09-08 厦门泛卓信息科技有限公司 Data synchronization method, device and storage medium based on graphic neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032671A (en) * 2018-06-25 2018-12-18 电子科技大学 A kind of distributed deep learning method and system based on data parallel strategy
CN110826699A (en) * 2019-11-06 2020-02-21 中南大学 Graph neural network interpretability analysis method based on gradient
CN111178515A (en) * 2020-04-10 2020-05-19 成都数联铭品科技有限公司 Node coding method of graph neural network, node coding terminal and electronic equipment
CN111291175A (en) * 2020-01-22 2020-06-16 大连海事大学 Method for automatically generating submitted demand abstract based on strategy gradient algorithm
CN111353534A (en) * 2020-02-27 2020-06-30 电子科技大学 Graph data category prediction method based on adaptive fractional order gradient


Also Published As

Publication number Publication date
CN112149808A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN112149808B (en) Method, system and medium for expanding stand-alone graph neural network training to distributed training
CN114756383B (en) Distributed computing method, system, equipment and storage medium
US20040024578A1 (en) Discrete event simulation system and method
Muppala et al. Composite performance and availability analysis using a hierarchy of stochastic reward nets
Zhan et al. Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking
CN115660078A (en) Distributed computing method, system, storage medium and electronic equipment
CN111859638A (en) Real-time efficient distributed virtual-real combined simulation system and construction method
CN114356578A (en) Parallel computing method, device, equipment and medium for natural language processing model
CN111241301A (en) Knowledge graph representation learning-oriented distributed framework construction method
Zhang et al. DEVS/RMI-An auto-adaptive and reconfigurable distributed simulation environment for engineering studies
CN112541584A (en) Deep neural network model parallel mode selection method
Su et al. Passive and partially active fault tolerance for massively parallel stream processing engines
CN113159287A (en) Distributed deep learning method based on gradient sparsity
Bhuiyan et al. Fast parallel algorithms for edge-switching to achieve a target visit rate in heterogeneous graphs
CN114841309A (en) Data processing method and device and electronic equipment
CN110021339B (en) Cluster parallel computing acceleration method based on protein folding calculation protein structure
Lopes et al. On the synchronization of cyclic discrete-event systems
Deen et al. The almaden optimalgrid project
CN110851178B (en) Inter-process program static analysis method based on distributed graph reachable computation
McColl Mathematics, Models and Architectures
CN113901724A (en) Digital twin device correction method and system
CN113821313A (en) Task scheduling method and device and electronic equipment
CN113419850B (en) Entity parallel simulation method and device, electronic equipment and storage medium
US9871667B2 (en) Interaction protocol for interacting computer systems
Muka et al. Hard and soft approaches in a simulation meta-methodology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant