WO2022057310A1 - Method, apparatus and system for training a graph neural network - Google Patents
Method, apparatus and system for training a graph neural network
- Publication number: WO2022057310A1 (application PCT/CN2021/096588, CN2021096588W)
- Authority: WO (WIPO, PCT)
- Prior art keywords: vertices, training, graph, vertex, partition
- Prior art date
Classifications
- G06F16/28: Information retrieval; Databases characterised by their database models, e.g. relational or object models (under G: Physics; G06: Computing; G06F: Electric digital data processing)
- G06N3/02: Neural networks (under G06N: Computing arrangements based on specific computational models; G06N3/00: Computing arrangements based on biological models)
- G06N3/04: Neural networks; Architecture, e.g. interconnection topology
- G06N3/08: Neural networks; Learning methods
Definitions
- the present application relates to the field of computer technology, and in particular to a method, device and system for training a graph neural network.
- relational graphs, such as social network graphs, user-commodity graphs, knowledge graphs and protein structure graphs, etc.
- the data of these relational graphs is applied to the training of graph neural networks (GNNs), so that the trained GNN can be used to reason about causal relationships in relational graphs of the same type, for example, the products that are trending for a certain type of user, or the group of users for which a certain type of product is suitable.
- a GNN is a multi-layer neural network that runs on graph-structured data, in which each layer performs aggregation and update operations centered on each vertex.
- Aggregation: collect the feature information of the neighbor vertices and apply aggregation operations such as accumulation or averaging to obtain aggregate information that fuses the neighbor vertices' information.
- Update: pass the aggregated information through a neural network layer, such as a fully connected layer, to generate a new output that serves as the input feature information of the next layer of the graph neural network.
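- To make the aggregate-then-update pattern described above concrete, the following is a minimal sketch of one GNN layer in Python/PyTorch; the mean aggregator, the adjacency-dict input format and the layer sizes are illustrative assumptions rather than the implementation specified by this application.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One GNN layer: aggregate neighbor features, then update them with a fully connected layer."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)  # update step

    def forward(self, features: torch.Tensor, neighbors: dict) -> torch.Tensor:
        # features: [num_vertices, in_dim]; neighbors: vertex id -> list of neighbor vertex ids
        aggregated = torch.zeros_like(features)
        for vertex, nbrs in neighbors.items():
            if nbrs:
                # aggregation: average the feature information collected from neighbor vertices
                aggregated[vertex] = features[nbrs].mean(dim=0)
        # update: the new output serves as the input feature of the next GNN layer
        return torch.relu(self.fc(aggregated))

# usage: 4 vertices with 8-dimensional features
layer = SimpleGNNLayer(8, 16)
x = torch.randn(4, 8)
adjacency = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
out = layer(x, adjacency)  # shape: [4, 16]
```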
- the relationship graphs in many scenarios are very large. Often a relationship graph consists of hundreds of millions of vertices and more than 100 billion edges.
- a GNN is therefore often trained with graph sampling: the same GNN is placed on multiple training execution devices, the GNN on each training execution device is trained using the data of different vertices in the relational graph, and the trained GNNs are then fused. In this way, the GNN training process covers the entire relational graph and an applicable GNN model is obtained. The training execution device may be a graphics processing unit (GPU).
- each GPU, as a training execution device, needs to request a subgraph for training by accessing the entire relational graph in shared memory, and the central processing unit (CPU) responds to the request of each GPU by allocating a training subgraph to it. This results in a long data-loading time from the CPU to the GPU and a relatively high overhead for the entire training process.
- the embodiments of the present application provide a method for training a graph neural network, which can improve the training efficiency of the graph neural network.
- Embodiments of the present application also provide corresponding apparatuses, systems, computer-readable storage media, computer program products, and the like.
- a first aspect of the present application provides a method for training a graph neural network, including: acquiring a first relational graph for training a graph neural network, where the first relational graph includes multiple vertices and multiple edges, each edge connects two vertices, and the multiple vertices include training vertices used to train the graph neural network; determining N different second relational graphs according to the first relational graph, where each second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1, where the difference between the numbers of training vertices included in any two second relational graphs is less than a preset threshold, and each second relational graph includes the neighbor vertices of its training vertices; and sending the information of the N second relational graphs to the N training execution devices, where the N training execution devices are in one-to-one correspondence with the N second relational graphs, and the N second relational graphs are respectively used by the corresponding training execution devices to train the graph neural network.
- the method can be applied to a central device of a graph neural network training system
- the graph neural network training system can be a distributed system or a parallel system
- the graph neural network training system further includes a plurality of training execution devices
- the central device can be an independent physical machine, a virtual machine (VM), a container or a central processing unit (CPU)
- the training execution device can be an independent physical machine, a VM, a container, a graphics processing unit (GPU), a field-programmable gate array (FPGA) or a special-purpose chip.
- Graph neural networks are applicable to a wide range of fields, such as graph-based recommendation systems (e-commerce or social relations, etc.), and can also be applied to various scenarios such as traffic management or chemistry.
- a GNN is used to process graph data, that is, relational graphs. Relational graphs represent connections between entities in the real world, such as friend relationships in social networks and the relationships between consumers and commodities in e-commerce.
- the first relational graph is a relational graph used for graph neural network training. Each vertex in the first relational graph corresponds to a sample data, and each edge represents the connection between the sample data.
- vertices directly connected by an edge have a direct association relationship, which can also be called a "one-hop connection"; conversely, two vertices connected through other vertices have an indirect association relationship, which can also be called a "multi-hop connection".
- the second relational graph is divided from the first relational graph.
- the central device will divide N second relational graphs from the first relational graph according to the number of training execution devices.
- the vertices in different second relational graphs may partially overlap, but do not completely overlap.
- the vertices in the second relationship graph have a direct relationship (one-hop relationship), or have both a direct relationship and an indirect relationship (multi-hop relationship). Multi-hop in this application includes two or more than two hops.
- the neighbor vertices of the training vertices in this application may also be training vertices.
- the graph neural network included in the training execution device in this application refers to a graph neural network model (GNN Model).
- when dividing the first relational graph, the central device not only keeps the number of vertices in each second relational graph as equal as possible, but also tries to divide the neighbor vertices of each training vertex into the same second relational graph.
- in this way, computation is balanced across the training execution devices, and a training execution device rarely needs to read the sample data of relevant neighbor vertices from other training execution devices during graph neural network training, which reduces the network overhead across training execution devices and improves the training efficiency of the graph neural network.
- the above step of determining N different second relational graphs according to the first relational graph includes: according to the evaluation score of each of the N partitions corresponding to the target vertex, dividing the target vertex and the multiple neighbor vertices of the target vertex into the partition with the highest evaluation score for the target vertex, where the target vertex is a training vertex in the first relational graph, and the evaluation score indicates the degree of correlation between the target vertex and the vertices already allocated in each partition before the target vertex is assigned.
- the neighbor vertex set includes the multiple neighbor vertices of the target vertex.
- the target vertex may be any training vertex in the first relational graph.
- a set composed of multiple neighbor vertices can be called a neighbor vertex set, that is to say, the neighbor vertex set includes all the neighbor vertices of the target vertex.
- the training vertices in the first relational graph may be divided one by one, in a polling manner, into the partitions.
- a partition is configured for each training execution device, and these partitions may be located in the storage space of the central device or the storage space corresponding to the central device.
- the training vertices are divided into each partition one by one, and which partition each training vertex is divided into needs to be determined according to the evaluation score.
- the vertices in each partition can be formed into the second relational graph.
- the neighbor vertices included in the neighbor vertex set can be defined in two ways.
- if the neighbor relationship is a multi-hop relationship, then besides the vertices directly connected to the training vertex by an edge, the vertices reached from the target vertex by "two hops" or "three hops" can also be attributed to the multiple neighbor vertices of the target vertex. Typically, the number of hops between a neighbor vertex and the target vertex is less than a threshold, such as five hops. In some implementations of the present application, hop count information can be specified, so that the vertices whose hop count from the target vertex is less than or equal to the specified value are used as the neighbor vertices of the target vertex.
- the correlation degree represents the proportion of the vertices already allocated in each partition that are neighbor vertices of the target vertex.
- the evaluation score is a numerical index: a specific value reflects the closeness between the target vertex and the allocated vertices in the partition, that is, the proportion of the target vertex's neighbor vertices contained in the partition.
- the higher the evaluation score, the higher the proportion of the target vertex's neighbor vertices among the vertices already allocated in the partition, and the more suitable the target vertex is for that partition. Vertices in the neighbor vertex set that have already been allocated to the partition with the highest score do not need to be allocated again.
- allocating vertices with high correlation to the same partition, and hence to the same second relational graph, can effectively avoid frequently scheduling the data of associated vertices across training execution devices during training, thereby reducing network overhead.
- the method further includes: acquiring the multiple neighbor vertices of the target vertex according to hop count information, where the hop count information indicates the maximum number of edges on the path from the target vertex to each of the multiple neighbor vertices.
- the hop count information refers to the one-hop or multi-hop relationship described above. If the hop count information is 1, the neighbor vertex set includes the vertices that have a direct association relationship with the target vertex. If the hop count information is 2, the neighbor vertex set additionally includes the vertices connected to those directly associated vertices. Similarly, if the hop count information is 3, the neighbor vertex set can also include vertices associated with the target vertex through three hops, and other hop count values can be deduced in the same way.
- the hop count information indicates the hop count of the farthest vertex, that is, the maximum number of edges on a single path from the target vertex to any neighbor vertex. It can be seen from this possible implementation that vertex allocation can be controlled through the hop count information, which helps meet the targeting requirements of the graph neural network, such as finding a user's closest friends or the products a user is most interested in; a sketch of one way to gather such a neighbor set follows.
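- As an illustration (not a procedure mandated by this application), the neighbor vertex set controlled by hop count information can be gathered with a bounded breadth-first search; the sketch below assumes the relational graph is given as a plain adjacency-list dictionary.

```python
from collections import deque

def neighbors_within_hops(adj: dict, target: int, max_hops: int) -> set:
    """Return every vertex reachable from `target` within `max_hops` edges (the target itself excluded)."""
    visited = {target}
    result = set()
    frontier = deque([(target, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nbr in adj.get(vertex, []):
            if nbr not in visited:
                visited.add(nbr)
                result.add(nbr)
                frontier.append((nbr, hops + 1))
    return result

# hop count information = 1: only directly connected vertices;
# hop count information = 2: additionally the vertices connected to those vertices; and so on.
```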
- the evaluation score of the target vertex for the first partition is positively correlated with the coincidence number of the first partition, where the coincidence number of the first partition indicates the number of vertices that the multiple neighbor vertices have in common with the vertices already allocated in the first partition, and the first partition is any one of the N partitions.
- a coincident vertex in this application refers to a vertex among the multiple neighbor vertices that is the same as a vertex already allocated in the first partition.
- some or all of the multiple neighbor vertices of the target vertex may already have been allocated to the partition.
- the set of neighbor vertices will have overlapping vertices with the allocated vertices in the partition.
- the relevance of the target vertex to the partition can be determined by the number of coincident vertices. The larger the number of coincidences, the higher the correlation between the target vertex and the partition.
- determining the evaluation score based on the number of coincidences between the neighbor vertices and the allocated vertices can effectively place highly correlated vertices in the same partition, thereby effectively avoiding frequent scheduling of associated vertices' data across training execution devices during training.
- the evaluation score of the first partition is the product of the coincidence number of the first partition and the balance ratio of the first partition, where the balance ratio indicates the probability that the target vertex is divided into the first partition
- the balance ratio is the ratio of the first difference to the number of vertices in the first partition after the multiple neighbor vertices are added
- the first difference is the difference between the pre-configured upper limit on the number of vertices in the first partition and the number of vertices already allocated in the first partition.
- in this possible implementation, the balance ratio is the ratio of the first difference to the number of vertices in the first partition after the neighbor vertex set is added, where the first difference is the difference between the pre-configured upper limit on the number of vertices in the first partition and the number of vertices already allocated in the first partition. The evaluation score of the first partition for the target vertex is then determined by the product of the coincidence number between the multiple neighbor vertices and the allocated vertices in the first partition, and the balance ratio of the first partition.
- the more vertices already allocated in a partition, the smaller this ratio becomes, which indicates from the perspective of storage balance that the partition is not suitable for taking many additional vertices. Introducing the balance ratio on top of the coincidence number therefore considers both computation balance and storage balance.
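- Written out with hypothetical symbols (the application gives only the verbal description above), the evaluation score of target vertex v_t for partition i can be expressed as:

```latex
% Hypothetical notation (not from the application): N(v_t) is the neighbor vertex set of the
% target vertex v_t, A_i is the set of vertices already allocated to partition i, and
% C_i is the pre-configured upper limit on the number of vertices in partition i.
\mathrm{score}(v_t, i) =
  \underbrace{\left|\, N(v_t) \cap A_i \,\right|}_{\text{coincidence number}}
  \times
  \underbrace{\frac{C_i - \left|A_i\right|}{\left|\, A_i \cup N(v_t) \,\right|}}_{\text{balance ratio}}
```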
- the out-degrees of the vertices in the N second relational graphs satisfy the first preset condition, and the out-degrees represent the number of edges connected to one vertex.
- vertices whose out-degree satisfies the first preset condition are preferentially placed in the second relational graph, and vertices whose out-degree does not satisfy the first preset condition may be discarded.
- the first preset condition may be pre-configured or dynamically generated, and may be a specific value.
- the method further includes: sending, to the training execution device, the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the second preset condition, where the out-degree indicates the number of edges connected to a vertex.
- the vertices with larger out-degrees may be prioritized; that is, the sample data of the vertices that are frequently used during training are sent to the training execution device.
- the sample data of the vertices whose out-degree is small (does not meet the second preset condition), and which therefore will not be used frequently, can be stored on the central device. If the central device is a CPU, these sample data can be stored in the memory corresponding to the CPU.
- the second preset condition may be to sort the vertices in the second relational graph by out-degree and, considering the storage space of the training execution device, preferentially send the sample data of the top-ranked vertices to the training execution device until the available storage space of the training execution device reaches its limit.
- the second preset condition may also be a preset threshold value, and there may be various settings for the second preset condition, which are not specifically limited in this application.
- the second preset condition may be the same as the first preset condition, or may be different from the first preset condition, and usually the second preset condition is higher than the first preset condition.
- using the cache space of the training execution device in an out-degree-first manner can effectively improve the cache hit rate and reduce the time spent loading sample data, as sketched below.
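- A minimal sketch of this out-degree-priority selection, assuming per-vertex sample sizes in bytes and a reported cache budget (the function and parameter names are illustrative):

```python
def select_vertices_to_cache(out_degree: dict, sample_bytes: dict, available_cache_bytes: int) -> list:
    """Choose which vertices' sample data to cache on the training execution device,
    preferring larger out-degree (i.e. sample data that is used more frequently)."""
    cached, used = [], 0
    for vertex in sorted(out_degree, key=out_degree.get, reverse=True):
        size = sample_bytes[vertex]
        if used + size > available_cache_bytes:
            break  # the available storage space of the training execution device is exhausted
        cached.append(vertex)
        used += size
    return cached

# example: vertex 3 has the largest out-degree, so its sample data is cached first
chosen = select_vertices_to_cache({1: 4, 2: 9, 3: 12}, {1: 100, 2: 100, 3: 100}, 250)
# chosen == [3, 2]
```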
- the method further includes: receiving information indicating the available buffer space sent by the training execution device; and determining, according to the information indicating the available buffer space, the vertices whose out-degree satisfies the second preset condition.
- the training execution device may first perform a round of testing, through which the size of the available cache space that can be used to cache sample data is determined; the size of the available cache space is then sent to the central device, and the central device can then determine the vertices whose out-degree satisfies the second preset condition.
- a second aspect of the present application provides a method for training a graph neural network, comprising: receiving information of a second relational graph obtained from a first relational graph, where the first relational graph includes multiple vertices and multiple edges, each edge connects two vertices that have a direct association relationship, the multiple vertices include training vertices used to train the graph neural network, and the second relational graph contains neighbor vertices that have a target association relationship with the training vertices; calling, according to the information of the second relational graph, the sample data corresponding to the vertices in the second relational graph; and training the graph neural network according to the sample data.
- the method of the second aspect can be applied to a training execution device of the graph neural network training system. From the introduction of the graph neural network training system in the first aspect above, the graph neural network is trained on the training execution devices; the central device determines N second relational graphs from the first relational graph, and each training execution device corresponds to one second relational graph. After each training execution device has trained its graph neural network, the target graph neural network used for inference (which can also be described as a target graph neural network model) is obtained.
- the target association relationship of the vertices in the second relational graph received by the training execution device means that the vertices in the second relational graph have a direct association relationship (one-hop relationship), or have both direct and indirect association relationships (multi-hop relationships).
- Multi-hop in this application includes two or more than two hops.
- the second relational graph includes the training vertices and their corresponding neighbor vertices, so that during the training of the graph neural network the training execution device does not need to frequently read the sample data of relevant neighbor vertices from other training execution devices, which reduces the network overhead across training execution devices and improves the training efficiency of the graph neural network.
- the method further includes: receiving the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the second preset condition; and locally caching the sample data corresponding to the vertices whose out-degree satisfies the second preset condition.
- the above step of calling, according to the second relational graph, the sample data corresponding to the vertices in the second relational graph includes: scheduling, from the local cache, the sample data corresponding to the vertices whose out-degree satisfies the second preset condition; and scheduling, from the central device, the sample data corresponding to the vertices whose out-degree does not satisfy the second preset condition.
- the vertices with larger out-degrees may be prioritized; that is, the sample data of the vertices that are frequently used during training are sent to the training execution device.
- the sample data of the vertices whose out-degree is small (does not meet the second preset condition), and which therefore will not be used frequently, can be stored on the central device. If the central device is a CPU, these sample data can be stored in the memory corresponding to the CPU.
- the sample data can be stored on that server's hard disk or in memory.
- the sample data corresponding to the vertex with the smaller out-degree is called from the central device.
- the second preset condition may be to sort the vertices in the second relational graph by out-degree and, considering the storage space of the training execution device, preferentially send the sample data of the top-ranked vertices to the training execution device until the available storage space of the training execution device reaches its limit.
- the second preset condition may also be a preset threshold value, and there may be various settings for the second preset condition, which are not specifically limited in this application.
- using the cache space of the training execution device in an out-degree-first manner can effectively improve the cache hit rate and reduce the time spent loading sample data; a sketch of the corresponding lookup on the training execution device follows.
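- On the training execution device side, the scheduling rule above amounts to a cache lookup with a fallback to the central device; the sketch below assumes a local in-memory cache dictionary and a hypothetical fetch_from_central_device helper.

```python
def get_sample_data(vertex_id, local_cache, fetch_from_central_device):
    """Schedule sample data for one vertex: hit the local cache when the vertex's
    out-degree satisfied the second preset condition, otherwise fall back to the central device."""
    if vertex_id in local_cache:
        return local_cache[vertex_id]            # high-out-degree vertex cached locally
    return fetch_from_central_device(vertex_id)  # low-out-degree vertex kept on the central device
```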
- the method further includes: performing a round of testing on the graph neural network to determine the available buffer space for storing sample data; and sending information indicating the available buffer space to the central device, where the information of the available buffer space is used to instruct the central device to send the sample data corresponding to the vertices whose out-degree satisfies the second preset condition.
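- One assumed way to realize this round of testing, if the training execution device is a GPU driven by PyTorch, is to run a single forward/backward pass and then read the remaining free device memory while keeping a safety margin; this is a sketch, not the measurement procedure prescribed by this application.

```python
import torch

def measure_available_cache_bytes(model, sample_batch, reserve_ratio: float = 0.1) -> int:
    """Run one training iteration so that the framework allocates its working memory,
    then report how much GPU memory remains for caching sample data."""
    loss = model(sample_batch).sum()
    loss.backward()                      # one forward/backward pass as the test round
    torch.cuda.synchronize()
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return int(free_bytes - reserve_ratio * total_bytes)  # keep a safety margin
```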
- a third aspect of the present application provides an apparatus for training a graph neural network, where the apparatus has the function of implementing the method of the first aspect or any possible implementation manner of the first aspect.
- This function can be implemented by hardware or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions, such as an acquisition unit, a processing unit and a sending unit.
- a fourth aspect of the present application provides an apparatus for training a graph neural network, the apparatus having the function of implementing the method of the second aspect or any possible implementation manner of the second aspect.
- This function can be implemented by hardware or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions, such as a receiving unit, a first processing unit and a second processing unit.
- a fifth aspect of the present application provides a computer device comprising at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions stored in the memory and executable on the processor; when the computer-executable instructions are executed by the processor, the processor executes the method of the first aspect or any possible implementation manner of the first aspect.
- a sixth aspect of the present application provides a computer device comprising at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions stored in the memory and executable on the processor; when the computer-executable instructions are executed by the processor, the processor executes the method of the second aspect or any possible implementation manner of the second aspect.
- a seventh aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor executes the method of the first aspect or any possible implementation manner of the first aspect.
- an eighth aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor executes the method of the second aspect or any possible implementation manner of the second aspect.
- a ninth aspect of the present application provides a computer program product storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor executes the method of the first aspect or any possible implementation manner of the first aspect.
- a tenth aspect of the present application provides a computer program product storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor executes the method of the second aspect or any possible implementation manner of the second aspect.
- An eleventh aspect of the present application provides a chip system, where the chip system includes at least one processor, and the at least one processor is configured to implement the functions involved in the first aspect or any possible implementation manner of the first aspect.
- the chip system may further include a memory, which is used for storing necessary program instructions and data of the apparatus for training the graph neural network.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- a twelfth aspect of the present application provides a chip system, where the chip system includes at least one processor, and the at least one processor is configured to implement the functions involved in the second aspect or any possible implementation manner of the second aspect.
- the chip system may further include a memory, which is used for storing necessary program instructions and data of the apparatus for training the graph neural network.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- a thirteenth aspect of the present application provides a distributed system, where the distributed system includes a central device and multiple training execution devices.
- the central device is configured to execute the method of the first aspect or any possible implementation manner of the first aspect.
- any one of the multiple training execution devices is configured to execute the method of the second aspect or any possible implementation manner of the second aspect.
- when dividing the first relational graph, the central device not only keeps the number of vertices in each second relational graph as equal as possible, but also tries to divide the neighbor vertices of each training vertex into the same second relational graph. In this way, computation is balanced across the training execution devices, a training execution device rarely needs to read the sample data of relevant neighbor vertices from other training execution devices during graph neural network training, the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.
- FIG. 1 is a schematic diagram of an embodiment of a distributed system provided by an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a server provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of an embodiment of a method for training a graph neural network provided by an embodiment of the present application
- FIG. 4 is a schematic diagram of an example of a first relationship diagram provided by an embodiment of the present application.
- FIG. 5A is a schematic diagram of an example of a second relationship diagram provided by an embodiment of the present application.
- FIG. 5B is another exemplary schematic diagram of a second relationship diagram provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of an example scenario of a graph partition provided by an embodiment of the present application.
- FIG. 7A is an experimental effect comparison diagram provided by an embodiment of the present application.
- FIG. 8 is another experimental effect comparison diagram provided by an embodiment of the present application.
- FIG. 9 is a scene example diagram of graph neural network training provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of an embodiment of an apparatus for training a graph neural network provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of an embodiment of an apparatus for training a graph neural network provided by an embodiment of the present application
- FIG. 14 is a schematic diagram of an embodiment of a computer device provided by an embodiment of the present application.
- the embodiments of the present application provide a method for training a graph neural network, which can improve the training efficiency of the graph neural network.
- Embodiments of the present application also provide corresponding apparatuses, systems, computer-readable storage media, computer program products, and the like. Each of them will be described in detail below.
- relational graphs such as: social network graphs, user-commodity graphs, and knowledge graphs and protein structure diagrams.
- a relational graph usually includes multiple vertices and multiple edges, each edge connects two vertices, and the two vertices connected by the same edge have a direct relationship.
- each relational graph has many vertices (data samples), and each vertex has a different number of adjacent vertices; as a result, some important operations (e.g., convolution) that are easy to compute on images are not suitable for direct use on relational graphs.
- a core assumption of existing deep neural network learning algorithms is that data samples are independent of each other.
- in a relational graph, however, each vertex has edges relating it to other vertices in the graph, and the information of these edges can be used to capture the interdependencies between the vertices representing different entities.
- the entities represented by the vertices can be users and products, so that the products that users like or the users that the products are suitable for can be inferred through the edges.
- the GNN is used to process graph data, that is, to deal with relational graphs.
- GNN is applicable to a wide range of fields, such as: graph-based recommender systems (e-commerce or social relations, etc.).
- the GNN is also applicable to other scenarios, such as traffic management: each sensor on the road is a vertex in the relational graph, the edges are determined by the distances between pairs of vertices relative to a threshold, and each vertex contains a time series as its feature.
- the goal is to predict the average speed of a road in a time interval, and it can also be applied to taxi demand forecasting, which helps intelligent transportation systems use resources efficiently and save energy.
- the GNN can also be applied in chemistry, such as: using GNN to study the graph structure of molecules. In this graph structure, atoms are vertices in the graph, and chemical bonds are edges in the graph.
- Graph classification and graph generation are the main tasks of molecular graphs, and they can be used to learn molecular fingerprints, predict molecular properties, infer protein structures, and synthesize chemical compounds.
- GNNs have also been explored for other applications such as: program verification, program reasoning, social impact prediction, adversarial attack prevention, electronic health record modeling, brain networks, event detection or combinatorial optimization, etc.
- Developers can build an initial GNN according to application requirements, and then train the initial GNN using the relationship graph corresponding to the corresponding requirements, and then obtain a target GNN suitable for the requirements, and then use the target GNN for corresponding reasoning.
- the graph neural network training system for the process of training GNN can be a distributed system or a parallel system, and the distributed system or parallel system can be an architecture as shown in FIG. 1, which includes a central device and a plurality of training execution devices, such as : training execution device 1 to training execution device N, where N is an integer greater than 1.
- Each training execution device is loaded with an initial GNN, and the central device or a corresponding storage device (eg, a disk or memory corresponding to a CPU in a server) stores a first relational graph for training the initial GNN.
- the central device may determine N different second relationship diagrams according to the first relationship diagram, and then send the N different second relationship diagrams to the N training execution devices.
- each of the N training execution devices can train the initial GNN on it using its respective second relationship diagram. After each training execution device has trained its initial GNN, the GNNs trained by the N training execution devices can be fused by one training execution device or by the central device to obtain the target GNN.
- the graph neural network included in the training execution device in this application refers to a graph neural network model (GNN Model).
- the central device can be an independent physical machine, a virtual machine (VM), a container or a central processing unit (CPU), and the training execution device can be an independent physical machine, a VM, a container, a graphics processing unit (GPU), a field-programmable gate array (FPGA) or a special-purpose chip.
- the central device can be the CPU of the server hardware part
- the training execution device can be the GPU of the server hardware part, such as GPU1 to GPU(N).
- the GPU part can also be implemented by an FPGA or a dedicated chip.
- the part implemented by software in the embodiments of the present application may include a graph partition and a graph cache, wherein the graph partition may be implemented by a CPU, the graph cache may be implemented by a GPU, and the GPU part may also be implemented by an FPGA or a dedicated chip.
- the first relational graph can be stored in the memory or disk of the server.
- the CPU obtains the first relational graph from the memory or the disk, determines N second relational graphs according to the first relational graph, and then sends the information of the N second relational graphs to the N GPUs respectively.
- the sample data corresponding to each vertex in the first relational graph can also be stored in memory or on disk, and the sample data corresponding to some of the vertices in the second relational graph can be sent to the GPU according to the storage space of the GPU.
- the graph neural network training system includes multiple servers, and each server includes multiple GPUs, the multiple servers can perform distributed training, and each GPU in each server can perform parallel training.
- the process of training a graph neural network may include the following two parts: first, the central device divides the first relational graph into second relational graphs; second, sample data is cached in the training execution device according to an out-degree-priority strategy. They are introduced separately below.
- the central device divides the first relational graph into second relational graphs.
- an embodiment of the method for training a graph neural network provided by the embodiment of the present application includes:
- the central device acquires a first relational graph for training a graph neural network.
- the first relational graph includes a plurality of vertices and a plurality of edges, wherein each edge is used to connect two vertices, and the two vertices connected by the same edge have a direct relationship, that is, each edge is used to connect the two vertices with a direct relationship.
- Vertices 1 to 18 in FIG. 4 are just examples; an actual relational graph usually has thousands or even hundreds of millions of vertices, and tens of thousands or even hundreds of millions of edges.
- the vertices in the first relational graph of the present application have association relationships: some are direct association relationships (one-hop relationships) through a shared edge, and some are indirect association relationships (multi-hop relationships) that transit through commonly connected vertices; for example, vertex 1 and vertex 5 have an indirect association relationship through vertex 3 or vertex 6.
- a one-hop relationship is a direct connection through a single edge, while a multi-hop relationship transits through other vertices and requires at least two edges, as in the small example below.
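- For illustration, the fragment below encodes only the FIG. 4 edges that the text mentions (the edges between vertex 1 and vertex 3, vertex 1 and vertex 6, vertex 3 and vertex 5, and vertex 6 and vertex 5) as an adjacency list and checks the one-hop and two-hop relationships; the rest of FIG. 4 is not reproduced.

```python
# Edges of FIG. 4 explicitly mentioned in the surrounding text (other edges omitted).
adjacency = {1: {3, 6}, 3: {1, 5}, 5: {3, 6}, 6: {1, 5}}

# One-hop (direct) relationship: vertex 1 and vertex 3 are connected by a single edge.
assert 3 in adjacency[1]

# Multi-hop (indirect) relationship: vertex 1 and vertex 5 are not directly connected,
# but vertex 5 is reachable from vertex 1 in two hops via vertex 3 or vertex 6.
assert 5 not in adjacency[1]
assert any(5 in adjacency[middle] for middle in adjacency[1])
```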
- the central device determines N different second relationship diagrams according to the first relationship diagram.
- the second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1; the difference between the numbers of training vertices included in any two second relational graphs is less than a preset threshold, and the second relational graph includes the neighbor vertices of the training vertices.
- Training vertices refer to vertices that participate in the training of graph neural networks.
- the vertices in the second relationship graph have a direct association relationship (one-hop relationship), or have both a direct association relationship and an indirect association relationship (multi-hop relationship).
- Multi-hop in this application includes two or more than two hops.
- the second relational graph is divided from the first relational graph.
- the central device will divide N second relational graphs from the first relational graph according to the number of training execution devices.
- the vertices in different second relational graphs may partially overlap.
- as shown in FIG. 5A, one second relational graph includes 10 vertices, namely vertex 1 to vertex 8, vertex 11 and vertex 12, and the edges between these 10 vertices.
- as shown in FIG. 5B, the other second relational graph includes 10 of the vertices from vertex 7 to vertex 18, and the edges between these 10 vertices.
- the second relational graphs in FIG. 5A and FIG. 5B can include three types of vertices. One type is the training vertices used for training, such as vertex 1 to vertex 3, vertex 5 and vertex 6 in FIG. 5A, and vertex 8, vertex 9, vertex 13, vertex 14 and vertex 16 to vertex 18 in FIG. 5B.
- another type is the verification vertices used for verification, such as vertex 4, vertex 7 and vertex 12 in FIG. 5A.
- the third type is represented by vertices such as vertex 10, vertex 11 and vertex 15 in FIG. 5B.
- it can be seen from FIG. 5A and FIG. 5B that a few vertices are duplicated between the two second relational graphs; these redundant vertices in different second relational graphs can be understood as mirror vertices, which avoid frequent cross-partition accesses.
- the number of training vertices in each second relational graph is basically the same, which can ensure a balanced calculation.
- frequent access across training execution devices can be avoided by redundant vertices, which can improve training efficiency.
- each vertex in the second relationship graph has a direct relationship.
- One-hop refers to a direct connection. For example, if vertex 1 and vertex 3 are directly connected, it is a one-hop relationship.
- Multi-hop refers to an indirect connection; for example, vertex 1 must pass through vertex 3 or vertex 6 to reach vertex 5, which takes two hops. A connection that requires two or more hops is called a multi-hop relationship.
- the out-degree of the vertices in the second relational graph satisfies the first preset condition, where the out-degree represents the number of edges connected to a vertex. Vertices whose out-degree does not satisfy the first preset condition may be discarded.
- the first preset condition may be pre-configured or dynamically generated, and may be a specific value. For example, the first preset condition is that the out-degree is greater than 50. Of course, this is just an example.
- the central device sends the information of the N second relationship graphs to the N training execution devices, and correspondingly, the training execution device receives the information of the second relationship graph.
- the information of the second relational graph may be a summary or metadata of the second relational graph, and the information of the second relational graph includes the identifiers of the vertices in the second relational graph and the relationship between the vertices.
- the N training execution devices are in one-to-one correspondence with the N second relational graphs, and the N second relational graphs are respectively used by the corresponding training execution devices to train their graph neural networks.
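- The information of a second relational graph (a summary or metadata carrying vertex identifiers and their relationships) could, for instance, be serialized as a small structure like the one below; the field names and values are purely hypothetical, and only the edges named elsewhere in the text are listed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SecondGraphInfo:
    """Hypothetical metadata describing one second relational graph."""
    partition_id: int
    vertex_ids: List[int]                 # identifiers of the vertices in the subgraph
    edges: List[Tuple[int, int]]          # relationships between those vertices
    train_vertex_ids: List[int] = field(default_factory=list)

# e.g. a fragment of the FIG. 5A subgraph (only the edges named in the text are listed)
info = SecondGraphInfo(
    partition_id=0,
    vertex_ids=[1, 2, 3, 4, 5, 6],
    edges=[(1, 3), (3, 5), (1, 6), (6, 5)],
    train_vertex_ids=[1, 2, 3, 5, 6],
)
```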
- the training vertices in the first relational graph may be polled one by one to determine the second relational graph.
- the storage space of the central device maps a corresponding partition for each training execution device.
- the training vertices are first divided into the corresponding partitions, and after all training vertices have been divided, the second relational graph is formed from the training vertices in each partition and the relationships between those vertices in the first relational graph.
- the above process of determining the second relational graph according to the first relational graph may include: according to the evaluation score of each of the N partitions corresponding to the target vertex, dividing the target vertex and the multiple neighbor vertices of the target vertex into the partition with the highest evaluation score for the target vertex.
- each of the N partitions corresponds to one training execution apparatus, and after every training vertex in the first relational graph has been allocated, the vertices in each partition are included in the second relational graph of the training execution apparatus corresponding to that partition.
- the process of determining the second relational graph according to the first relational graph can also be described as: determining the multiple neighbor vertices of the target vertex in the first relational graph; determining, according to the multiple neighbor vertices and the vertices already allocated in the N partitions, the evaluation score of each of the N partitions corresponding to the target vertex, where the evaluation score indicates the correlation between the target vertex and the vertices already allocated in each partition before the target vertex is assigned; and dividing, according to the evaluation score of each of the N partitions corresponding to the target vertex, the target vertex and the multiple neighbor vertices into the partition with the highest evaluation score, where each of the N partitions corresponds to one training execution device, and after every training vertex in the first relational graph has been allocated, the vertices in each partition are included in the second relational graph of the training execution device corresponding to that partition.
- the target vertex may be any training vertex in the first relational graph.
- a set composed of multiple neighbor vertices can be called a neighbor vertex set, that is to say, the neighbor vertex set includes all the neighbor vertices of the target vertex.
- a plurality of neighbor vertices of the target vertex may be obtained according to the hop count information, where the hop count information indicates the number of edges in the path from the target vertex to the corresponding neighbor vertex.
- the number of coincidences between the neighbor vertex set and the vertices allocated to the first partition among the N partitions can be determined, where the coincidence number of the first partition indicates the number of vertices that the neighbor vertex set has in common with the vertices already allocated in the first partition, the first partition is any one of the N partitions, and the coincidence number is positively correlated with the degree of correlation; according to the number of coincidences between the neighbor vertex set and the allocated vertices in each partition, the evaluation score of each of the N partitions corresponding to the target vertex is determined.
- the evaluation score of the target vertex for the first partition is positively correlated with the coincidence number of the first partition, which indicates the number of coincidences between the neighbor vertex set and the allocated vertices in the first partition; the first partition is any one of the N partitions.
- a coincident vertex in this application refers to a vertex among the multiple neighbor vertices that is the same as a vertex already allocated in the first partition.
- if the neighbor vertices are defined by a one-hop relationship, only the vertices directly connected to the target vertex by an edge belong to the neighbor vertex set. If the neighbor vertices are defined by a multi-hop relationship, then in addition to the directly connected vertices, the vertices reached from the target vertex by "two hops" or "three hops" can also belong to the multiple neighbor vertices of the target vertex. For example, in a social relationship, for a target user, the friends of the target user can be found through a one-hop relationship, and the friends of those friends can be found through a two-hop relationship. Hop count information is a numerical description of the neighbor relationship.
- if the hop count information is 1, the neighbor vertex set includes the vertices that have a direct association relationship with the target vertex. If the hop count information is 2, the neighbor vertex set additionally includes the vertices connected to those directly associated vertices. Similarly, if the hop count information is 3, the neighbor vertex set can also include vertices associated with the target vertex through three hops, and other hop count values can be deduced in the same way. When a target vertex has multiple neighbor vertices, the hop count information indicates the hop count of the farthest vertex, that is, the maximum number of edges on a single path from the target vertex to any neighbor vertex.
- the neighbor vertex set that can be determined includes {vertex 1, vertex 2, vertex 4 and vertex 5}.
- the correlation degree represents the proportion of the vertices already allocated in each partition that are neighbor vertices of the target vertex.
- the evaluation score is a numerical indicator: a specific value reflects the closeness of the target vertex to the allocated vertices in the partition, that is, whether the proportion of the target vertex's neighbor vertices contained in the partition is high or low. The higher the evaluation score, the higher the proportion of the target vertex's neighbor vertices among the allocated vertices in the partition, and the more suitable the target vertex is for that partition.
- the neighbor vertex set that can be determined includes {vertex 1, vertex 2, vertex 4, vertex 5, vertex 6, vertex 7 and vertex 12}. In this case, two vertices in the neighbor vertex set coincide with the vertices already allocated in the first partition, while only one vertex coincides with the vertices already allocated in the second partition. It can be seen that the coincidence number of the neighbor vertex set with the first partition is higher than its coincidence number with the second partition, which also means that the correlation degree of the neighbor vertex set with the vertices in the first partition is higher than that with the second partition.
- vertices in the neighbor vertex set that have already been allocated to the partition with the highest score do not need to be allocated again.
- allocating vertices with high correlation to the same partition, and hence to the same second relational graph, can effectively avoid frequently scheduling the data of associated vertices across training execution devices during training, thereby reducing network overhead.
- the above-mentioned process of determining the evaluation score can also consider the balanced distribution of vertices simultaneously.
- a balanced ratio is used.
- the balanced ratio is used to indicate the probability that the target vertex is divided into the first partition.
- the balanced ratio is the first difference and the first partition.
- In this way, the evaluation score of the first partition for the target vertex can be determined as the product of the coincidence number of the multiple neighbor vertices with the vertices already assigned to the first partition and the balance ratio corresponding to the first partition.
- The evaluation score can be expressed by the following formula:

$$Score(V_t, i) = \left| N(V_t) \cap TV_i \right| \times \frac{TV_{avg} - \left| TV_i \right|}{\left| PV_i \right|}$$

- where TV_i denotes the set of vertices already allocated to the i-th partition; N(V_t) denotes the set of neighbor vertices of the target vertex V_t, that is, the multiple neighbor vertices, the target vertex being a training vertex; |N(V_t) ∩ TV_i| is the coincidence number of those neighbor vertices with the vertices already allocated to the i-th partition; the second factor is the balance ratio, in which PV_i is used to control storage balance and denotes the total number of vertices in the i-th partition including the added neighbor vertices; and TV_avg is the desired number of training vertices per partition.
- To achieve computational balance, the present application can set TV_avg = |TV| / N, where N is the number of partitions and |TV| is the total number of training vertices in the first relational graph, so that each partition obtains roughly the same number of training vertices.
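- A minimal sketch of the scoring rule as a function. The helper follows the product form above; the concrete numbers in the usage lines are hypothetical and only illustrate the shape of the computation for the FIG. 6 situation.

```python
def evaluation_score(neighbor_set, assigned_train, partition_vertices, tv_avg):
    """Evaluation score of one partition for a target vertex.

    neighbor_set       -- multi-hop neighbor vertices of the target vertex
    assigned_train     -- training vertices already assigned to this partition (TV_i)
    partition_vertices -- all vertices already in this partition, including added neighbors
    tv_avg             -- desired number of training vertices per partition (|TV| / N)
    """
    coincidence = len(neighbor_set & assigned_train)
    pv_after = len(partition_vertices | neighbor_set)   # |PV_i| after adding these neighbors
    balance_ratio = (tv_avg - len(assigned_train)) / pv_after
    return coincidence * balance_ratio

# Hypothetical numbers: two partitions, tv_avg assumed to be 5.
score_1 = evaluation_score({1, 2, 4, 5, 6, 7, 12}, {1, 2}, {1, 2}, tv_avg=5)  # higher
score_2 = evaluation_score({1, 2, 4, 5, 6, 7, 12}, {7}, {7}, tv_avg=5)        # lower
```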
- If the first relational graph is denoted G and the total number of its training vertices is denoted TV, then, taking the first relational graph G as input and given the value of the hop count L, the value of TV and the number of partitions N, the procedure can start with every partition as an empty set. Following the way a partition is determined for a target vertex described above, the neighbor vertex set is determined first, the evaluation score of each partition is then computed, and the target vertex together with its neighbor vertex set is assigned to the partition with the highest evaluation score, until all training vertices have been assigned to their partitions.
- The vertices in each partition, together with the relationships of those vertices in the first relational graph, then form a second relational graph, that is, the second relational graphs {G_1, G_2, ..., G_N} shown in FIGS. 5A and 5B.
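- Putting the pieces together, a compact sketch of the whole partitioning pass: start from N empty partitions, score every partition for each training vertex, and assign the vertex together with its neighbor set to the best-scoring partition. The neighbor helper is passed in (for example the `l_hop_neighbors` sketch above); this illustrates the procedure described in the text rather than the patent's exact implementation, and the tie-breaking rule is an assumption the text does not specify.

```python
def partition_graph(train_vertices, adj, num_partitions, hops, l_hop_neighbors):
    """Greedily assign training vertices and their L-hop neighbors to N partitions."""
    tv_avg = len(train_vertices) / num_partitions                 # expected training vertices per partition
    assigned_train = [set() for _ in range(num_partitions)]       # TV_i: training vertices per partition
    partition_vertices = [set() for _ in range(num_partitions)]   # PV_i: all vertices per partition

    for v in train_vertices:
        neighbors = l_hop_neighbors(adj, v, hops) | {v}
        best_i, best_score = 0, float("-inf")
        for i in range(num_partitions):
            coincidence = len(neighbors & assigned_train[i])
            pv_after = len(partition_vertices[i] | neighbors)
            score = coincidence * (tv_avg - len(assigned_train[i])) / pv_after
            # Assumed tie-break: prefer the currently smaller partition.
            if score > best_score or (score == best_score and
                                      len(assigned_train[i]) < len(assigned_train[best_i])):
                best_i, best_score = i, score
        assigned_train[best_i].add(v)
        partition_vertices[best_i] |= neighbors

    # Each partition_vertices[i], together with the edges among those vertices in the
    # first relational graph, forms the second relational graph G_i.
    return assigned_train, partition_vertices
```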
- the training execution apparatus invokes the sample data corresponding to the vertices in the second relational graph according to the information of the second relational graph.
- The sample data involves different types of application data depending on the application requirements of the graph neural network: in e-commerce the sample data can be consumer data and commodity data, in social relations it can be information about users who are friends, and in chemistry it can be data about molecules or atoms.
- the sample data may be stored in the memory or hard disk of the central device, or may be cached in the cache of the training execution device.
- the training execution device trains the graph neural network according to the sample data.
- When dividing the first relational graph, the central device not only keeps the number of vertices to be divided into each second relational graph as equal as possible, but also tries to place the neighbor vertices of each training vertex in the same second relational graph. In this way, the computation load is balanced across the training execution devices, the need to frequently read the sample data of relevant neighbor vertices from other training execution devices during graph neural network training is reduced, the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.
- the embodiments of the present application conduct experiments on a single GPU and multiple GPUs respectively.
- FIG. 7A and FIG. 7B show the acceleration effect with a single accelerator. Within one training epoch, compared with the training process of the existing deep graph library (DGL), the solution of the present application (PaGraph) achieves a 1.6-4.8x improvement in training efficiency on datasets 1 to 6 (these six datasets can be, in order, reddit, wiki-talk, livejournal, lj-link, lj-large and enwiki).
- Figure 8 shows the speedup ratio under multiple GPUs.
- Compared with the existing DGL, the proposed solution has higher throughput and a higher training speedup ratio.
- Owing to the caching mechanism it introduces, the present application can achieve a super-linear speedup ratio; for example, on the en-wiki dataset, a speedup of 4.9 times over a single accelerator can be achieved with 4 accelerators.
- FIG. 8 uses one training set as an example for illustration; the overall trend on the other training sets is the same as in FIG. 8, with slightly different numerical values.
- The cache of the training execution device is usually very limited, especially when the training execution device is a GPU, an FPGA or a dedicated chip; in that case, the training execution device usually cannot store the sample data corresponding to every vertex in the second relational graph.
- In this situation, the central device sends, to the training execution device, the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the second preset condition, where the out-degree represents the number of edges connected to a vertex.
- The training execution device receives the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the second preset condition, and locally caches that sample data.
- Optionally, before the sample data is sent, the training execution device may first perform a round of testing to determine the available cache space for storing sample data, and then send information indicating the available cache space to the central device.
- This information is used to instruct the central device to send the sample data corresponding to the vertices whose out-degree satisfies the second preset condition; a sketch of such a probe round follows.
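- A minimal sketch of that probe round. `run_one_training_round` and `peak_memory_bytes` are hypothetical placeholders for a dry-run training step and a device memory query (whatever the accelerator runtime actually exposes), and the safety margin is likewise an assumption; the patent does not name specific APIs.

```python
def probe_available_cache(total_device_memory, run_one_training_round, peak_memory_bytes,
                          safety_margin=0.1):
    """Estimate how many bytes remain for caching sample data after one dry run."""
    run_one_training_round()            # exercise the graph neural network once without caching
    used = peak_memory_bytes()          # peak memory consumed by the dry run
    available = total_device_memory - used
    return int(max(0, available * (1 - safety_margin)))  # keep a margin, report the rest
```

- The returned value is what the training execution device would report to the central device as its available cache space.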
- In this way, when the sample data is invoked, the sample data corresponding to the vertices whose out-degree satisfies the second preset condition can be scheduled from the local cache, and the sample data corresponding to the vertices whose out-degree does not satisfy the second preset condition can be scheduled from the central device.
- Considering that the storage space of the training execution device is limited, when the second relational graph contains many vertices, the sample data of the vertices with a larger out-degree (satisfying the second preset condition), that is, the vertices that will be frequently used during training, can be sent to the training execution device first.
- The sample data of the vertices with a smaller out-degree (not satisfying the second preset condition), that is, the vertices that are not frequently used, can be stored in the central device; the sample data corresponding to such a vertex is called from the central device only when that vertex is actually used.
- The second preset condition may be to sort the vertices of the second relational graph by out-degree and, taking the storage space of the training execution device into account, send the sample data of the top-ranked vertices to the training execution device first, until the available storage space of the training execution device reaches its limit.
- The second preset condition may also be a preset threshold value; the second preset condition can be set in various ways, which is not specifically limited in this application.
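- A sketch of the out-degree-priority selection on the central device side: sort the vertices of a second relational graph by out-degree and keep selecting sample data until the cache space reported by the training execution device is exhausted. The per-vertex sizes and the capacity value in the usage lines are hypothetical placeholders.

```python
def select_cached_vertices(out_degree, sample_size_bytes, cache_capacity_bytes):
    """Pick the vertices whose sample data should be cached on the training execution device.

    out_degree           -- {vertex_id: number of connected edges}
    sample_size_bytes    -- {vertex_id: size of that vertex's sample data in bytes}
    cache_capacity_bytes -- available cache space reported by the training execution device
    """
    cached, used = [], 0
    for v in sorted(out_degree, key=out_degree.get, reverse=True):
        size = sample_size_bytes[v]
        if used + size > cache_capacity_bytes:
            break                       # the reported cache space is full
        cached.append(v)
        used += size
    return cached

# Hypothetical usage: vertex 8 has the highest out-degree, so it is cached first.
chosen = select_cached_vertices(
    out_degree={3: 5, 4: 2, 8: 9, 102: 4, 408: 3},
    sample_size_bytes={v: 100 for v in (3, 4, 8, 102, 408)},
    cache_capacity_bytes=300,
)  # -> [8, 3, 102]
```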
- the second preset condition in this application may be the same as the first preset condition, or may be different from the first preset condition.
- This process can be understood with reference to FIG. 9. For the second relational graph of a training execution device containing vertex 3 to vertex 408, the sample data of the vertices whose out-degree satisfies the preset condition (for example, F-3 and S-3 for vertex 3, and likewise the sample data of vertex 4, vertex 8, vertex 102 and vertex 408) is cached in the memory of the training execution device according to the out-degree-priority principle. When the graph neural network is trained, vertices are usually selected in batches; in FIG. 9, vertex 3, vertex 5, vertex 8, ..., vertex 102 and vertex 421 are selected, among which the sample data corresponding to vertex 5 and vertex 421 is not cached on the training execution device and needs to be obtained from the central device before the GNN is trained.
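- On the training execution device side, invoking the sample data for such a batch then amounts to a cache lookup with a fallback to the central device, as in this sketch; `fetch_from_central_device` is a hypothetical placeholder for whatever transfer primitive the system actually uses.

```python
def load_batch(batch_vertices, local_cache, fetch_from_central_device):
    """Gather the sample data for one mini-batch, preferring the local cache."""
    batch_data, missing = {}, []
    for v in batch_vertices:
        if v in local_cache:             # e.g. vertices 3, 8 and 102 in the FIG. 9 example
            batch_data[v] = local_cache[v]
        else:                            # e.g. vertices 5 and 421, which were not cached
            missing.append(v)
    if missing:
        batch_data.update(fetch_from_central_device(missing))  # one request for all misses
    return batch_data
```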
- On the basis of graph partitioning, the present application adds a caching mechanism for the training execution device and adopts an out-degree-priority caching method, that is, the sample data corresponding to frequently accessed vertices is cached in the GPU memory. This reduces the interaction overhead between the central device and the training execution device caused by loading the sample data of individual vertices, and effectively reduces the time consumed by graph neural network training.
- FIG. 10 shows a set of experimental data, in which PaGraph represents the hit rate of the out-degree-first caching strategy of the present application.
- Optimal represents the hit rate of the theoretical optimal caching strategy determined by analyzing the access behavior afterwards.
- Random represents the hit rate of the random cache strategy.
- AliGraph indicates the hit rate of the caching strategy adopted by AliGraph. It can be seen from FIG. 10 that the cache hit ratio of the caching strategy of the present application is almost as high as that of the theoretically optimal caching strategy, and is significantly better than the random strategy and AliGraph's caching strategy.
- When 40% of the vertices are cached (Cached Data), the hit rate is more than twice that of AliGraph, and the training performance is 1.4 times that of AliGraph.
- the hit rate in this application refers to the probability that a cached vertex is selected for GNN training.
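- The hit rate can be measured as the fraction of vertex accesses during training that are served from the cache, as in this sketch; the toy batch uses only the vertices explicitly named in the FIG. 9 description.

```python
def cache_hit_ratio(batches, cached_vertices):
    """Fraction of vertex accesses during training served from the local cache."""
    hits = total = 0
    for batch in batches:
        for v in batch:
            total += 1
            hits += v in cached_vertices
    return hits / total if total else 0.0

print(cache_hit_ratio([[3, 5, 8, 102, 421]], cached_vertices={3, 4, 8, 102, 408}))  # 0.6
```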
- an embodiment of the apparatus 30 for training a graph neural network provided by the embodiment of the present application includes:
- the obtaining unit 301 is configured to obtain a first relational graph used for training a graph neural network.
- The first relational graph includes a plurality of vertices and a plurality of edges, where each edge is used to connect two vertices, and the plurality of vertices includes training vertices used for training the graph neural network.
- The processing unit 302 is configured to determine N different second relational graphs according to the first relational graph obtained by the obtaining unit 301, where each second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1; the difference between the numbers of training vertices included in any two second relational graphs is less than a preset threshold, and each second relational graph includes the neighbor vertices of its training vertices.
- The sending unit 303 is configured to send the N second relational graphs determined by the processing unit 302 to the N training execution devices, the N training execution devices being in one-to-one correspondence with the N second relational graphs, and the N second relational graphs being respectively used by the corresponding training execution devices to train the graph neural network.
- Optionally, the processing unit 302 is configured to assign the target vertex and the multiple neighbor vertices of the target vertex to the partition with the highest evaluation score for the target vertex, according to the evaluation score of each of the N partitions corresponding to the target vertex, where the target vertex is a training vertex in the first relational graph.
- the evaluation score is used to indicate the correlation between the target vertex and the vertices allocated in each partition before the target vertex is allocated, wherein each partition in the N partitions corresponds to a training execution device , after each training vertex in the first relational graph is allocated, the vertices in each partition are included in the second relational graph of the training execution device of the corresponding partition.
- the processing unit 302 is configured to acquire multiple neighbor vertices of the target vertex according to hop count information, where the hop count information indicates the maximum number of edges in a path from the target vertex to each of the multiple neighbor vertices.
- Optionally, the evaluation score of the target vertex in the first partition is positively correlated with the coincidence number of the first partition, where the coincidence number of the first partition indicates how many of the multiple neighbor vertices coincide with the vertices already assigned to the first partition, and the first partition is any one of the N partitions.
- Optionally, the evaluation score of the first partition is the product of the coincidence number of the first partition and the balance ratio of the first partition.
- The balance ratio is used to indicate the probability that the target vertex is divided into the first partition.
- The balance ratio is the ratio of a first difference to the number of vertices in the first partition after the multiple neighbor vertices are added, where the first difference is the difference between the preconfigured upper limit on the number of vertices in the first partition and the number of vertices already allocated to the first partition.
- the out-degrees of the vertices in the N second relational graphs satisfy the first preset condition, and the out-degrees represent the number of edges connected to one vertex.
- the sending unit 303 is further configured to send, to the training execution device, sample data corresponding to vertices whose out-degree satisfies the second preset condition in the second relationship graph, where the out-degree represents the number of edges connected to a vertex.
- the obtaining unit 301 is further configured to receive the information sent by the training execution apparatus and used to indicate the available buffer space.
- The processing unit 302 is configured to determine, according to the information used to indicate the available cache space, the vertices whose out-degree satisfies the second preset condition.
- the apparatus 30 for training a graph neural network described above can be understood by referring to the corresponding descriptions in the foregoing method embodiments, which will not be repeated here.
- FIG. 13 is a schematic diagram of an embodiment of an apparatus for training a graph neural network according to an embodiment of the present application.
- an embodiment of the apparatus 40 for training a graph neural network provided by the embodiment of the present application includes:
- The receiving unit 401 is configured to receive information of a second relational graph obtained from a first relational graph, where the first relational graph includes a plurality of vertices and a plurality of edges, each edge is used to connect two vertices having a direct association relationship, the plurality of vertices includes training vertices for training the graph neural network, and the second relational graph includes neighbor vertices that have a target association relationship with the training vertices.
- the first processing unit 402 is configured to call the sample data corresponding to the vertices in the second relational graph according to the information of the second relational graph received by the receiving unit 401;
- The second processing unit 403 is configured to train the graph neural network according to the sample data invoked by the first processing unit 402.
- In this embodiment, the vertices in the second relational graph all have the target association relationship, so during graph neural network training there is no need to frequently read the sample data of relevant neighbor vertices from other training execution devices; the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.
- the receiving unit 401 is further configured to receive sample data corresponding to vertices whose out-degree satisfies the second preset condition in the second relationship graph.
- The storage unit 404 is configured to locally cache the sample data corresponding to the vertices whose out-degree satisfies the second preset condition.
- The first processing unit 402 is configured to schedule, from the local cache, the sample data corresponding to the vertices whose out-degree satisfies the second preset condition, and to schedule, from the central device, the sample data corresponding to the vertices whose out-degree does not satisfy the second preset condition.
- the second processing unit 403 is further configured to perform a round of testing on the graph neural network to determine the available buffer space for storing sample data.
- The apparatus 40 may further include a sending unit configured to send the information used to indicate the available cache space to the central device, where the information on the available cache space is used to instruct the central device to send the sample data corresponding to the vertices whose out-degree satisfies the second preset condition.
- the apparatus 40 for training a graph neural network described above can be understood by referring to the corresponding description in the foregoing method embodiment section, and details are not repeated here.
- FIG. 14 is a schematic diagram of a possible logical structure of the computer device 50 provided by the embodiment of the present application.
- The computer device 50 may be a central device, a training execution device, or a distributed system including a central device and training execution devices.
- the computer device 50 includes: a processor 501 , a communication interface 502 , a memory 503 and a bus 504 .
- the processor 501 , the communication interface 502 and the memory 503 are connected to each other through a bus 504 .
- the processor 501 is used to control and manage the actions of the computer device 50.
- For example, the processor 501 is used to execute steps 101, 102, 104 and 105 in the method embodiment of FIG. 3.
- The communication interface 502 is used to support the communication of the computer device 50.
- the memory 503 is used for storing program codes and data of the computer device 50 .
- The processor 501 may include a central processing unit (CPU) and a graphics processing unit (GPU); the processor 501 may also be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and it may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
- the processor 501 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 504 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
- the bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is shown in FIG. 14, but it does not mean that there is only one bus or one type of bus.
- In another embodiment of the present application, a computer-readable storage medium is also provided, where computer-executable instructions are stored in the computer-readable storage medium.
- When the processor of a device executes the computer-executable instructions, the device performs the graph neural network training methods of FIG. 3 to FIG. 11 described above.
- a computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions , the device executes the graph neural network training method in the above-mentioned FIG. 3 to FIG. 11 .
- a chip system is also provided, where the chip system includes a processor, and the processor is configured to implement the above-mentioned methods for training a graph neural network in FIG. 3 to FIG. 11 .
- the chip system may further include a memory, which is used for storing necessary program instructions and data of the device for inter-process communication.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium.
- The technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application.
- The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Abstract
A method for training a graph neural network, applied to a distributed system or a parallel system. The method includes: a central device obtains a first relational graph and determines N different second relational graphs according to the first relational graph, each second relational graph being a subgraph of the first relational graph, where the difference between the numbers of training vertices included in any two second relational graphs is less than a preset threshold, and each second relational graph contains the neighbor vertices of its training vertices; the central device then sends the information of the N second relational graphs to N training execution devices, which perform the training of the graph neural network. In this method, not only is the number of training vertices in each second relational graph roughly the same, but each training vertex and its corresponding neighbor vertices are essentially divided into the same second relational graph, so the computation is balanced across the training execution devices, the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.
Description
本申请要求于2020年9月15日提交中国专利局、申请号为202010970736.7、发明名称为“一种图神经网络训练的方法、装置及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及计算机技术领域,具体涉及一种图神经网络训练的方法、装置及系统。
真实世界的数据常常以图的方式进行组织,其中的实体联系蕴含了很强的因果关系,这些具备因果关系的图可以统称为关系图,如:社交网络图、用户商品关系图、知识图谱和蛋白质结构图等。这些关系图的数据会应用到对需要图神经网络(graph neural networks,GNN)的训练中,这样,训练好的GNN就可以用于推理该类型关系图中的因果关系,例如:某类型的用户倾向性的商品或某类型的商品的适用人群等。GNN是一种运行在图结构化数据上的多层神经网络,其中每一层神经网络,以顶点为中心进行聚合和更新。聚合:通过收集邻居顶点的特征信息,并采用如:累加和平均等的聚合操作获得一个融合邻居顶点信息的聚合信息。更新:将聚合信息通过如全连接层产生新的输出,作为下一层图神经网络的特征信息的输入。
现实世界中,很多场景中的关系图非常大,经常一个关系图由上亿个顶点和超过上千亿条边组成,面向这种大规模关系图的场景,GNN往往使用图采样的方式进行训练。也就是将GNN放在多个训练执行装置上,每个训练执行装置上的GNN相同,但不同训练执行装置上的GNN采用关系图中的不同顶点上的数据进行训练,然后再进行融合,从而实现通过整个关系图完成对GNN的训练过程,得到能够应用的GNN模型,该训练执行装置可以是图形处理器(graphic processing unit,GPU)。
深度图库(deep graph library,DGL)是专门为图神经网络训练定制的开源框架,支持大规模图采样的训练方式。DGL在图采样训练过程中,将整个关系图的全图数据结构和图数据特征信息存放在中央处理单元(central processing unit,CPU)的共享内存(graph store)中。作为训练执行装置的每个图形处理器(graphic processing unit,GPU)都需要通过访问共享内存中的整个关系图的方式来请求用于训练的子图,中央处理单元(central processing unit,CPU)响应每个GPU的请求为每个GPU分配训练子图,从而导致数据从CPU加载到GPU耗时较长,在整个训练过程的开销比较大。
发明内容
本申请实施例提供一种图神经网络训练的方法,可以提高图神经网络的训练效率。本申请实施例还提供了相应的装置、系统、计算机可读存储介质以及计算机程序产品等。
本申请第一方面提供一种图神经网络训练的方法,包括:获取用于图神经网络训练的第一关系图,第一关系图包括多个顶点和多条边,其中,每条边用于连接两个顶点,多个顶点中包括用于训练图神经网络的训练顶点;根据第一关系图确定N个不同的第二关系图,第二关系图为第一关系图的子图,N为训练执行装置的数量,N为大于1的整数;其中,任意两个第二关系图中各自所包括的训练顶点的数量的差值小于预设阈值,且第二关系图中包 含训练顶点的邻居顶点;向N个训练执行装置发送N个第二关系图的信息,N个训练执行装置与N个第二关系图一一对应,N个第二关系图分别用于对应的训练执行装置训练图神经网络。
该第一方面中,该方法可以应用于图神经网络训练系统的中心装置,该图神经网络训练系统可以为分布式系统或并行系统,该图神经网络训练系统还包括多个训练执行装置,该中心装置可以是独立的物理机,虚拟机(virtual machine,VM)、容器或中央处理单元(central processing unit,CPU),该训练执行装置可以是独立的物理机、VM、容器、图形处理器(graphic processing unit,GPU)、现场可编程门阵列(field-programmable gate array,FPGA)或专用芯片。
图神经网络(graph neural networks,GNN)所适用的领域非常广泛,如:基于图的推荐系统(电子商务或社交关系等),还可以适用于交通管理或化学等多种场景中。GNN用于处理图数据,也就是用于处理关系图。关系图用于表示真实世界中各实体之间的联系,如:社交关系中的朋友关系,电子商务中的消费者与商品之间的关系等。第一关系图是用于图神经网络训练的关系图,该第一关系图中的每个顶点对应一个样本数据,每条边表示各样本数据之间的联系,通过一条边直接相连的顶点具有直接关联关系,这种直接关联关系也可以称为“一跳连接”,相对的,通过其他顶点实现连接的两个顶点具有间接关联关系,这种间接关联关系也可以称为“多跳连接”。
第二关系图是从第一关系图中划分出来的,中心装置会根据训练执行装置的数量,从第一关系图中划分出N个第二关系图,每个第二关系图中的顶点可能会有部分重合,但不会完全重合。第二关系图中的顶点具有直接关联关系(一跳关系),或者,既具有直接关联关系,又具有间接关联关系(多跳关系)。本申请中的多跳包括两跳,或两跳以上。
本申请中训练顶点的邻居顶点也可以是训练顶点。
本申请中训练执行装置中所包含的图神经网络指的是图神经网络模型(GNN Model)。
由上述第一方面可知,中心装置在划分第一关系图时,不仅考虑了各第二关系图中应划分的顶点的数量尽量相当,而且,尽量将训练顶点的邻居顶点划分到同一个第二关系图中,这样,既做到了各训练执行装置中的计算均衡,也减少了在图神经网络训练过程中需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
在第一方面的一种可能的实现方式中,上述步骤:根据第一关系图确定N个不同的第二关系图,包括:根据目标顶点对应N个分区中每个分区的评估分数,将目标顶点,以及目标顶点多个邻居顶点划分到目标顶点的评估分数最高的分区中,目标顶点为第一关系图中的一个训练顶点,评估分数用于指示目标顶点与在分配目标顶点之前每个分区中已分配的顶点的相关度,其中,N个分区中每个分区对应一个训练执行装置,在第一关系图中的每个训练顶点都被分配后,每个分区中的顶点被包括在对应分区的训练执行装置的第二关系图内。
该种可能的实现方式中,也可以先确定第一关系图中目标顶点的多个邻居顶点,根据邻居顶点集合和N个分区中已分配的顶点,确定目标顶点对应N个分区中每个分区的评估分数,该邻居顶点集合包括该目标顶点的多个邻居顶点。
本申请中,只对顶点进行分区,不改变用于连接两个顶点的边,也就是说对边不做填 加、删除或改动。
该种可能的实现方式中,目标顶点可以是第一关系图中的任意一个训练顶点。多个邻居顶点所组成的集合可以称为邻居顶点集合,也就是说邻居顶点集合中包括的都是目标顶点的邻居顶点。第一关系图中的训练顶点可以是采用轮询的方式逐个训练顶点进行划分的。在划分第一关系图中的训练顶点之前,会先针为每个训练执行装置各自配置一块分区,这些分区可以位于中心装置的存储空间或中心装置对应的存储空间中。然后,从第一个训练顶点开始,逐个将训练顶点划分到各个分区中,各训练顶点具体划分到哪个分区中就需要根据评估分数来确定。第一关系图中所有训练顶点都划分完毕后,就可以将每个分区中的顶点形成第二关系图。
本申请中,邻居顶点集合所包含的邻居顶点可以分为两种情况。一种为:与训练顶点通过一条边直接相连的顶点称为邻居顶点,也就是对邻居顶点的定义是一跳关系,则只有通过一条边与训练顶点相连的顶点才能归属于邻居顶点集合。另一种为:除了前面的一跳关系外,还包括与训练顶点要通过其他一个或多个顶点中转,通过至少两条边才能连接到训练顶点的顶点也可以称为邻居顶点,也就是对邻居顶点的定义是多跳关系,除了与训练顶点通过一条边直接相连的顶点外,如:从目标顶点开始通过“两跳”或“三跳”所到达的顶点都可以归属于该目标顶点的多个邻居顶点。通常情况下,邻居顶点和目标顶点之间的跳数小于一个阈值,例如小于五跳。而本申请的一些实现中,可以通过指定跳数信息的方式,将距离一个目标顶点的跳数小于和等于某一指定的跳数信息的值的顶点作为该目标顶点的邻居顶点。
本申请中,相关度表示各分区中已分配的顶点为目标顶点的邻居顶点的比重,评估分数为一个数值化的指标,通过一个具体数值来反应该目标顶点与分区中已分配的顶点的紧密程度,也就是该分区中所包含的目标顶点的邻居顶点的比重高低。评估分数越高则表示该分区中已分配的顶点中包含目标顶点的邻居顶点的比重越高,该目标顶点越适合划分到该分区。针对邻居顶点集合中已分配到该评分分数最高的顶点,可以不需要再重复分配。由该种可能的实现方式可知,将相关度高的顶点分配到同一个分区,然后归属于同一个第二关系图,这样可以有效避免在训练过程中频繁发生跨训练执行装置调度相关联顶点的数据的网络开销。
在第一方面的一种可能的实现方式中,该方法还包括:根据跳数信息获取目标顶点的多个邻居顶点,跳数信息指示从目标顶点到多个邻居顶点中每个顶点的路径中边的最大数量。
该种可能的实现方式中,跳数信息指的是前述所描述的一跳关系或多跳关系。若跳数信息是1,则表示邻居顶点集合中包括的是与目标顶点具有直接关联关系的顶点,若跳数信息是2,则表示邻居顶点集合中除了包括与目标顶点具有直接关联关系的顶点外,还包括通过直接关联关系的顶点相连接的顶点,同理,若跳数信息是3,则表示该邻居顶点集合中还可以包括通过三跳与该目标顶点相关联的顶点,其他的跳数信息可以以此类推。一个目标顶点的邻居顶点有多个时,跳数信息指示的是最远的顶点的跳数,也就是从目标顶点到各邻居顶点所形成的单个路径中包含的边的最大数量。由该种可能的实现方式可知,可以通 过跳数信息的控制来分配顶点,有利于提高图神经网络的定向需求,如:查找最好的朋友,查找用户最感兴趣的商品。
在第一方面的一种可能的实现方式中,目标顶点在第一分区的评估分数与第一分区的重合数正相关,第一分区的重合数用于指示多个邻居顶点和第一分区中已分配的顶点重合的数量,第一分区为N个分区中的任意一个。
该种可能的实现方式中,也可以先描述为:确定邻居顶点集合和N个分区的第一分区中已分配的顶点的重合数,根据邻居顶点集合与每个分区中已分配的顶点的重合数,确定目标顶点对应N个分区中每个分区的评估分数。
本申请中的顶点重合指的是多个邻居顶点中存在与第一分区中已分配的顶点相同的顶点。
该种可能的实现方式中,目标顶点的多个邻居顶点中的部分或全部邻居顶点可能都已分配到分区中,这样,该邻居顶点集合就会与分区中已分配的顶点存在重合的顶点,可以通过重合顶点的数量来确定目标顶点与该分区的相关度。重合数越大,说明该目标顶点与该分区的相关度越高。该种通过邻居顶点与已分配的顶点的重合数来确定评估分数的方式,可以有效的将相关度高的顶点划分到一个分区中,进而有效避免在训练过程中频繁发生跨训练执行装置调度相关联顶点的数据的网络开销。
在第一方面的一种可能的实现方式中,第一分区的评估分数为第一分区的重合数与第一分区的均衡比的乘积,均衡比用于指示目标顶点划分到第一分区的概率,均衡比为第一差值与第一分区加入多个邻居顶点后的顶点数量的比值,第一差值为预先配置的第一分区的顶点数量上限值与第一分区中已分配的顶点数量的差值。
该种可能的实现方式中,也可以先描述为:确定第一分区的均衡比,均衡比为第一差值与第一分区加入邻居顶点集合后的顶点数量的比值,第一差值为预先配置的第一分区的顶点数量上限值与第一分区中已分配的顶点数量的差值。然后,根据多个邻居顶点与第一分区中已分配的顶点的重合数,以及对应第一分区的均衡比的乘积,确定目标顶点对应的第一分区的评估分数。
该种可能的实现方式中,均衡比表示的是第一分区中已分配的顶点的数量与预先配置的第一分区的顶点数量上限值的差距,再与每个分区加入邻居顶点集合后的顶点数量的比值,该分区中已分配的顶点数量越多,则该比值越小,从存储均衡的角度考虑,说明该分区中不适合再分入过多的顶点了。因此,在重合数上再引入均衡比,更全面的考虑到了计算均衡和存储均衡。
在第一方面的一种可能的实现方式中,N个第二关系图中的顶点的出度满足第一预设条件,该出度表示一个顶点所连接的边的数量。
该种可能的实现方式中,考虑到有些情况下划分到每个分区中的顶点数量很多,而中心装置的存储空间可能有限,这种情况下会优先将出度高于第一预设条件的顶点放置到第二关系图中,出度小于第一预设条件的顶点可能会被放弃。第一预设条件可以是预先配置的,也可以是动态生成的,可以是一个具体的数值。
在第一方面的一种可能的实现方式中,该方法还包括:向训练执行装置发送第二关系 图中出度满足第二预设条件的顶点对应的样本数据,出度表示一个顶点所连接的边的数量。
该种可能的实现方式中,考虑到训练执行装置的存储空间有限,当第二关系图上的顶点数量较多时,可以优先将出度较大(满足第二预设条件)的顶点,也就是在训练过程中会被频繁使用到的顶点的样本数据发送给训练执行装置。针对出度较小(不满足第二预设条件),也就是不会被频繁使用的顶点的样本数据可以存储在中心装置上,若中心装置是CPU,这些样本数据可以存储在于该CPU对应的磁盘或内存中,如果CPU位于服务器上,则该样本数据就可以存储在该服务器的硬盘或内存中。在使用到该出度较小的顶点时,再从中心装置调用该出度较小的顶点对应的样本数据。第二预设条件可以是根据第二关系图中各顶点的出度进行排序,然后结合训练执行装置的存储空间,优先向训练执行装置发送排序在前的顶点的样本数据,直到训练执行装置的可用存储空间达到上限。该第二预设条件也可以是预先设定的一个门限值,关于该第二预设条件的设定可以有多种,本申请对此不做具体限定。该第二预设条件可以与第一预设条件相同,也可以与第一预设条件不同,通常第二预设条件高于第一预设条件。该种可能的实现方式中,通过出度优先的方式使用训练执行装置的缓存空间,可以有效提升缓存命中率,降低加载样本数据的耗时。
在第一方面的一种可能的实现方式中,该方法还包括:接收训练执行装置发送的用于指示可用缓存空间的信息;根据用于指示可用缓存空间的信息,确定出度满足第二预设条件的顶点。
该种可能的实现方式中,训练执行装置可以先进行一轮测试,通过测试可以确定出能用于缓存样本数据的可用缓存空间的大小,然后将该可用缓存空间的大小发送给中心装置,中心装置就可以确定出出度满足第二预设条件的顶点。
本申请第二方面提供一种图神经网络训练的方法,包括:接收从第一关系图中得到的第二关系图的信息,第一关系图包括多个顶点和多条边,其中,每条边用于连接具有直接关联关系的两个顶点,多个顶点中包括用于训练图神经网络的训练顶点,第二关系图中包含与训练顶点具有目标关联关系的邻居顶点;根据第二关系图的信息,调用第二关系图中的顶点对应的样本数据;根据样本数据训练该图神经网络。
该第二方面的方法可以应用于图神经网络训练系统的训练执行装置,由上述第一方面对图神经网络训练系统的介绍可知,图神经网络在训练执行装置上训练,中心装置可以从第一关系图中确定出N个第二关系图,每个训练执行装置对应一个第二关系图,每个训练执行装置训练图神经网络后,可以得到用于推理的目标图神经网络(也可以描述为目标图神经网络模型)。
训练执行装置所接收到的第二关系图中的顶点的目标关联关系指的是第二关系图中的顶点具有直接关联关系(一跳关系),或者,既具有直接关联关系,又具有间接关联关系(多跳关系)。本申请中的多跳包括两跳,或两跳以上。
由上述第二方面可知,第二关系图包括训练顶点及其对应的邻居顶点,这样在图神经网络训练时不需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
在第二方面的一种可能的实现方式中,该方法还包括:接收第二关系图中出度满足第 二预设条件的顶点对应的样本数据;在本地缓存出度满足第二预设条件的顶点对应的样本数据;上述步骤:根据第二关系图,调用第二关系图中的顶点对应的样本数据,包括:从本地缓存中调度出度满足第二预设条件的顶点对应的样本数据;从中心装置调度出度不满足第二预设条件的顶点对应的样本数据。
该种可能的实现方式中,考虑到训练执行装置的存储空间有限,当第二关系图上的顶点数量较多时,可以优先将出度较大(满足第二预设条件)的顶点,也就是在训练过程中会被频繁使用到的顶点的样本数据发送给训练执行装置。针对出度较小(不满足第二预设条件),也就是不会被频繁使用的顶点的样本数据可以存储在中心装置上,若中心装置是CPU,这些样本数据可以存储在于该CPU对应的磁盘或内存中,如果CPU位于服务器上,则该样本数据就可以存储在该服务器的硬盘或内存中。在使用到该出度较小的顶点时,再从中心装置调用该出度较小的顶点对应的样本数据。第二预设条件可以是根据第二关系图中各顶点的出度进行排序,然后结合训练执行装置的存储空间,优先向训练执行装置发送排序在前的顶点的样本数据,直到训练执行装置的可用存储空间达到上限。该第二预设条件也可以是预先设定的一个门限值,关于该第二预设条件的设定可以有多种,本申请对此不做具体限定。该种可能的实现方式中,通过出度优先的方式使用训练执行装置的缓存空间,可以有效提升缓存命中率,降低加载样本数据的耗时。
在第二方面的一种可能的实现方式中,该方法还包括:对图神经网络进行一轮测试,以确定用于存储样本数据的可用缓存空间;向中心装置发送用于指示可用缓存空间的信息,可用缓存空间的信息用于指示中心装置发送出度满足第二预设条件的顶点对应的样本数据。
本申请第三方面提供一种图神经网络训练的装置,该装置具有实现上述第一方面或第一方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块,例如:获取单元、处理单元和发送单元。
本申请第四方面提供一种图神经网络训练的装置,该装置具有实现上述第二方面或第二方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块,例如:接收单元、第一处理单元和第二处理单元。
本申请第五方面提供一种计算机设备,该计算机设备包括至少一个处理器、存储器、输入/输出(input/output,I/O)接口以及存储在存储器中并可在处理器上运行的计算机执行指令,当计算机执行指令被处理器执行时,处理器执行如上述第一方面或第一方面任意一种可能的实现方式的方法。
本申请第六方面提供一种计算机设备,该计算机设备包括至少一个处理器、存储器、输入/输出(input/output,I/O)接口以及存储在存储器中并可在处理器上运行的计算机执行指令,当计算机执行指令被处理器执行时,处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。
本申请第七方面提供一种存储一个或多个计算机执行指令的计算机可读存储介质,当计算机执行指令被处理器执行时,处理器执行如上述第一方面或第一方面任意一种可能的 实现方式的方法。
本申请第八方面提供一种存储一个或多个计算机执行指令的计算机可读存储介质,当计算机执行指令被处理器执行时,处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。
本申请第九方面提供一种存储一个或多个计算机执行指令的计算机程序产品,当计算机执行指令被处理器执行时,处理器执行如上述第一方面或第一方面任意一种可能的实现方式的方法。
本申请第十方面提供一种存储一个或多个计算机执行指令的计算机程序产品,当计算机执行指令被处理器执行时,处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。
本申请第十一方面提供了一种芯片系统,该芯片系统包括至少一个处理器,至少一个处理器用于实现上述第一方面或第一方面任意一种可能的实现方式中所涉及的功能。在一种可能的设计中,芯片系统还可以包括存储器,存储器,用于保存图神经网络训练的装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本申请第十二方面提供了一种芯片系统,该芯片系统包括至少一个处理器,至少一个处理器用于实现上述第二方面或第二方面任意一种可能的实现方式中所涉及的功能。在一种可能的设计中,芯片系统还可以包括存储器,存储器,用于保存图神经网络训练的装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本申请第十三方面提供一种分布式系统,该分布式系统包括中心装置和多个训练执行装置,其中,中心装置用于执行上述第一方面或第一方面任意一种可能的实现方式的方法,多个训练执行装置中的任一训练执行装置用于执行如上述第二方面或第二方面任意一种可能的实现方式的方法。
本申请实施例中,中心装置在划分第一关系图时,不仅考虑了各第二关系图中应划分的顶点的数量尽量相当,而且,尽量将训练顶点的邻居顶点划分到同一个第二关系图中,这样,既做到了各训练执行装置中的计算均衡,也减少了在图神经网络训练过程中需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
图1是本申请实施例提供的分布式系统的一实施例示意图;
图2是本申请实施例提供的服务器的一结构示意图;
图3是本申请实施例提供的图神经网络训练的方法的一实施例示意图;
图4是本申请实施例提供的第一关系图的一示例示意图;
图5A是本申请实施例提供的第二关系图的一示例示意图;
图5B是本申请实施例提供的第二关系图的另一示例示意图;
图6是本申请实施例提供的图分区的一场景示例示意图;
图7A是本申请实施例提供的一实验效果对比图;
图7B是本申请实施例提供的另一实验效果对比图;
图8是本申请实施例提供的另一实验效果对比图;
图9是本申请实施例提供的一图神经网络训练的一场景示例图;
图10是本申请实施例提供的另一实验效果对比图;
图11是本申请实施例提供的另一实验效果对比图;
图12是本申请实施例提供的图神经网络训练的装置的一实施例示意图;
图13是本申请实施例提供的图神经网络训练的装置的一实施例示意图;
图14是本申请实施例提供的计算机设备的一实施例示意图。
下面结合附图,对本申请的实施例进行描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
本申请实施例提供一种图神经网络训练的方法,可以提高图神经网络的训练效率。本申请实施例还提供了相应的装置、系统、计算机可读存储介质以及计算机程序产品等。以下分别进行详细说明。
随着人工智能(artificial intelligence,AI)的发展,深度神经网络在图像处理、声音识别或语言翻译等各个方面都得到的有效的应用。但真实世界的数据常常以图的方式进行组织,其中的实体联系蕴含了很强的因果关系,这些具备因果关系的图可以统称为关系图,如:社交网络图、用户商品关系图、知识图谱和蛋白质结构图等。关系图通常包括多个顶点和多条边,每条边连接两个顶点,通过同一条边连接的两个顶点具有直接关联关系。
因为关系图是不规则的,每个关系图都有很多个顶点(数据样本),其中,每个顶点都有不同数量的相邻顶点,导致一些重要的操作(如:卷积)在图像上很容易计算,但不适合直接用于关系图。此外,现有深度神经网络的学习算法的一个核心是数据样本之间彼此独立。然而,对于关系图来说,图中的每个顶点都会有边与图中其他顶点相关,这些边的信息可用于捕获表示不同实体的顶点之间的相互依赖关系。如:在电子商务中,顶点所代表的实体可以是用户和商品,这样,就可以通过边推测出用户喜欢的商品,或者商品所适用的用户。
为了适用于关系图,出现了借鉴卷积网络、循环网络和深度自动编码器的思想的图神经网络(graph neural networks,GNN),该GNN用于处理图数据,也就是用于处理关系图。GNN所适用的领域非常广泛,如:基于图的推荐系统(电子商务或社交关系等)。该GNN还适用于其他,如:道路上的每个传感器为关系图中的一个顶点,边由阈值以上成对顶点的距 离表示,每个顶点都包含一个时间序列作为特征。目标是预测一条道路在时间间隔内的平均速度,还可以应用在出租车需求预测,这有助于智能交通系统有效利用资源,节约能源。该GNN还可以应用在化学中,如:用GNN研究分子的图结构。在该图结构中,原子为图中的顶点,化学键为图中的边。图分类和图生成是分子图的主要任务,他们可以用来学习分子指纹、预测分子性质、推断蛋白质结构、合成化合物。
GNN还已被探索可以应用在其他方面,如:程序验证、程序推理、社会影响预测、对抗性攻击预防、电子健康记录建模、脑网络、事件检测或组合优化等。
开发人员可以根据应用需求,构建初始GNN,然后采用相应需求对应的关系图对该初始GNN进行训练,就可以得到适用于该需求的目标GNN,后续就可以使用该目标GNN进行相应推理。
训练GNN的过程的图神经网络训练系统可以是分布式系统或并行系统,该分布式系统或并行系统可以是如图1所示的架构,该架构中包括中心装置和多个训练执行装置,如:训练执行装置1至训练执行装置N,N为大于1的整数。每个训练执行装置上都加载有初始GNN,中心装置或对应的存储设备(如:服务器中CPU对应的磁盘或内存)上存储有用于训练初始GNN的第一关系图。本申请实施例中,中心装置可以根据第一关系图确定N个不同的第二关系图,然后将N个不同的第二关系图发送给N个训练执行装置,这样,N个训练执行装置就可以使用各自的第二关系图训练该训练执行装置上的初始GNN,各个训练执行装置训练好各自的初始GNN后,可以通过一个训练执行装置或者通过中心装置对N个训练执行装置训练的GNN进行融合,从而得到目标GNN。
本申请中训练执行装置中所包含的图神经网络指的是图神经网络模型(GNN Model)。
该中心装置可以是独立的物理机,虚拟机(virtual machine,VM)、容器或中央处理单元(central processing unit,CPU),该训练执行装置可以是独立的物理机、VM、容器、图形处理器(graphic processing unit,GPU)、现场可编程门阵列(field-programmable gate array,FPGA)或专用芯片。
如图2所示,以上述图神经网络训练系统是服务器为例,中心装置可以是服务器硬件部分的CPU,训练执行装置可以是服务器硬件部分的GPU,如GPU1至GPU(N),当然,GPU部分也可以通过FPGA或专用芯片来实现。本申请实施例通过软件实现的部分可以包括图分区和图缓存,其中,图分区可以是CPU来实现,图缓存可以是GPU来实现的,GPU部分也可以通过FPGA或专用芯片来实现。第一关系图可以存储在服务器的内存或磁盘中,在需要做图分区时,CPU从内存或磁盘中获取该第一关系图,然后根据该第一关系图确定N个第二关系图,然后将N个第二关系图的信息分别发送给N个GPU,第一关系图中的每个顶点对应的样本数据还可以存储在内存或磁盘中,也可以根据GPU的存储空间,向GPU发送相应第二关系图中的部分顶点对应的样本数据。
若该图神经网络训练系统包括多个服务器,每个服务器中又包括多个GPU,多个服务器之间可以执行分布式训练,每个服务器中的各GPU可以执行并行训练。
本申请实施例中,图神经网络训练的过程可以包括如下两部分内容:一、中心装置将第一关系图划分为第二关系图;二、按照出度优先的策略在训练执行装置中缓存样本数据。 下面分别进行介绍。
一、中心装置将第一关系图划分为第二关系图。
如图3所示,本申请实施例提供的图神经网络训练的方法的一实施例包括:
101、中心装置获取用于图神经网络训练的第一关系图。
第一关系图包括多个顶点和多条边,其中,每条边用于连接两个顶点,通过同一条边连接的两个顶点且具有直接关联关系,也就是每条边用于连接具有直接关联关系的两个顶点。
第一关系图的结构可以参阅图4进行理解。图4中的顶点1至顶点18只是示例,实际的关系图中顶点的数量通常有成千上万甚至上亿个顶点。边的数量通常也有成千上万甚至上亿条。
本申请第一关系图中的多数顶点都有关联关系,有的是通过同一条边连接的直接关联关系(一跳关系),也有的是通过共同连接的顶点中转的间接关联关系(多跳关系),如:顶点1和顶点5直接就通过顶点3或者顶点6产生了间接关联关系。
一跳关系直接是通过一条边直接相连,多跳关系指的是通过其他顶点中转,至少通过两条边才能关联上。
102、中心装置根据第一关系图确定N个不同的第二关系图。
第二关系图为第一关系图的子图,N为训练执行装置的数量,N为大于1的整数;其中,任意两个第二关系图中各自所包括的训练顶点的数量的差值小于预设阈值,且第二关系图中包含训练顶点的邻居顶点。
训练顶点指的是参与图神经网络训练的顶点。
本申请实施例中,第二关系图中的顶点具有直接关联关系(一跳关系),或者,既具有直接关联关系,又具有间接关联关系(多跳关系)。本申请中的多跳包括两跳,或两跳以上。
第二关系图是从第一关系图中划分出来的,中心装置会根据训练执行装置的数量,从第一关系图中划分出N个第二关系图,每个第二关系图中的顶点可能会有重合部分。
以从第一关系图中划分出两个第二关系图为例,结合上述图4的示例,划分出的两个第二关系图可以参阅图5A和图5B进行理解。
如图5A所示,包括顶点1至顶点8,以及顶点11和顶点12这10个顶点,以及这10个顶点之间的边。如图5B所示,包括顶点7至顶点18这10个顶点,以及这10个顶点之间的边。图5A和图5B中可以包括三种类型的顶点:一种为用于训练的训练顶点,如图5A中的顶点1至顶点3、顶点5和顶点6,如图5B中的顶点8、顶点9、顶点13、顶点14、顶点16至顶点18。一种为用于验证的验证顶点,如图5A中的顶点4、顶点7和顶点12,如图5B中的顶点10、顶点11和顶点15。一种为冗余顶点,如图5A中的顶点8和顶点11,如图5B中的顶点7和顶点12。由该图5A和图5B可以看出,两个第二关系图中存在个别重复的顶点,这些在不同第二关系图中的冗余顶点可以理解为是镜像顶点,可以避免频繁发生跨分区访问。
本申请实施例中,每个第二关系图中训练顶点的数量基本相当,这样可以确保计算均衡,另外,通过冗余顶点的方式可以避免频繁发生跨训练执行装置的访问,可以提高训练效率。
如果在划分第二关系图时,只考虑一跳关系,那么每个第二关系图中的顶点都具有直接关联关系,如果考虑多跳关系,那么第二关系图中的顶点除了直接关联关系,还具有间接关联关系。“一跳”指的是直接连接,如顶点1和顶点3直接连接,则为一跳关系。“多跳”指的是间接连接,如顶点1到顶点5,需要通过顶点3或顶点6连接,需要通过两跳才能从顶点1到顶点5,这种需要通过两跳或更多跳的关系称为多跳关系。
可选地,考虑到有些情况下划分到每个分区中的顶点数量很多,而中心装置的存储空间可能有限,这种情况下会优先将出度高于第一预设条件的顶点放置到第二关系图中,这样,第二关系图中的顶点的出度满足第一预设条件,出度表示一个顶点所连接的边的数量。其他的,出度小于第一预设条件的顶点可能会被放弃。第一预设条件可以是预先配置的,也可以是动态生成的,可以是一个具体的数值,如:第一预设条件为出度大于50,当然这里只是举例。
103、中心装置向N个训练执行装置发送N个第二关系图的信息,对应地,训练执行装置接收第二关系图的信息。
第二关系图的信息可以是第二关系图的摘要或者元数据,该第二关系图的信息中包括第二关系图中顶点的标识,以及各顶点之间的关系。
N个训练执行装置与N个第二关系图一一对应,N个第二关系图分别用于对应的训练执行装置图神经网络训练。
可选地,上述步骤102可以对第一关系图中的训练顶点进行逐个顶点轮询的方式来确定第二关系图,在逐个训练顶点划分的过程中,还没有形成第二关系图,可以在中心装置的存储空间为每个训练执行装置映射一个对应的分区。在训练顶点划分过程中,先将训练顶点划分到相应的分区中,所有训练顶点都划分完毕后,根据各分区中的训练顶点以及各分区中的训练顶点在第一关系图中的关系,形成第二关系图。
上述根据第一关系图确定第二关系图的过程可以包括:根据目标顶点对应N个分区中每个分区的评估分数,将目标顶点,以及目标顶点多个邻居顶点划分到目标顶点的评估分数最高的分区中,目标顶点为第一关系图中的一个训练顶点,评估分数用于指示目标顶点与在分配目标顶点之前每个分区中已分配的顶点的相关度,其中,N个分区中每个分区对应一个训练执行装置,在第一关系图中的每个训练顶点都被分配后,每个分区中的顶点被包括在对应分区的训练执行装置的第二关系图内。
本申请中,只对顶点进行分区,不改变用于连接两个顶点的边,也就是说对边不做填加、删除或改动。
该根据第一关系图确定第二关系图的过程还可以描述为:确定第一关系图中目标顶点的多个邻居顶点;根据多个邻居顶点和N个分区中已分配的顶点,确定目标顶点对应N个分区中每个分区的评估分数,评估分数用于指示目标顶点与在分配目标顶点之前每个分区中已分配的顶点的相关度;根据目标顶点对应N个分区中每个分区的评估分数,将目标顶点,以及多个邻居顶点划分到评估分数最高的分区中,其中,N个分区中每个分区对应一个训练执行装置,在第一关系图中的每个训练顶点都被分配后,每个分区中的顶点被包括在对应分区的训练执行装置的第二关系图内。
本申请实施例中,目标顶点可以是第一关系图中的任意一个训练顶点。多个邻居顶点所组成的集合可以称为邻居顶点集合,也就是说邻居顶点集合中包括的都是目标顶点的邻居顶点。可以是根据跳数信息获取目标顶点的多个邻居顶点,跳数信息指示从目标顶点到对应的邻居顶点的路径中边的数量。然后可以确定邻居顶点集合和N个分区中第一分区已分配的顶点的重合数,第一分区的重合数用于指示邻居顶点集合和第一分区中已分配的顶点重合的数量,第一分区为N个分区中的任意一个,重合数与相关度正相关;根据邻居顶点集合与第一分区中已分配的顶点的重合数,确定目标顶点对应N个分区中每个分区的评估分数。
也就是说:目标顶点的第一分区的评估分数与第一分区的重合数正相关,第一分区的重合数用于指示邻居顶点集合和第一分区中已分配的顶点重合的数量,第一分区为N个分区中的任意一个。
本申请中的顶点重合指的是多个邻居顶点中存在与第一分区中已分配的顶点相同的顶点。
若对邻居顶点的定义是一跳关系,则只有通过一条边相连的具有直接关联关系的顶点才能归属于邻居顶点集合,若对邻居顶点的定义是多跳关系,除了具有直接关联关系的顶点外,如:从目标顶点开始通过“两跳”或“三跳”所到达的顶点都可以归属于该目标顶点的多个邻居顶点。如:在社交关系中,针对目标用户,一跳关系可以查找到该目标用户的朋友,通过两跳关系可以查找到该目标用户的朋友的朋友。跳数信息是对邻居关系的数值化描述。若跳数信息是1,则表示邻居顶点集合中包括的是与目标顶点具有直接关联关系的顶点,若跳数信息是2,则表示邻居顶点集合中除了包括与目标顶点具有直接关联关系的顶点外,还包括通过直接关联关系的顶点相连接的顶点,同理,若跳数信息是3,则表示该邻居顶点集合中还可以包括通过三跳与该目标顶点相关联的顶点,其他的跳数信息可以以此类推。一个目标顶点的邻居顶点有多个时,跳数信息指示的是最远的顶点的跳数,也就是从目标顶点到各邻居顶点所形成的单个路径中包含的边的最大数量。
如图4中所示,若目标顶点是顶点3,跳数信息L=1,则可以确定的邻居顶点集合包括{顶点1、顶点2、顶点4和顶点5},若跳数信息L=2,则可以确定的邻居顶点集合包括{顶点1、顶点2、顶点4、顶点5、顶点6、顶点7和顶点12}。
相关度表示各分区中已分配的顶点为目标顶点的邻居顶点的比重,评估分数为一个数值化的指标,通过一个具体数值来反应该目标顶点与分区中已分配的顶点的紧密程度,也就是该分区中所包含的目标顶点的邻居顶点的比重高低。评估分数越高则表示该分区中已分配的顶点中包含目标顶点的邻居顶点的比重越高,该目标顶点越适合划分到该分区。
如图6所示,若有两个分区,第一分区中已分配了顶点1和顶点2,第二个分区中已分配了顶点7,跳数信息L=2,则可以确定的邻居顶点集合包括{顶点1、顶点2、顶点4、顶点5、顶点6、顶点7和顶点12}。这样,该邻居顶点集合有两个顶点与第一分区中已分配的顶点重合,第二分区中只有一个顶点与该分区中已分配的顶点重合,可见,该邻居顶点集合与该第一分区的重合数高于与第二分区的重合复,也说明,该邻居顶点集合与第一分区中顶点的相关度高于与第二分区的相关度。
评估分数为一个数值化的指标,通过一个具体数值来反应该目标顶点与分区中已分配 的顶点的紧密程度,也就是该分区中所包含的目标顶点的邻居顶点的比重高低。评估分数越高则表示该分区中已分配的顶点中包含目标顶点的邻居顶点的比重越高,该目标顶点越适合划分到该分区。针对邻居顶点集合中已分配到该评分分数最高的顶点,可以不需要再重复分配。由该种可能的实现方式可知,将相关度高的顶点分配到同一个分区,然后归属于同一个第二关系图,这样可以有效避免在训练过程中频繁发生跨训练执行装置调度相关联顶点的数据的网络开销。
上述确定评估分数的过程还可以同步考虑顶点的均衡分配,在确定评估分数的过程中使用均衡比,均衡比用于指示目标顶点划分到第一分区的概率,该均衡比为第一差值与第一分区加入多个邻居顶点后的顶点数量的比值,第一差值为预先配置的第一分区的顶点数量上限值与第一分区中已分配的顶点数量的差值。
这样,就可以根据多个邻居顶点与第一分区中已分配的顶点的重合数,以及对应第一分区的均衡比的乘积,确定目标顶点对应的第一分区的评估分数。
也就是说:第一分区的评估分数为第一分区的重合数与第一分区的均衡比的乘积,均衡比用于指示目标顶点划分到第一分区的概率,均衡比为第一差值与第一分区加入多个邻居顶点后的顶点数量的比值,第一差值为预先配置的第一分区的顶点数量上限值与第一分区中已分配的顶点数量的差值。
上述评估分数可以通过如下公式来表示：

$$Score(V_t, i) = \left| N(V_t) \cap TV_i \right| \times \frac{TV_{avg} - \left| TV_i \right|}{\left| PV_i \right|}$$

其中，TV_i表示第i个分区中已经分配的顶点的集合；N(V_t)表示目标顶点V_t的邻居顶点集合，也就是多个邻居顶点，该目标顶点是训练顶点；|N(V_t)∩TV_i|表示目标顶点的多个邻居顶点与第i个分区中已分配的顶点的重合数；(TV_avg−|TV_i|)/|PV_i|表示均衡比，PV_i用于控制存储均衡，表示为第i个分区已经分配的顶点总数，包括加入的邻居顶点；TV_avg是每个分区对顶点的期望数量。为了达到计算均衡，本申请可以将该TV_avg的值设为|TV|/N，其中N表示分区的数量，|TV|表示第一关系图中训练顶点的总数量，这样就可以确保每个分区都能获得基本相同数量的训练顶点，从而确保各分区的计算均衡。

本申请实施例的上述过程，若第一关系图用G来表示，第一关系图中的顶点总数用TV来表示，以第一关系图G为输入，给定跳数L的取值、TV的取值以及分区N的值，就可以从每个分区都是空集开始，按照上述对目标顶点的确定分区的方式，先确定邻居顶点集合，然后再计算各分区的评估分数，再将该目标顶点和对应的邻居顶点集合划分到评估分数最高的分区中，直到将所有顶点都划分到相应的分区中，最后根据每个分区中的顶点以及各顶点在第一关系图中的关系，形成第二关系图，也就是如图5A和图5B所示的第二关系图{G_1、G_2、…、G_N}。
104、训练执行装置根据第二关系图的信息,调用所述第二关系图中的顶点对应的样本数据。
该样本数据根据图神经网络的应用需求,涉及不同类型的应用数据,如:在电子商务中,该样本数据可以是消费者的数据和商品的数据,在社交关系中,该样本数据可以是具有朋友关系的用户的信息,在化学中,该样本数据可以是分子或原子。
该样本数据可以存储在中心装置的内存或硬盘上,也可以缓存在训练执行装置的缓存中。
105、训练执行装置根据样本数据训练图神经网络。
本申请实施例中,中心装置在划分第一关系图时,不仅考虑了各第二关系图中应划分的顶点的数量尽量相当,而且,尽量将训练顶点的邻居顶点划分到同一个第二关系图中,这样,既做到了各训练执行装置中的计算均衡,也减少了在图神经网络训练过程中需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
本申请实施例在包含单GPU和多GPU上分别进行了实验。
如图7A和图7B所示为单加速器加速效果,在一个训练周期(epoch),本申请实施例提供的方案的训练速度相比于现有的深度图库(deep graph library,DGL)的训练过程,本申请(PaGraph)在数据集1至数据集6(这6个数据集可以依次是reddit、wiki-talk、livejournal、lj-link、lj-large和enwiki)上取得了1.6-4.8倍的训练效率提升。
图8为多GPU下的加速比。相比于现有的DGL,本申请方案有更高的吞吐量和训练加速比。本申请由于引入缓存的机制能够达到超线性加速比,如在数据集(en-wiki)上,在4个加速器的情况下能取得相比于单加速器4.9倍的加速比。图8中是以在一个训练集上为例进行示意的,在其他训练集上的整体趋势与图8相同,具体数值上略有差异。
二、按照出度优先的策略在训练执行装置中缓存样本数据。
训练执行装置的缓存通常都很有限,尤其是训练执行装置是GPU、FPGA或专用芯片的情况下,这时,训练执行装置通常无法存储第二关系图中每个顶点对应的样本数据。这种情况下,中心装置向训练执行装置发送第二关系图中出度满足第二预设条件的顶点对应的样本数据,出度表示一个顶点所连接的边的数量。
训练执行装置接收所述第二关系图中出度满足预设条件的顶点对应的样本数据;在本地缓存所述出度满足预设条件的顶点对应的样本数据。
可选地,在发送样本数据之前,训练执行装置可以先进行一轮测试,以确定用于存储样本数据的可用缓存空间;向中心装置发送用于指示可用缓存空间的信息,该可用缓存空间的信息用于指示中心装置发送出度满足第二预设条件的顶点对应的样本数据。
这样,在上述步骤104调用样本数据时,可以从所述本地缓存中调度所述出度满足第二预设条件的顶点对应的样本数据;从中心装置调度出度不满足所述第二预设条件的所述顶点对应的样本数据。
考虑到训练执行装置的存储空间有限,当第二关系图上的顶点数量较多时,可以优先将出度较大(满足第二预设条件)的顶点,也就是在训练过程中会被频繁使用到的顶点的 样本数据发送给训练顶点。针对出度较小(不满足第二预设条件),也就是不会被频繁使用的顶点的样本数据可以存储在中心装置上,在使用到该出度较小的顶点时,再从中心装置调用该出度较小的顶点对应的样本数据。第二预设条件可以是根据第二关系图中各顶点的出度进行排序,然后结合训练执行装置的存储空间,优先向训练执行装置发送排序在前的顶点的样本数据,直到训练执行装置的可用存储空间达到上限。该第二预设条件也可以是预先设定的一个门限值,关于该第二预设条件的设定可以有多种,本申请对此不做具体限定。本申请中第二预设条件可以与第一预设条件相同,也可以与第一预设条件不同。
该过程可以参阅图9进行理解。如图9所示,若与某个训练执行装置对应的第二训练图上包括的顶点的序号是顶点3至顶点408,按照出度优先的原则,出度满足预设条件的顶点对应的样本数据被缓存到训练执行装置的内存中,如:顶点3对应的F-3,S-3都是该顶点3对应的样本数据。同理,顶点4、顶点8、顶点102和顶点408的样本数据也都缓存到了训练执行装置的内存中。在使用图神经网络训练时,通常会按批(batch)选择顶点。如图9中,选择了顶点3、顶点5、顶点8,…,顶点102和顶点421,其中,顶点5和顶点421对应的样本数据在训练执行装置上都没有缓存,需要到中心装置上去获取,然后再进行GNN的训练。
本申请在图分区的基础上增加了训练执行装置的缓存机制,采用出度优先的缓存方式,即缓存频繁访问的顶点对应的样本数据于GPU内存中。这样,可以减少中心装置与训练执行装置因为加载各顶点的样本数据而带来的交互开销,有效的降低了图神经网络训练的耗时。
为了便于说明,图10展示了一组实验数据,该实验数据中PaGraph表示采用本申请的出度优先的缓存策略的命中率。Optimal表示通过事后分析访问行为来决定的理论上的最优缓存策略的命中率。Random表示随机缓存策略的命中率。AliGraph表示AliGraph采用的缓存策略的命中率。由图10中可见,本申请的缓存策略的命中率(Cache Hit Ration)已经几乎接近理论上的最有缓存策略的命中率。相比于随机策略和AliGraph的缓存策略有了非常明显的提升,在缓存数据(Cached Data)40%的顶点的情况下,命中率是AliGraph的两倍以上;训练性能是AliGragh的1.4倍。
本申请中的命中率指的是缓存的顶点被选中进行GNN训练的概率。
另外,由图11可以看出,在相同的缓存百分比上(Cached Percentage),在以秒(s)为单位的一个迭代周期(Epoch Time)中,本申请的缓存策略方案相比于AliGraph的缓存策略可以有效降低GNN训练时的时间开销。
以上介绍了本申请的分布式系统或并行系统,以及图神经网络训练的方法,下面结合附图介绍本申请实施例中的图神经网络训练的装置。
如图12所示,本申请实施例提供的图神经网络训练的装置30的一实施例包括:
获取单元301,用于获取用于图神经网络训练的第一关系图,第一关系图包括多个顶点和多条边,其中,每条边用于连接两个顶点,多个顶点中包括用于训练图神经网络的训练顶点。
处理单元302,用于根据获取单元301获取的第一关系图确定N个不同的第二关系图,第二关系图为第一关系图的子图,N为训练执行装置的数量,N为大于1的整数;其中,任意两个第二关系图中各自所包括的训练顶点的数量的差值小于预设阈值,且第二关系图中包 含训练顶点的邻居顶点。
发送单元303,用于向N个训练执行装置发送处理单元302确定的N个第二关系图,N个训练执行装置与N个第二关系图一一对应,N个第二关系图分别用于对应的训练执行装置训练图神经网络。
本申请实施例中,在划分第一关系图时,不仅考虑了各第二关系图中应划分的顶点的数量尽量相当,而且,尽量将训练顶点的邻居顶点划分到同一个第二关系图中,这样,既做到了各训练执行装置中的计算均衡,也减少了在图神经网络训练过程中需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
可选地,处理单元302,用于根据目标顶点对应N个分区中每个分区的评估分数,将目标顶点,以及目标顶点多个邻居顶点划分到目标顶点的评估分数最高的分区中,目标顶点为第一关系图中的一个训练顶点,评估分数用于指示目标顶点与在分配目标顶点之前每个分区中已分配的顶点的相关度,其中,N个分区中每个分区对应一个训练执行装置,在第一关系图中的每个训练顶点都被分配后,每个分区中的顶点被包括在对应分区的训练执行装置的第二关系图内。
可选地,处理单元302,用于根据跳数信息获取目标顶点的多个邻居顶点,跳数信息指示从目标顶点到多个邻居顶点中每个顶点的路径中边的最大数量。
可选地,目标顶点在第一分区的评估分数与第一分区的重合数正相关,第一分区的重合数用于指示多个邻居顶点和第一分区中已分配的顶点重合的数量,第一分区为N个分区中的任意一个。
可选地,第一分区的评估分数为第一分区的重合数与第一分区的均衡比的乘积,均衡比用于指示目标顶点划分到第一分区的概率,均衡比为第一差值与第一分区加入多个邻居顶点后的顶点数量的比值,第一差值为预先配置的第一分区的顶点数量上限值与第一分区中已分配的顶点数量的差值。
可选地,N个第二关系图中的顶点的出度满足第一预设条件,出度表示一个顶点所连接的边的数量。
可选地,发送单元303,还用于向训练执行装置发送第二关系图中出度满足第二预设条件的顶点对应的样本数据,出度表示一个顶点所连接的边的数量。
可选地,获取单元301,还用于接收训练执行装置发送的用于指示可用缓存空间的信息。
处理单元302,用于根据用于指示可用缓存空间的信息,确定出度满足第二预设条件的顶点。
以上所描述的图神经网络训练的装置30可以参阅前述方法实施例部分的相应描述进行理解,此处不做重复赘述。
图13为本申请实施例提供的图神经网络训练的装置的一实施例示意图。
如图13所示,本申请实施例提供的图神经网络训练的装置40的一实施例包括:
接收单元401,用于接收从第一关系图中得到的第二关系图的信息,第一关系图包括多个顶点和多条边,其中,每条边用于连接具有直接关联关系的两个顶点,多个顶点中包 括用于训练图神经网络的训练顶点,第二关系图中包含与训练顶点具有目标关联关系的邻居顶点。
第一处理单元402,用于根据接收单元401接收的第二关系图的信息,调用第二关系图中的顶点对应的样本数据;
第二处理单元403,用于根据第一处理单元402调用的样本数据图神经网络训练。
本申请实施例中,第二关系图中的顶点都是具有目标关联关系的顶点,在图神经网络训练时不需要频繁跨训练执行装置到其他训练执行装置上去读取相关邻居顶点的样本数据的过程,减少了跨训练执行装置的网络开销,提高了图神经网络的训练效率。
可选的,接收单元401,还用于接收第二关系图中出度满足第二预设条件的顶点对应的样本数据。
存储单元404,用于在本地缓存出度满足第二预设条件的顶点对应的样本数据。
第一处理单元402,用于从本地缓存中调度出度满足第二预设条件的顶点对应的样本数据;从中心装置调度出度不满足第二预设条件的顶点对应的样本数据。
可选的,第二处理单元403,还用于对图神经网络进行一轮测试,以确定用于存储样本数据的可用缓存空间。
该装置40还可以包括发送单元,该发送单元用于向中心装置发送用于指示可用缓存空间的信息,可用缓存空间的信息用于指示中心装置发送出度满足第二预设条件的顶点对应的样本数据。
以上所描述的图神经网络训练的装置40可以参阅前述方法实施例部分的相应描述进行理解,此处不做重复赘述。
图14所示,为本申请的实施例提供的计算机设备50的一种可能的逻辑结构示意图。该计算机设备50可以是中心装置,也可以是训练执行装置。也可以是包括中心装置和训练执行装置的分布式系统。该计算机设备50包括:处理器501、通信接口502、存储器503以及总线504。处理器501、通信接口502以及存储器503通过总线504相互连接。在本申请的实施例中,处理器501用于对计算机设备50的动作进行控制管理,例如,处理器501用于执行图3方法实施例中的步骤101、102,以及104和105,通信接口502用于支持计算机设备50进行通信。存储器503,用于存储计算机设备50的程序代码和数据。
其中,处理器501中可以包括中央处理器单元(CPU)、图形处理器(GPU),该处理器501还可以是通用处理器,数字信号处理器,专用集成电路,现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器501也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。总线504可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图14中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在本申请的另一实施例中,还提供一种计算机可读存储介质,计算机可读存储介质中 存储有计算机执行指令,当设备的处理器执行该计算机执行指令时,设备执行上述图3至图11的图神经网络训练的方法。
在本申请的另一实施例中,还提供一种计算机程序产品,该计算机程序产品包括计算机执行指令,该计算机执行指令存储在计算机可读存储介质中;当设备的处理器执行该计算机执行指令时,设备执行上述图3至图11的图神经网络训练的方法。
在本申请的另一实施例中,还提供一种芯片系统,该芯片系统包括处理器,该处理器用于实现上述图3至图11的图神经网络训练的方法。在一种可能的设计中,芯片系统还可以包括存储器,存储器,用于保存进程间通信的装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请实施例所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请实施例各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请实施例各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
Claims (14)
- 一种图神经网络训练的方法,其特征在于,包括:获取用于图神经网络训练的第一关系图,所述第一关系图包括多个顶点和多条边,其中,每条边用于连接两个顶点,所述多个顶点中包括用于训练所述图神经网络的训练顶点;根据所述第一关系图确定N个不同的第二关系图,所述第二关系图为所述第一关系图的子图,所述N为训练执行装置的数量,所述N为大于1的整数;其中,任意两个第二关系图中各自所包括的所述训练顶点的数量的差值小于预设阈值,且所述第二关系图中包含所述训练顶点的邻居顶点;向所述N个训练执行装置发送N个第二关系图的信息,所述N个训练执行装置与所述N个第二关系图一一对应,所述N个第二关系图分别用于对应的训练执行装置训练所述图神经网络。
- 根据权利要求1所述的方法,其特征在于,所述根据所述第一关系图确定N个不同的第二关系图,包括:根据目标顶点对应N个分区中每个分区的评估分数,将所述目标顶点,以及所述目标顶点的多个邻居顶点划分到所述目标顶点的评估分数最高的分区中,所述目标顶点为所述第一关系图中的一个训练顶点,所述评估分数用于指示所述目标顶点与在分配所述目标顶点之前所述每个分区中已分配的顶点的相关度,其中,所述N个分区中每个分区对应一个训练执行装置,在所述第一关系图中的每个训练顶点都被分配后,所述每个分区中的顶点被包括在对应该分区的训练执行装置的第二关系图内。
- 根据权利要求2所述的方法,其特征在于,所述方法还包括:根据跳数信息获取所述目标顶点的所述多个邻居顶点,所述跳数信息指示从所述目标顶点到所述多个邻居顶点中每个顶点的路径中边的最大数量。
- 根据权利要求2或3所述的方法,其特征在于,所述目标顶点在第一分区的评估分数与所述第一分区的重合数正相关,所述第一分区的重合数用于指示所述多个邻居顶点和所述第一分区中已分配的顶点重合的数量,所述第一分区为所述N个分区中的任意一个。
- 根据权利要求4所述的方法,其特征在于,所述第一分区的评估分数为所述第一分区的重合数与所述第一分区的均衡比的乘积,所述均衡比用于指示所述目标顶点划分到所述第一分区的概率,所述均衡比为第一差值与所述第一分区加入多个邻居顶点后的顶点数量的比值,所述第一差值为预先配置的所述第一分区的顶点数量上限值与所述第一分区中已分配的顶点数量的差值。
- 根据权利要求2-5任一项所述的方法,其特征在于,所述N个第二关系图中的顶点的出度满足第一预设条件,所述出度表示一个顶点所连接的边的数量。
- 根据权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:向所述训练执行装置发送所述第二关系图中出度满足第二预设条件的顶点对应的样本数据,所述出度表示一个顶点所连接的边的数量。
- 根据权利要求7所述的方法,其特征在于,所述方法还包括:接收所述训练执行装置发送的用于指示可用缓存空间的信息;根据所述用于指示可用缓存空间的信息,确定出度满足所述第二预设条件的顶点。
- 一种图神经网络训练的装置,其特征在于,包括:获取单元,用于获取用于图神经网络训练的第一关系图,所述第一关系图包括多个顶点和多条边,其中,每条边用于连接两个顶点,所述多个顶点中包括用于训练所述图神经网络的训练顶点;处理单元,用于根据所述获取单元获取的第一关系图确定N个不同的第二关系图,所述第二关系图为所述第一关系图的子图,所述N为训练执行装置的数量,所述N为大于1的整数;其中,任意两个第二关系图中各自所包括的所述训练顶点的数量的差值小于预设阈值,且所述第二关系图中包含所述训练顶点的邻居顶点;发送单元,用于向所述N个训练执行装置发送处理单元确定的N个第二关系图,所述N个训练执行装置与所述N个第二关系图一一对应,所述N个第二关系图分别用于对应的训练执行装置训练所述图神经网络。
- 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1-8任一项所述的方法。
- 一种计算设备,其特征在于,包括处理器和存储有计算机程序的计算机可读存储介质;所述处理器与所述计算机可读存储介质耦合,所述计算机程序被所述处理器执行时实现如权利要求1-8任一项所述的方法。
- 一种芯片系统,其特征在于,包括处理器,所述处理器被调用用于执行如权利要求1-8任一项所述的方法。
- 一种图神经网络训练系统,其特征在于,包括:中心装置和多个训练执行装置;所述中心装置用于执行如权利要求1-8任一项所述的方法;所述多个训练执行装置中的每个训练执行装置用于训练图神经网络。
- 一种计算机程序产品,其特征在于,包括计算机程序,所述计算机程序当被一个或多个处理器执行时用于实现如权利要求1-8任一项所述的方法。