CN112767230A  GPU graph neural network optimization method and device  Google Patents
 Publication number
 CN112767230A (Application No. CN202110222831.3A)
 Authority
 CN
 China
 Prior art keywords
 graph
 gpu
 visual range
 data
 calculation
 Prior art date
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Pending
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T1/00—General purpose image data processing
 G06T1/20—Processor architectures; Processor configuration, e.g. pipelining

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F9/00—Arrangements for program control, e.g. control units
 G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
 G06F9/46—Multiprogramming arrangements
 G06F9/54—Interprogram communication
 G06F9/544—Buffers; Shared memory; Pipes

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Neural networks
 G06N3/04—Architecture, e.g. interconnection topology

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Neural networks
 G06N3/08—Learning methods
Abstract
A GPU graph neural network optimization method and a computer-readable medium are provided. The optimization method comprises the following steps: generating, from the GPU graph neural network model definition, a computation graph comprising tensors and operations; obtaining a plurality of equivalent computation graphs from the computation graph; comparing the computation cost of each computation graph and selecting the computation graph with the smallest cost; and generating corresponding GPU code for the selected computation graph. For the selected computation graph, the data visibility range required by the inputs and produced by the outputs of each operation can be obtained through analysis; mismatches between the data visibility ranges of operations with dependency relationships are resolved; and operations with matched data visibility ranges are merged into the same GPU kernel. The graph neural network optimization method can reduce data visibility range mismatches between operators in the graph neural network, so that operators are merged and memory accesses are reduced; at the same time, an equivalent computation graph without redundant computation can be found, reducing redundant computation.
Description
Technical Field
The present invention relates generally to graph neural networks, and more particularly, to a graph neural network optimization method and a graphics processing unit.
Background
In the past few years, Graph Neural Networks (GNNs) have become a dominant approach to machine learning on graph data. They achieve state-of-the-art performance in prediction tasks on various graphs (e.g., node classification, graph classification, and link prediction). GNNs also outperform other representation-learning methods (e.g., DeepWalk, Node2Vec) in providing better node representations for downstream tasks.
As a result, GNNs account for over 90% of the leading entries on graph-related tasks in the Open Graph Benchmark (OGB) and CogDL. The influence of GNNs spans many areas, from biology to medicine, social networking, personalized recommendation, knowledge graph processing, and so on. A GNN is a machine learning method that combines graph-related operations with neural network operations: it encodes relationships between entities into node features through graph operations, transforms node features through neural network operations, and updates training parameters.
Disclosure of Invention
The performance (speed) of GNNs is critical for many of their applications, as researchers design deeper GNN models with better accuracy and begin to study models on larger graph data to better model real-world problems. However, the computational efficiency of existing graph neural network systems limits the application and study of GNNs, leading some researchers to sacrifice model accuracy in exchange for faster performance.
Although GNNs are primarily composed of graph-related operations and neural network operations, simply putting together a graph computation framework and a DNN (Deep Neural Network) framework is not sufficient to efficiently support GNN execution, because doing so cannot efficiently handle the complex interactions between graph-related operations and neural network operations, which are critical to GNN performance.
For example, graph-related operations and neural network operations are interleaved in GNN computation graphs, and the complex data dependencies between them complicate kernel fusion, memory performance optimization, task scheduling, and load balancing.
Some GNN models even perform neural operations on graph structures, and it is difficult for existing frameworks to achieve high efficiency because the dense computational nature of the neural operations does not match the sparse access patterns of the graph operations.
Existing GNN systems include PyTorch Geometric (PyG) (see reference [1]), the Deep Graph Library (DGL) (see reference [2]), and NeuGraph. However, none of these exploits the combination of graph-related operations and neural-network-related operations in GNNs, and therefore their implementations carry significant overhead, such as redundant computation and memory accesses. The inventors of the present invention found that on an Nvidia Tesla V100 GPU these frameworks achieved less than 10% of peak throughput, while on some more complex GNN models about 30% of execution time was spent on redundant memory accesses and computation.
The related references are:
[1] Matthias Fey and Jan Eric Lenssen. "Fast Graph Representation Learning with PyTorch Geometric." CoRR abs/1903.02428 (2019).
[2] Minjie Wang et al. "Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs." CoRR abs/1909.01315 (2019).
The present invention has been made in view of the above circumstances.
According to one aspect of the invention, a GPU graph neural network optimization method is provided, comprising: generating, from the GPU graph neural network model definition, a computation graph comprising tensors and operations; obtaining a plurality of equivalent computation graphs from the computation graph; comparing the computation cost of each computation graph and selecting the computation graph with the smallest cost; and generating corresponding GPU code for the selected computation graph.
Optionally, obtaining a plurality of equivalent computation graphs from the computation graph comprises: obtaining information about the tensors and the types of the operations in the computation graph; and applying the commutative, associative, and distributive laws, according to the types of the operations and the information about the input tensors, to change the computation order and manner of the computation graph.
Optionally, the information about a tensor comprises its shape and position.
Optionally, comparing the computation cost of each computation graph and selecting the computation graph with the smallest cost comprises: for each computation graph, obtaining the number of floating-point operations required, and selecting the computation graph that requires the fewest floating-point operations.
Optionally, the GPU graph neural network optimization method may further comprise, after selecting the computation graph with the smallest cost: analyzing the selected computation graph to obtain, for each operation, the data visibility range required by its inputs and the data visibility range of its output; resolving mismatches between the data visibility ranges of operations with dependency relationships; and merging operations with matched data visibility ranges into the same GPU kernel.
Optionally, resolving mismatches between the data visibility ranges of dependent operations comprises: using a data visibility range adapter to resolve the data visibility range mismatch between interdependent operations, wherein the data visibility range adapter detects the data visibility ranges of the preceding and following operations and derives the minimum data visibility range required; a code segment is then inserted which, when run, shares the data using inter-thread communication or shared memory to change the data visibility range, so that the data visibility range satisfies the requirement for merging the preceding and following operations into one kernel function.
Optionally, if the required minimum data visibility range is sharing within a warp, the data visibility range adapter shares the thread-private data using the GPU warp shuffle primitive; if the required minimum data visibility range is sharing within a thread block, the data visibility range adapter uses the GPU's shared memory, storing thread-private data in shared memory to realize data sharing within the thread block.
According to another aspect of the present invention, there is provided a GPU graph neural network optimization method, comprising: generating, from the GPU graph neural network model definition, a computation graph comprising tensors and operations; analyzing the computation graph to obtain, for each operation, the data visibility range required by its inputs and the data visibility range of its output; resolving mismatches between the data visibility ranges of operations with dependency relationships; merging operations with matched data visibility ranges into the same GPU kernel function; and generating corresponding GPU code for the modified computation graph.
Optionally, resolving mismatches between the data visibility ranges of dependent operations comprises: using a data visibility range adapter to resolve the data visibility range mismatch between interdependent operations, wherein the data visibility range adapter is a code segment that detects the data visibility ranges of the preceding and following operations and derives the minimum data visibility range required; the inserted code segment, when run, shares the data using inter-thread communication or shared memory to change the data visibility range, so that the data visibility range satisfies the requirement for merging the preceding and following operations into one kernel function.
According to another aspect of the present invention, there is provided a graphics processing unit (GPU) comprising a processor and a memory having stored thereon computer-executable code which, when executed by the processor, is operable to perform the aforementioned GPU graph neural network optimization method.
According to another aspect of the present invention, there is provided a computer-readable medium having stored thereon computer-executable code which, when executed by a processor, is operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
With the GPU graph neural network optimization technique according to embodiments of the present invention, one or more of the following advantages may be achieved:
(1) By detecting the data visibility range between operations in the graph neural network and resolving any mismatch in a fine-grained manner through the adapter, multiple operations can be executed in the same GPU kernel function, reducing the number of GPU kernel launches and the amount of global memory accesses.
(2) Meanwhile, for the computation graph of the graph neural network, the invention can recursively apply equivalent linear transformations to generate a large number of equivalent computation graphs, and can automatically find a version of the computation graph that avoids redundant computation and apply it to the training process.
Drawings
FIG. 1 shows a general flow diagram 100 of a GPU graph neural network optimization method according to an embodiment of the invention.
FIG. 2 shows an overall flowchart 200 of a GPU graph neural network optimization method according to another embodiment of the invention.
Fig. 3 is a schematic workflow diagram of a GPU graph neural network optimization method according to an embodiment of the present invention.
Fig. 4 shows an example of a computation graph according to an embodiment of the invention.
FIG. 5 illustrates one example of a data visibility range adapter in accordance with one embodiment of the present invention.
FIG. 6 illustrates an example of redundant computation elimination according to an embodiment of the present invention.
Detailed Description
Various embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 shows a general flow diagram 100 of a GPU graph neural network optimization method according to an embodiment of the invention.
As shown in fig. 1, in step S110, a computational graph including tensors and operations is generated for the GPU graph neural network model definition.
In step S120, a plurality of equivalent computation graphs are obtained for the computation graph.
In step S130, the computation cost of each computation graph is compared, and the computation graph with the smallest cost is selected.
In step S140, for the selected computation graph, a corresponding GPU code is generated.
The GPU graph neural network optimization method according to the embodiment of the present invention described in conjunction with FIG. 1 exploits the combination of graph-related operations and neural-network-related operations in a graph neural network, automatically detecting and reducing redundant computation by transforming the computation graph using its linear properties.
FIG. 2 shows an overall flowchart 200 of a GPU graph neural network optimization method according to another embodiment of the invention.
In step S210, a computational graph including tensors and operations is generated for the GPU graph neural network model definition.
In step S220, for each operation of the computation graph, the data visibility range required by its inputs and the data visibility range of its output are obtained through analysis.
In step S230, mismatches between the data visibility ranges of operations with dependency relationships are resolved.
In step S240, operations with matched data visibility ranges are merged into the same GPU kernel.
In step S250, for the modified computation graph, a corresponding GPU code is generated.
In the embodiment of the invention, by detecting the data visibility range between operations in the graph neural network and resolving any mismatch in a fine-grained manner through the adapter, multiple operations can be executed in the same GPU kernel function, reducing the number of GPU kernel launches and the amount of global memory accesses.
Fig. 3 is a schematic workflow diagram of a GPU graph neural network optimization method according to an embodiment of the present invention, which combines the GPU graph neural network optimization methods shown in fig. 1 and fig. 2.
First, a user inputs code describing the graph neural network computation (1 in FIG. 3). Classical analysis techniques from compiler technology are used to parse the user code and generate a computation graph containing tensors and operations (2 in FIG. 3; see FIG. 4, where each gray circle represents a tensor, each white circle represents an operation, and the connections represent dependencies between operations and tensors). After the computation graph is obtained, a number of equivalent candidate computation graphs are derived through the graph transformation step of "redundant computation elimination" (3 in FIG. 3). Floating-point operation analysis is applied to these candidates to select the computation graph with the fewest floating-point operations, i.e., without computational redundancy (4 in FIG. 3). On the redundancy-free computation graph, data visibility range mismatches between operations are resolved by the data visibility range adapter, completing the merging of kernel functions (5 in FIG. 3: data from different stages is shared through the adapter so that the data visibility ranges match). Finally, through code generation and compilation, an efficient executable program is obtained to perform the graph neural network computation. The two key steps, redundant computation elimination and the data visibility range adapter, are detailed below.
1. Redundant computation elimination
The redundant computation elimination method of this embodiment generates a computation graph of the computational operations, using analysis techniques, from the user-supplied code describing the graph neural network computation.
After generating the computation graph, the method can derive the properties of each tensor and operation, such as shape, position, and operation type. The operation types are: element-wise operations, reduction operations, activation function operations, and other operations. The tensor positions are: edge, node (source node, destination node), and global. According to preset rules, whether an operation satisfies the associative, distributive, and commutative laws can be determined from its operation type and tensor positions. For an operation with the corresponding property, the related transformation is performed (e.g., exchanging the computation order) to generate a new computation graph. On the new computation graph, the search for and application of these transformations continues, generating further equivalent computation graphs.
For all newly generated equivalent computation graphs, the number of floating-point operations required by each operation can be obtained from the tensor shapes and positions and the operation types, giving the total number of floating-point operations required by the whole computation graph; the computation graph requiring the fewest floating-point operations is then selected.
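As a concrete illustration of the two steps above, the following Python sketch enumerates equivalent computation graphs of a toy expression by applying the distributive law and a gather-hoisting rule, estimates the floating-point operations of each variant from tensor shapes, and selects the cheapest. All names (`flops`, `rewrites`, the zero-cost `gather` model) are assumptions for the example, not the patent's actual implementation:

```python
N, E, F = 10_000, 1_000_000, 64   # nodes, edges, feature width (assumed)

# Graph nodes are tuples: ("leaf", name, shape), ("gather", child),
# ("add", a, b), or ("matmul", a, b). "gather" maps per-node rows (N)
# to per-edge rows (E) and is modeled as memory-only (zero FLOPs).
def shape(g):
    op = g[0]
    if op == "leaf":
        return g[2]
    if op == "gather":
        return (E, shape(g[1])[1])
    if op == "add":
        return shape(g[1])
    return (shape(g[1])[0], shape(g[2])[1])       # matmul

def flops(g):
    op = g[0]
    if op == "leaf":
        return 0
    if op == "gather":
        return flops(g[1])
    if op == "add":
        m, n = shape(g)
        return flops(g[1]) + flops(g[2]) + m * n
    m, k = shape(g[1])                             # matmul costs 2*m*k*n
    n = shape(g[2])[1]
    return flops(g[1]) + flops(g[2]) + 2 * m * k * n

def rewrites(g):
    """Yield one-step equivalent graphs (distributive law, gather hoist)."""
    if g[0] == "matmul" and g[1][0] == "add":      # (A+B)W -> AW + BW
        yield ("add", ("matmul", g[1][1], g[2]), ("matmul", g[1][2], g[2]))
    if g[0] == "matmul" and g[1][0] == "gather":   # gather(X)W -> gather(XW)
        yield ("gather", ("matmul", g[1][1], g[2]))
    for i, child in enumerate(g[1:]):              # rewrite inside children
        if isinstance(child, tuple):
            for c in rewrites(child):
                yield g[:i + 1] + (c,) + g[i + 2:]

def equivalents(g, seen=None):
    """Recursively apply rewrites, yielding each distinct equivalent graph."""
    seen = set() if seen is None else seen
    if g in seen:
        return
    seen.add(g)
    yield g
    for r in rewrites(g):
        yield from equivalents(r, seen)

x_src = ("leaf", "src_feat", (N, F))   # node features gathered at edge source
x_dst = ("leaf", "dst_feat", (N, F))   # node features gathered at destination
w = ("leaf", "weight", (F, F))
g0 = ("matmul", ("add", ("gather", x_src), ("gather", x_dst)), w)
best = min(equivalents(g0), key=flops)
print(flops(g0), flops(best))          # per-edge matmul vs per-node matmul
```

Here the cheapest variant is the one that hoists both matrix multiplications from the edges (E rows) to the nodes (N rows), mirroring the FIG. 6 transformation.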
As shown in FIG. 6, in the gray box of the computation graph before the change (left side of FIG. 6), the feature vector of the source node (src.feat) and the feature vector of the destination node (dst.feat) are added to obtain their sum (src.feat + dst.feat), and matrix multiplication is then applied to transform the sum ((src.feat + dst.feat) × weight). After applying redundant computation elimination to the original computation graph, the transformed computation graph (right side of FIG. 6) is obtained: it first transforms the source-node feature vector (src.feat) and the destination-node feature vector (dst.feat) by the weight matrix, and then sums the transformed feature vectors to obtain (src.feat_weight + dst.feat_weight). In the transformed graph, the matrix multiplication moves from the edges to the nodes; since the number of edges in graph data is far larger than the number of nodes, the amount of computation is greatly reduced.
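The equivalence underlying FIG. 6 can be checked numerically. The following pure-Python toy (with made-up feature values) verifies that (src.feat + dst.feat) × weight equals src.feat × weight + dst.feat × weight, and shows arithmetically why moving the matrix multiplication from edges to nodes reduces FLOPs; the toy matrices have equal row counts, so the saving is shown by the FLOP formula rather than by the toy shapes:

```python
# Minimal dense linear algebra, enough to check distributivity.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

src_feat = [[1.0, 2.0], [3.0, 4.0]]    # toy per-edge source features
dst_feat = [[5.0, 6.0], [7.0, 8.0]]    # toy per-edge destination features
weight = [[1.0, 0.0], [0.5, 1.0]]      # toy weight matrix

before = matmul(add(src_feat, dst_feat), weight)            # sum, then matmul
after = add(matmul(src_feat, weight), matmul(dst_feat, weight))
print(before == after)                                      # True: equivalent

# Why the rewrite wins: the matmul costs ~2*F*F FLOPs per row, so applying
# it per edge costs 2*E*F*F versus 2*N*F*F after hoisting to nodes, and
# E >> N in typical graph data.
E, N, F = 1_000_000, 10_000, 64        # assumed sizes for illustration
print((2 * E * F * F) // (2 * N * F * F))                   # 100x fewer FLOPs
```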
2. Data visibility range adapter
On the basis of the final computation graph, for each operation, the required input data visibility range and the output data visibility range are obtained from the operation type and the shapes of the input and output tensors. The data visibility range describes the range of threads within which data is shared; there are four such ranges: thread-private, shared within a warp, shared within a thread block, and globally shared. When the data visibility ranges of two operations do not match, the two operations must be executed in different kernel functions, producing reads and writes of global memory and thus redundant memory access overhead.
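A minimal model of these four visibility ranges, and of mismatch detection between a producer operation and a dependent consumer, might look as follows; the scope names and the per-operation requirements are illustrative assumptions, not the patent's classification:

```python
# Four data-visibility scopes, ordered from narrowest to widest.
SCOPES = ["thread", "warp", "block", "global"]

def wider(a, b):
    """Return the wider of two scopes."""
    return a if SCOPES.index(a) >= SCOPES.index(b) else b

# Assumed scope properties per operation type: an element-wise op consumes
# and produces thread-private data, while a reduction needs its inputs
# visible across the whole thread block.
REQUIRED_INPUT_SCOPE = {"elementwise": "thread", "reduce": "block"}
PRODUCED_SCOPE = {"elementwise": "thread", "reduce": "thread"}

def mismatch(producer, consumer):
    """True if the consumer needs the producer's output in a wider scope."""
    need = REQUIRED_INPUT_SCOPE[consumer]
    have = PRODUCED_SCOPE[producer]
    return SCOPES.index(need) > SCOPES.index(have)

print(mismatch("elementwise", "reduce"))       # thread -> block: mismatch
print(mismatch("elementwise", "elementwise"))  # both thread-private: match
```

Without an adapter, every `True` case above would force the two operations into separate kernels communicating through global memory.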
In the embodiment of the invention, after dependency analysis of the computation graph, if a data visibility range mismatch exists between operations with a dependency relationship, a data visibility range adapter is used to resolve the mismatch between the interdependent operations.
The data visibility range adapter may, for example, be a piece of code that, when executed, detects the data visibility ranges of the preceding and following operations and derives the minimum data visibility range required; it then changes the data visibility range by sharing the data using an appropriate mechanism (inter-thread communication or shared memory) so that the data visibility range satisfies the requirement for merging the preceding and following operations into one kernel function. If the required minimum data visibility range is sharing within a warp, the adapter uses the GPU warp shuffle primitive to share the thread-private data; if it is sharing within a thread block, the adapter uses the GPU's shared memory, storing thread-private data in shared memory to realize data sharing within the thread block.
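The adapter's choice of sharing mechanism described above can be sketched as a simple decision over the minimum required scope; the function name and the string labels are assumptions for illustration (on NVIDIA GPUs the warp case would correspond to `__shfl_sync`-style primitives and the block case to `__shared__` memory):

```python
def pick_sharing_mechanism(min_scope):
    """Map the minimum required data-visibility scope to a sharing mechanism
    that lets the producing and consuming operations stay in one kernel."""
    if min_scope == "thread":
        return "none"            # data already private to a single thread
    if min_scope == "warp":
        return "warp_shuffle"    # exchange registers via warp shuffle
    if min_scope == "block":
        return "shared_memory"   # stage data in on-chip shared memory
    return "global_memory"       # cannot fuse: separate kernels required

print(pick_sharing_mechanism("warp"))
print(pick_sharing_mechanism("block"))
```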
As shown on the left side of FIG. 5, the data visibility required by an operation is at the thread-block level, but the existing systems DGL and PyG ignore this property of the operation's data visibility and, even when data only needs to be shared within a thread block, by default place the operations in separate kernels with global sharing. The data visibility range adapter proposed by this method can determine the minimum visibility range required by the operation and use shared memory to widen the visibility range to sharing within the thread block, promoting the data from thread-private to block-shared. As shown on the right side of FIG. 5, the thread block of the next operation can then continue computing without waiting for other blocks or accessing global memory. The two kernel functions can thus be merged, reducing redundant memory accesses.
FIG. 3 illustrates a graph neural network optimization method for eliminating redundant memory accesses and computational overhead according to an embodiment of the present invention, which can reduce data visibility range mismatches between operators in the graph neural network, so that operators are merged and memory accesses are reduced; at the same time, an equivalent computation graph without redundant computation can be found, reducing redundant computation.
According to another embodiment of the present invention, there is provided a graphics processing unit (GPU) comprising a processor and a memory having stored thereon computer-executable code which, when executed by the processor, is operable to perform the aforementioned GPU graph neural network optimization method.
According to another embodiment of the present invention, there is provided a computer-readable medium having stored thereon computer-executable code which, when executed by a processor, is operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
The invention can be implemented in various forms of software, hardware or a combination thereof, and can be distributed or centralized.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A GPU graph neural network optimization method, comprising the following steps:
generating, from the GPU graph neural network model definition, a computation graph comprising tensors and operations;
obtaining a plurality of equivalent computation graphs from the computation graph;
comparing the computation cost of each computation graph and selecting the computation graph with the smallest cost;
and generating corresponding GPU code for the selected computation graph.
2. The GPU graph neural network optimization method of claim 1, wherein obtaining a plurality of equivalent computation graphs from the computation graph comprises:
obtaining information about the tensors and the types of the operations in the computation graph;
and applying the commutative, associative, and distributive laws, according to the types of the operations and the information about the input tensors, to change the computation order and manner of the computation graph.
3. The GPU graph neural network optimization method of claim 1, wherein comparing the computation cost of each computation graph and selecting the computation graph with the smallest cost comprises:
for each computation graph, obtaining the number of floating-point operations required, and selecting the computation graph that requires the fewest floating-point operations.
4. The GPU graph neural network optimization method of claim 1, further comprising, after selecting the computation graph with the smallest cost:
analyzing the selected computation graph to obtain, for each operation, the data visibility range required by its inputs and the data visibility range of its output;
resolving mismatches between the data visibility ranges of operations with dependency relationships;
and merging operations with matched data visibility ranges into the same GPU kernel.
5. The GPU graph neural network optimization method of claim 4, wherein resolving mismatches between the data visibility ranges of dependent operations comprises:
using a data visibility range adapter to resolve the data visibility range mismatch between interdependent operations, wherein the data visibility range adapter is operable to detect the data visibility ranges of the preceding and following operations and derive the minimum data visibility range required; a code segment is then inserted which, when run, shares the data using inter-thread communication or shared memory to change the data visibility range, so that the data visibility range satisfies the requirement for merging the preceding and following operations into one kernel function.
6. The GPU graph neural network optimization method of claim 5, wherein, if the required minimum data visibility range is sharing within a warp, the data visibility range adapter shares the thread-private data using the GPU warp shuffle primitive; and if the required minimum data visibility range is sharing within a thread block, the data visibility range adapter uses the GPU's shared memory, storing thread-private data in shared memory to realize data sharing within the thread block.
7. A GPU graph neural network optimization method, comprising the following steps:
generating, from the GPU graph neural network model definition, a computation graph comprising tensors and operations;
analyzing the computation graph to obtain, for each operation, the data visibility range required by its inputs and the data visibility range of its output;
resolving mismatches between the data visibility ranges of operations with dependency relationships;
merging operations with matched data visibility ranges into the same GPU kernel function;
and generating corresponding GPU code for the modified computation graph.
8. The GPU graph neural network optimization method of claim 7, wherein resolving mismatches between the data visibility ranges of dependent operations comprises:
using a data visibility range adapter to resolve the data visibility range mismatch between interdependent operations, wherein the data visibility range adapter is a code segment that detects the data visibility ranges of the preceding and following operations and derives the minimum data visibility range required; the inserted code segment, when run, shares the data using inter-thread communication or shared memory to change the data visibility range, so that the data visibility range satisfies the requirement for merging the preceding and following operations into one kernel function.
9. A GPU graph neural network optimization device, comprising a processor and a memory, the memory having computer-executable code stored thereon, the code, when executed by the processor, being operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
10. A computer-readable medium having computer-executable code stored thereon, the code, when executed by a processor, being operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202110222831.3A CN112767230A (en)  20210226  20210226  GPU graph neural network optimization method and device 
Publications (1)
Publication Number  Publication Date 

CN112767230A true CN112767230A (en)  20210507 
Family
ID=75704331
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN202110222831.3A Pending CN112767230A (en)  20210226  20210226  GPU graph neural network optimization method and device 
Country Status (1)
Country  Link 

CN (1)  CN112767230A (en) 
Cited By (4)
Publication number  Priority date  Publication date  Assignee  Title 

CN114237918A (en) *  20220228  20220325  之江实验室  Graph execution method and device for neural network model calculation 
CN115268877A (en) *  20220927  20221101  之江实验室  Intermediate representation method and device for parallel execution of graph computation 
WO2023071149A1 (en) *  20211027  20230504  上海商汤智能科技有限公司  Video memory optimization method and apparatus, device, storage medium and program product 
US11782723B1 (en)  20220927  20231010  Zhejiang Lab  Intermediate representation method and apparatus for parallel execution of graph computation 
Citations (5)
Publication number  Priority date  Publication date  Assignee  Title

US20160284095A1 (en) *  2015-03-27  2016-09-29  Edmond Chalom  Machine learning of real-time image capture parameters
US10074206B1 (en) *  2017-05-23  2018-09-11  Amazon Technologies, Inc.  Network-optimized graphics library for virtualized graphics processing
US20190311214A1 (en) *  2018-04-05  2019-10-10  Imagination Technologies Limited  Matching local image feature descriptors in image analysis
CN111338635A (en) *  2020-02-20  2020-06-26  Tencent Technology (Shenzhen) Co., Ltd.  Graph compiling method, device and equipment for computation graph and storage medium
CN111401538A (en) *  2019-09-24  2020-07-10  Shanghai Cambricon Information Technology Co., Ltd.  Data processing method and device, computer equipment and storage medium
Non-Patent Citations (1)
Title

KEZHAO HUANG et al.: "Understanding and Bridging the Gaps in Current GNN Performance Optimizations", PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming *
Cited By (5)
Publication number  Priority date  Publication date  Assignee  Title

WO2023071149A1 (en) *  2021-10-27  2023-05-04  Shanghai SenseTime Intelligent Technology Co., Ltd.  Video memory optimization method and apparatus, device, storage medium and program product
CN114237918A (en) *  2022-02-28  2022-03-25  Zhejiang Lab  Graph execution method and device for neural network model computation
US11941514B2 (en)  2024-03-26  Zhejiang Lab  Method for execution of computational graph in neural network model and apparatus thereof
CN115268877A (en) *  2022-09-27  2022-11-01  Zhejiang Lab  Intermediate representation method and device for parallel execution of graph computation
US11782723B1 (en)  2022-09-27  2023-10-10  Zhejiang Lab  Intermediate representation method and apparatus for parallel execution of graph computation
Similar Documents
Publication  Publication Date  Title 

US11809993B2 (en)  Systems and methods for determining graph similarity  
CN112767230A (en)  GPU graph neural network optimization method and device  
CN113449857B (en)  Data processing method and data processing equipment  
JP6954049B2 (en)  Methods and equipment to complete the knowledge graph  
US11507846B2 (en)  Representing a neural network utilizing paths within the network to improve a performance of the neural network  
Liu et al.  Pudiannao: A polyvalent machine learning accelerator  
US20200394459A1 (en)  Cell image synthesis using one or more neural networks  
Wang et al.  Clustering aggregation by probability accumulation  
CN110383247A (en)  Method, computerreadable medium and heterogeneous computing system performed by computer  
US20210209270A1 (en)  Distributed tensor network contraction scheme with splitting based on dynamic ordering  
Gerber et al.  Data analysis with the morsesmale complex: The msr package for r  
Ming et al.  COINSTAC: Decentralizing the future of brain imaging analysis  
CN112836787A (en)  Reducing deep neural network training times through efficient hybrid parallelization  
Shao et al.  Deep multicenter learning for face alignment  
Chai et al.  A modelagnostic approach to mitigate gradient interference for multitask learning  
CN115018065A (en)  Artificial neural networks generated from lowdifference sequences  
Acebes et al.  A cartesian grid representation of left atrial appendages for a deep learning estimation of thrombogenic risk predictors  
Zhao et al.  APUNet: Attentionguided upsampling network for sparse and nonuniform point cloud  
Zohrehbandian  Using Zionts–Wallenius method to improve estimate of value efficiency in DEA  
US20220129755A1 (en)  Incorporating a ternary matrix into a neural network  
CN115860061A (en)  Graph neural network optimization method and graph neural network inference system  
US20220343146A1 (en)  Method and system for temporal graph neural network acceleration  
Tran et al.  A distributed data mining framework accelerated with graphics processing units  
CN111723247A (en)  Graphbased hypothetical computation  
Cai et al.  The multitask learning with an application of Pareto improvement 
Legal Events
Date  Code  Title  Description

PB01  Publication
SE01  Entry into force of request for substantive examination
WD01  Invention patent application deemed withdrawn after publication
Application publication date: 2021-05-07