CN112767230A - GPU graph neural network optimization method and device - Google Patents

Info

Publication number
CN112767230A
Authority
CN
China
Prior art keywords
graph
gpu
visual range
data
calculation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110222831.3A
Other languages
Chinese (zh)
Inventor
翟季冬
黄可钊
陈文光
郑纬民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-02-26
Filing date
2021-02-26
Publication date
2021-05-07
Application filed by Tsinghua University
Priority to CN202110222831.3A
Publication of CN112767230A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

A GPU graph neural network optimization method and a computer-readable medium are provided. The optimization method comprises: generating, for a GPU graph neural network model definition, a computation graph comprising tensors and operations; obtaining a plurality of equivalent computation graphs for the computation graph; comparing the computation amounts of the computation graphs and selecting the computation graph with the smallest computation amount; and generating corresponding GPU code for the selected computation graph. The selected computation graph may further be analyzed to obtain the data visibility range each operation requires for its input and the data visibility range of its output; mismatches in data visibility range between operations that have a dependency relationship are resolved; and operations whose data visibility ranges match are merged into the same GPU kernel. The method reduces data visibility range mismatches between operators in the graph neural network, so that operators can be merged to reduce memory access; it can also find an equivalent computation graph free of redundant computation, thereby reducing redundant computation.

Description

GPU graph neural network optimization method and device
Technical Field
The present invention relates generally to graph neural networks, and more particularly, to a graph neural network optimization method and a graphics processing unit.
Background
In the past few years, Graph Neural Networks (GNNs) have become a major method of machine learning on graph data. They achieve state-of-the-art performance in prediction tasks on various graphs (e.g., node classification, graph classification, and link prediction). GNNs also outperform other representation learning methods (e.g., DeepWalk, Node2Vec), providing better node representations for downstream tasks.
As a result, GNNs account for over 90% of the leading results on graph-related tasks in the Open Graph Benchmark (OGB) and CogDL. The influence of GNNs spans many areas, from biology to medicine, social networks, personalized recommendation, knowledge graph processing, and the like. A GNN is a machine learning method that combines graph-related operations with neural network operations: it encodes relationships between entities into node features through graph operations, transforms node features through neural network operations, and updates training parameters.
Disclosure of Invention
The performance (speed) of GNNs is critical for many of their applications. Researchers are finding deeper GNN models with better accuracy and are beginning to study models on larger graph data to better simulate real-world problems. However, the computational efficiency and execution approach of existing graph neural network systems limit the application and study of GNNs, leading some researchers to sacrifice model accuracy in exchange for faster performance.
Although GNNs are primarily composed of graph-related operations and neural network operations, simply putting together a graph computation framework and a DNN (deep neural network) framework is not sufficient to support GNN execution efficiently, because such a combination cannot efficiently handle the complex interactions between graph-related operations and neural network operations, which are critical to GNN performance.
For example, graph-related operations and neural network operations are interleaved in GNN computation graphs, and the complex data dependencies between them complicate kernel fusion, memory performance optimization, task scheduling, and load balancing.
Some GNN models perform neural network operations even within graph structures, and existing frameworks struggle to execute them efficiently because the dense computational nature of the neural network operations does not match the sparse patterns of the graph operations.
Existing GNN systems include PyTorch Geometric (PyG) (see reference [1]), Deep Graph Library (DGL) (see reference [2]), and NeuGraph. However, none of these exploits or addresses the combination of graph-related operations and neural network operations in GNNs, so their implementations carry significant overhead, such as redundant computation and memory access. The inventors of the present invention found that on the Nvidia Tesla V100 GPU these frameworks achieved less than 10% of peak throughput, and that on some more complex GNN models about 30% of the execution time was spent on redundant memory access and computation.
The related documents are:
[1] Matthias Fey and Jan Eric Lenssen. "Fast Graph Representation Learning with PyTorch Geometric." CoRR abs/1903.02428 (2019).
[2] Minjie Wang, et al. "Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs." CoRR abs/1909.01315 (2019).
The present invention has been made in view of the above circumstances.
According to one aspect of the invention, a GPU graph neural network optimization method is provided, which includes: generating, for a GPU graph neural network model definition, a computation graph comprising tensors and operations; obtaining a plurality of equivalent computation graphs for the computation graph; comparing the computation amounts of the computation graphs and selecting the computation graph with the smallest computation amount; and generating corresponding GPU code for the selected computation graph.
Optionally, obtaining a plurality of equivalent computation graphs for the computation graph includes: obtaining information about the tensors and the types of the operations in the computation graph; and applying the commutative, associative, and distributive laws according to the types of the operations and the information about the input tensors, thereby changing the computation order and form of the computation graph.
Optionally, the information of the tensor comprises a shape and a position.
Optionally, comparing the computation amounts of the computation graphs and selecting the computation graph with the smallest computation amount includes: for each computation graph, obtaining the number of floating-point operations required by each operation, and selecting the computation graph that requires the fewest floating-point operations.
Optionally, the GPU graph neural network optimization method may further include, after selecting the computation graph with the smallest computation amount: analyzing the selected computation graph to obtain the data visibility range each operation requires for its input and the data visibility range of its output; resolving mismatches in data visibility range between operations that have a dependency relationship; and merging operations whose data visibility ranges match into the same GPU kernel.
Optionally, resolving the mismatch in data visibility range between dependent operations includes: using a data visibility range adapter to resolve the mismatch between interdependent operations, wherein the data visibility range adapter may be implemented as a code segment that detects the data visibility ranges of the preceding and following operations, obtains the minimum data visibility range the operations require, and then inserts a code segment which, when executed, shares the data through inter-thread communication or shared memory to change the data visibility range, so that the range satisfies the requirement for merging the preceding and following operations into one kernel function.
Optionally, if the required minimum data visibility range is sharing within a warp, the data visibility range adapter shares the thread-private data using the GPU warp shuffle primitives; if the required minimum data visibility range is sharing within a thread block, the data visibility range adapter uses GPU shared memory, storing the thread-private data into shared memory to share it within the thread block.
According to another aspect of the present invention, there is provided a GPU graph neural network optimization method, including: generating, for a GPU graph neural network model definition, a computation graph comprising tensors and operations; analyzing the computation graph to obtain the data visibility range each operation requires for its input and the data visibility range of its output; resolving mismatches in data visibility range between operations that have a dependency relationship; merging operations whose data visibility ranges match into the same GPU kernel function; and generating corresponding GPU code for the modified computation graph.
Optionally, resolving the mismatch in data visibility range between dependent operations includes: using a data visibility range adapter to resolve the mismatch between interdependent operations, wherein the data visibility range adapter is a code segment that detects the data visibility ranges of the preceding and following operations, obtains the minimum data visibility range the operations require, and then inserts a code segment which, when executed, shares the data through inter-thread communication or shared memory to change the data visibility range, so that the range satisfies the requirement for merging the preceding and following operations into one kernel function.
According to another aspect of the present invention, there is provided a graphics processing unit GPU comprising a processor and a memory having stored thereon computer executable code operable, when executed by the processor, to perform the aforementioned GPU graph neural network optimization method.
According to another aspect of the present invention, there is provided a computer readable medium having stored thereon computer executable code, which when executed by a processor is operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
With the GPU graph neural network optimization technique according to embodiments of the present invention, one or more of the following advantages may be achieved:
(1) The data visibility ranges of the operations in the graph neural network are detected, and mismatches in data visibility range are resolved at fine granularity by the adapter, so that multiple operations can run in the same GPU kernel function, reducing both the number of GPU kernel launches and the amount of global memory access.
(2) In addition, for the computation graph of the graph neural network, the invention can recursively apply equivalent linear transformations to generate a large number of equivalent computation graphs, automatically find the version of the computation graph that avoids redundant computation, and apply that version to the training process.
Drawings
FIG. 1 shows a general flow diagram 100 of a GPU graph neural network optimization method according to an embodiment of the invention.
FIG. 2 shows an overall flowchart 200 of a GPU graph neural network optimization method according to another embodiment of the invention.
Fig. 3 is a schematic workflow diagram of a GPU graph neural network optimization method according to an embodiment of the present invention.
Fig. 4 shows an example of a computation graph according to an embodiment of the invention.
FIG. 5 illustrates one example of a data visibility range adapter in accordance with one embodiment of the present invention.
FIG. 6 illustrates an example of redundant computation elimination according to an embodiment of the present invention.
Detailed Description
Various embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 shows a general flow diagram 100 of a GPU graph neural network optimization method according to an embodiment of the invention.
As shown in Fig. 1, in step S110, a computation graph including tensors and operations is generated for the GPU graph neural network model definition.
In step S120, a plurality of equivalent computation graphs are obtained for the computation graph.
In step S130, the computation amounts of the computation graphs are compared, and the computation graph with the smallest computation amount is selected.
In step S140, corresponding GPU code is generated for the selected computation graph.
The GPU graph neural network optimization method according to the embodiment of the present invention described in conjunction with fig. 1 exploits the combination of graph-related operations and neural network-related operations in a graph neural network, automatically detecting and reducing redundant computations by transforming the computational graph with linear properties.
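A minimal sketch of how the computation graph produced in step S110 might be represented, assuming a structure of tensor nodes and operation nodes connected by dependencies as in Fig. 4. The class names `TensorNode`, `OpNode`, and `ComputationGraph`, their fields, and the example tensors are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TensorNode:
    name: str
    shape: Tuple[int, ...]   # e.g. (rows, feature_dim)
    location: str            # "edge", "src", "dst", or "global"


@dataclass
class OpNode:
    name: str
    kind: str                # "elementwise", "reduction", "activation", "other"
    inputs: List[str]        # names of input tensors
    output: str              # name of the output tensor


@dataclass
class ComputationGraph:
    tensors: Dict[str, TensorNode] = field(default_factory=dict)
    ops: List[OpNode] = field(default_factory=list)

    def add_tensor(self, t: TensorNode) -> None:
        self.tensors[t.name] = t

    def add_op(self, op: OpNode) -> None:
        self.ops.append(op)

    def topological_ops(self) -> List[str]:
        """Return op names in an order where every input is produced first."""
        produced_by = {op.output: op for op in self.ops}
        ready = {n for n in self.tensors if n not in produced_by}  # graph inputs
        remaining = list(self.ops)
        order: List[str] = []
        while remaining:
            runnable = [op for op in remaining if all(i in ready for i in op.inputs)]
            if not runnable:
                raise ValueError("cycle or missing tensor in computation graph")
            for op in runnable:
                order.append(op.name)
                ready.add(op.output)
                remaining.remove(op)
        return order


def build_example() -> ComputationGraph:
    # Assumed toy graph: 4 vertices, 6 edges, 8 input features, 2 output features.
    g = ComputationGraph()
    g.add_tensor(TensorNode("src_feat", (4, 8), "src"))
    g.add_tensor(TensorNode("dst_feat", (4, 8), "dst"))
    g.add_tensor(TensorNode("weight", (8, 2), "global"))
    g.add_tensor(TensorNode("sum", (6, 8), "edge"))
    g.add_tensor(TensorNode("out", (6, 2), "edge"))
    # Ops are added out of order on purpose to exercise the topological sort.
    g.add_op(OpNode("matmul", "other", ["sum", "weight"], "out"))
    g.add_op(OpNode("add", "elementwise", ["src_feat", "dst_feat"], "sum"))
    return g
```

Calling `build_example().topological_ops()` returns the operations in dependency order, which is the order a later analysis or code generation pass would walk them in.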
FIG. 2 shows an overall flowchart 200 of a GPU graph neural network optimization method according to another embodiment of the invention.
In step S210, a computation graph including tensors and operations is generated for the GPU graph neural network model definition.
In step S220, the computation graph is analyzed to obtain the data visibility range each operation requires for its input and the data visibility range of its output.
In step S230, mismatches in data visibility range between operations that have a dependency relationship are resolved.
In step S240, operations whose data visibility ranges match are merged into the same GPU kernel.
In step S250, corresponding GPU code is generated for the modified computation graph.
In this embodiment of the invention, the data visibility ranges of the operations in the graph neural network are detected, and mismatches in data visibility range are resolved at fine granularity by the adapter, so that multiple operations can run in the same GPU kernel function, reducing both the number of GPU kernel launches and the amount of global memory access.
Fig. 3 is a schematic workflow diagram of a GPU graph neural network optimization method according to an embodiment of the present invention, which combines the GPU graph neural network optimization methods shown in Fig. 1 and Fig. 2.
First, the user inputs code describing the graph neural network computation (1 in Fig. 3). Using classical analysis techniques from compiler technology, the user code is analyzed to generate a computation graph containing tensors and operations (2 in Fig. 3, and Fig. 4, in which each circle with a gray background represents a tensor, each circle with a white background represents an operation, and the connections between circles represent the dependencies between operations and tensors). Once the computation graph is obtained, a plurality of equivalent candidate computation graphs (3 in Fig. 3) are derived through the computation graph transformation step of "redundant computation elimination". Floating-point operation count analysis is then applied to these equivalent candidates to select the computation graph with the fewest floating-point operations, i.e., the one without computational redundancy (4 in Fig. 3). On this redundancy-free computation graph, mismatches in data visibility range between operations are resolved by the data visibility range adapter, which completes the merging of kernel functions (5 in Fig. 3: data at different stages is shared through the adapter so that the data visibility ranges match). Finally, code generation and compilation produce an efficient executable program that performs the graph neural network computation. The two steps, redundant computation elimination and the data visibility range adapter, are described in detail below.
I. Redundant computation elimination
In the redundant computation elimination method of this embodiment, analysis techniques are applied to the user-supplied code describing the graph neural network computation to generate a computation graph composed of the computation operations.
After the computation graph is generated, the method derives the properties of each tensor and operation, such as shape, location, and operation type. The operation types are: element-wise operations, reduction operations, activation function operations, and other operations. The tensor locations are: edges, vertices (source or destination), and global. Based on preset rules, the operation type and tensor location determine whether an operation satisfies the associative, distributive, and commutative laws. For operations with the corresponding property, the related transformation (for example, exchanging the computation order) is applied to generate a new computation graph. The search for and application of these transformations then continues on the new computation graph, generating further equivalent computation graphs.
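The repeated search for transformations can be sketched as a fixed-point exploration over expression trees. This is an illustrative sketch only: it assumes computation graphs are encoded as nested tuples `(op, left, right)` with tensor names as leaves, it implements just the commutative law for `add` and the distributive law of `matmul` over `add` (the associative law is omitted for brevity), and the names `rewrites_at_root`, `one_step`, and `equivalents` are hypothetical.

```python
from typing import Iterator, Set, Tuple, Union

Expr = Union[str, Tuple]  # leaf tensor name, or (op, left, right)


def rewrites_at_root(e: Expr) -> Iterator[Expr]:
    """Yield expressions equal to `e` obtained by one rule applied at the root."""
    if isinstance(e, str):
        return
    op, left, right = e
    if op == "add":
        yield ("add", right, left)  # commutative law
        # Distributive law, factoring direction: A@W + B@W -> (A + B)@W
        if (not isinstance(left, str) and not isinstance(right, str)
                and left[0] == "matmul" and right[0] == "matmul"
                and left[2] == right[2]):
            yield ("matmul", ("add", left[1], right[1]), left[2])
    if op == "matmul" and not isinstance(left, str) and left[0] == "add":
        # Distributive law, expanding direction: (A + B)@W -> A@W + B@W
        yield ("add", ("matmul", left[1], right), ("matmul", left[2], right))


def one_step(e: Expr) -> Iterator[Expr]:
    """Yield every expression reachable from `e` by one rewrite anywhere."""
    yield from rewrites_at_root(e)
    if not isinstance(e, str):
        op, left, right = e
        for new_left in one_step(left):
            yield (op, new_left, right)
        for new_right in one_step(right):
            yield (op, left, new_right)


def equivalents(e: Expr, limit: int = 1000) -> Set[Expr]:
    """Apply rewrites repeatedly until no new graph appears (or `limit` is hit)."""
    seen = {e}
    frontier = [e]
    while frontier and len(seen) < limit:
        current = frontier.pop()
        for candidate in one_step(current):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


original = ("matmul", ("add", "src_feat", "dst_feat"), "weight")
candidates = equivalents(original)
```

For this input the search finds four equivalent forms: the original, its expanded form, and the two variants with the operands of `add` swapped. A real system would also need the per-operation rules that decide, from operation type and tensor location, which laws are legal to apply.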
Across all newly generated equivalent computation graphs, the number of floating-point operations required by each operation is derived from the tensor shapes, tensor locations, and operation types. Summing these gives the total number of floating-point operations for each whole computation graph, and the computation graph requiring the fewest floating-point operations is selected.
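The selection step can be sketched as a cost function over candidate graphs followed by a `min`. The sketch below makes several assumptions that do not come from the patent: the graph sizes and feature dimensions are invented, a matrix multiplication of an n x d tensor by a d x k matrix is counted as 2·n·d·k operations, an element-wise addition as one operation per output element, and adding a source-side tensor to a destination-side tensor is assumed to produce a per-edge tensor.

```python
from typing import Dict, Tuple, Union

Expr = Union[str, Tuple]

NUM_VERTICES = 1_000      # assumed graph size for illustration
NUM_EDGES = 20_000        # edges typically far outnumber vertices

# Leaf tensors: name -> (location, number of columns). All values are assumed.
LEAVES: Dict[str, Tuple[str, int]] = {
    "src_feat": ("src", 64),
    "dst_feat": ("dst", 64),
    "weight": ("global", 16),   # a 64 x 16 matrix; only its column count is used
}


def rows(location: str) -> int:
    """Number of rows of a per-edge or per-vertex tensor."""
    return NUM_EDGES if location == "edge" else NUM_VERTICES


def analyze(e: Expr) -> Tuple[str, int, int]:
    """Return (location, columns, total FLOPs) for expression `e`."""
    if isinstance(e, str):
        location, cols = LEAVES[e]
        return location, cols, 0
    op, left, right = e
    l_loc, l_cols, l_flops = analyze(left)
    if op == "matmul":
        # [n, d] @ [d, k]: n * k outputs, each needing d multiplies and d adds.
        _, k, r_flops = analyze(right)
        return l_loc, k, l_flops + r_flops + 2 * rows(l_loc) * l_cols * k
    if op == "add":
        r_loc, _, r_flops = analyze(right)
        # Combining a source-side and a destination-side tensor happens per edge.
        out_loc = l_loc if l_loc == r_loc else "edge"
        return out_loc, l_cols, l_flops + r_flops + rows(out_loc) * l_cols
    raise ValueError(f"unknown op {op!r}")


def flops(e: Expr) -> int:
    return analyze(e)[2]


before = ("matmul", ("add", "src_feat", "dst_feat"), "weight")
after = ("add", ("matmul", "src_feat", "weight"), ("matmul", "dst_feat", "weight"))
best = min([before, after], key=flops)
```

Under these assumed sizes, `flops(before)` is 42,240,000 and `flops(after)` is 4,416,000, so `best` is the form that multiplies per vertex and adds per edge. The ratio depends entirely on the assumed edge-to-vertex ratio and feature dimensions.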
As shown in Fig. 6, in the gray-background box of the computation graph before transformation (left side of Fig. 6), the feature vector of the source node (src.feat) and the feature vector of the destination node (dst.feat) are added to obtain the sum of the feature vectors (src.feat + dst.feat), and a matrix multiplication is then applied to this sum to transform it ((src.feat + dst.feat) × weight). Applying redundant computation elimination to the original computation graph yields the transformed computation graph (right side of Fig. 6), which first transforms the feature vector of the source node (src.feat) and the feature vector of the destination node (dst.feat), and then sums the transformed feature vector of the source node (src.feat_weight) and the transformed feature vector of the destination node (dst.feat_weight) to obtain (src.feat_weight + dst.feat_weight). In the transformed computation graph, the matrix multiplication moves from the edges to the vertices; because the number of edges in graph data is far larger than the number of vertices, the computation amount is greatly reduced.
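The equivalence of the two forms in Fig. 6 can be checked numerically on a toy graph. This is a sketch under assumptions: the feature values, the weight matrix, and the edge list are invented, the same weight matrix is assumed to apply to source and destination features, and small integer-valued floats are used so the comparison is exact.

```python
from typing import List

Matrix = List[List[float]]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Row vector x (length d) times matrix w (d x k)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def add(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two equal-length vectors."""
    return [p + q for p, q in zip(a, b)]


feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # one feature row per vertex
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # 2 x 3 transformation matrix
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]       # (source, destination) pairs

# Before: add on every edge, then multiply on every edge.
before = [matvec(add(feat[s], feat[d]), weight) for s, d in edges]

# After: multiply once per vertex, then add on every edge.
transformed = [matvec(row, weight) for row in feat]
after = [add(transformed[s], transformed[d]) for s, d in edges]
```

Both lists hold the same per-edge results, but the first form performs one matrix product per edge (four here) while the second performs one per vertex (three here). On real graphs, where edges far outnumber vertices, the gap is correspondingly larger.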
II. Data visibility range adapter
Based on the final computation graph obtained above, the required input data visibility range and the output data visibility range of each operation are derived from its operation type and the shapes of its input and output tensors. The data visibility range describes the scope of threads among which data is shared. There are four data visibility ranges: thread-private, shared within a warp, shared within a thread block, and globally shared. When the data visibility ranges of two operations do not match, the two operations must execute in different kernel functions, which forces reads and writes of global memory and causes redundant memory access overhead.
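The four ranges and the mismatch check can be sketched as an ordered enumeration. The text does not define "mismatch" formally, so this sketch assumes that a mismatch occurs when a consumer requires a wider range than the one in which the producer leaves its output. The names `Visibility` and `find_mismatches` and the example operations are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class Visibility(IntEnum):
    """Data visibility ranges, ordered from narrowest to widest."""
    THREAD_PRIVATE = 0
    WARP_SHARED = 1
    BLOCK_SHARED = 2
    GLOBAL_SHARED = 3


# (operation name, required input range, output range as produced)
Op = Tuple[str, Visibility, Visibility]


def find_mismatches(chain: List[Op]) -> List[Tuple[str, str, Visibility]]:
    """Return (producer, consumer, required range) for each mismatched pair."""
    mismatches = []
    for (producer, _, produced), (consumer, required, _) in zip(chain, chain[1:]):
        # Assumed rule: the consumer needs data visible at least as widely
        # as `required`; a narrower producer output is a mismatch.
        if produced < required:
            mismatches.append((producer, consumer, required))
    return mismatches


chain: List[Op] = [
    ("per_edge_score", Visibility.THREAD_PRIVATE, Visibility.THREAD_PRIVATE),
    ("row_sum", Visibility.WARP_SHARED, Visibility.THREAD_PRIVATE),
    ("scale", Visibility.THREAD_PRIVATE, Visibility.THREAD_PRIVATE),
]
mismatches = find_mismatches(chain)
```

Here only the pair `per_edge_score` and `row_sum` is reported, because `row_sum` needs values held by other threads of the same warp while its producer leaves them thread-private.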
In this embodiment of the invention, after dependency analysis is performed on the computation graph, if a data visibility range mismatch exists between operations that have a dependency relationship, the data visibility range adapter is used to resolve the mismatch between the interdependent operations.
The data visibility range adapter may be, for example, a piece of code that, when executed, detects the data visibility ranges of the preceding and following operations, derives the minimum data visibility range the operations require, and then changes the data visibility range by sharing the data through an appropriate mechanism (inter-thread communication or shared memory), so that the range satisfies the requirement for merging the preceding and following operations into one kernel function. If the required minimum range is sharing within a warp, the adapter shares the thread-private data using the GPU warp shuffle primitives; if the required minimum range is sharing within a thread block, the adapter uses GPU shared memory, storing the thread-private data into shared memory to share it within the thread block.
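The two sharing mechanisms and the choice between them can be illustrated with a pure-Python simulation. This is not GPU code and not the patent's generated adapter: each list slot stands in for one thread's private register, the warp size of 32 is assumed, the warp path is modeled as a butterfly all-reduce of the kind usually written with CUDA's `__shfl_xor_sync`, and the block path models a store to shared memory followed by a barrier (`__syncthreads()` in CUDA). The function names are hypothetical.

```python
from typing import Callable, List

WARP_SIZE = 32  # assumed warp width


def warp_allreduce_sum(lane_values: List[float]) -> List[float]:
    """Simulate a butterfly all-reduce built on XOR shuffles within one warp.

    Each list slot plays the role of one lane's private register.
    """
    assert len(lane_values) == WARP_SIZE
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Every lane reads its partner's value, then all lanes update at once.
        partner = [vals[lane ^ offset] for lane in range(WARP_SIZE)]
        vals = [v + p for v, p in zip(vals, partner)]
        offset //= 2
    return vals


def block_share(thread_values: List[float]) -> List[List[float]]:
    """Simulate sharing through shared memory within one thread block.

    Each thread stores its private value into a shared array; after the
    barrier every thread can read every slot.
    """
    shared = list(thread_values)               # the store phase
    return [shared for _ in thread_values]     # each thread's view after the barrier


def select_adapter(required: str) -> Callable:
    """Pick the sharing mechanism from the minimum required visibility range."""
    if required == "warp":
        return warp_allreduce_sum
    if required == "block":
        return block_share
    raise ValueError("no in-kernel adapter for range: " + required)
```

For example, `select_adapter("warp")(list(range(32)))` leaves every simulated lane holding 496, the sum of all lane values, without touching any memory outside the warp. A requirement of global sharing has no in-kernel mechanism in this sketch, so `select_adapter` raises an error for it.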
As shown on the left side of Fig. 5, the data visibility an operation requires is at the thread-block level, but existing systems such as DGL and PyG ignore the data visibility properties of operations: even when data only needs to be shared within a thread block, DGL and PyG by default place the operations in separate kernels and share the data globally. The data visibility range adapter proposed here determines the minimum visibility range the operation requires and uses shared memory to widen the visibility of the data from thread-private to shared within the thread block. As shown on the right side of Fig. 5, the thread block of the next operation can then continue computing without waiting for other blocks or accessing global memory. The two kernel functions can therefore be merged, reducing redundant memory access.
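The effect of the adapter on kernel merging can be sketched as a grouping pass over a chain of dependent operations. The sketch assumes a purely linear chain, assumes that an adapter can widen data to the warp or thread-block level but never to global sharing, and uses invented operation names; `fuse` is a hypothetical helper, not the patent's algorithm.

```python
from typing import List, Tuple

# Ranges ordered from narrowest to widest; the string names are illustrative.
ORDER = {"thread": 0, "warp": 1, "block": 2, "global": 3}

# (operation name, required input range, output range as produced)
Op = Tuple[str, str, str]


def fuse(ops: List[Op], use_adapter: bool) -> List[List[str]]:
    """Group a linear chain of dependent operations into kernels.

    Two adjacent operations stay in one kernel when the producer's output is
    already visible widely enough, or when an adapter can widen it to the
    warp or block level. A requirement of global sharing always splits.
    """
    kernels: List[List[str]] = [[ops[0][0]]]
    for (_, _, produced), (name, required, _) in zip(ops, ops[1:]):
        matched = ORDER[produced] >= ORDER[required]
        adaptable = use_adapter and required in ("warp", "block")
        if matched or adaptable:
            kernels[-1].append(name)      # same kernel, no global round trip
        else:
            kernels.append([name])        # kernel boundary via global memory
    return kernels


chain: List[Op] = [
    ("edge_score", "thread", "thread"),
    ("softmax_sum", "block", "thread"),   # needs values from the whole block
    ("normalize", "warp", "thread"),
    ("aggregate", "global", "thread"),    # needs results from all blocks
]

baseline = fuse(chain, use_adapter=False)
fused = fuse(chain, use_adapter=True)
```

Without the adapter the chain runs as four kernels with three round trips through global memory; with it, the first three operations share one kernel and only the step that truly needs global sharing is split off.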
Fig. 3 illustrates a graph neural network optimization method that eliminates redundant memory access and computation overhead according to an embodiment of the present invention. The method reduces data visibility range mismatches between operators in the graph neural network, so that operators can be merged to reduce memory access; it also finds an equivalent computation graph free of redundant computation, thereby reducing redundant computation.
According to another embodiment of the present invention, there is provided a graphics processing unit GPU comprising a processor and a memory having stored thereon computer executable code operable when executed by the processor to perform the aforementioned GPU graph neural network optimization method.
According to another embodiment of the present invention, there is provided a computer-readable medium having stored thereon computer-executable code, which when executed by a processor, is operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
The invention can be implemented in various forms of software, hardware or a combination thereof, and can be distributed or centralized.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A GPU graph neural network optimization method, comprising:
generating, for a GPU graph neural network model definition, a computation graph comprising tensors and operations;
obtaining a plurality of equivalent computation graphs for the computation graph;
comparing the computation amounts of the computation graphs and selecting the computation graph with the smallest computation amount; and
generating corresponding GPU code for the selected computation graph.
2. The GPU graph neural network optimization method of claim 1, wherein obtaining a plurality of equivalent computation graphs for the computation graph comprises:
obtaining information about the tensors and the types of the operations in the computation graph; and
applying the commutative, associative, and distributive laws according to the types of the operations and the information about the input tensors, thereby changing the computation order and form of the computation graph.
3. The GPU graph neural network optimization method of claim 1, wherein comparing the computation amounts of the computation graphs and selecting the computation graph with the smallest computation amount comprises:
for each computation graph, obtaining the number of floating-point operations required by each operation, and selecting the computation graph that requires the fewest floating-point operations.
4. The GPU graph neural network optimization method of claim 1, further comprising, after selecting the computation graph with the smallest computation amount:
analyzing the selected computation graph to obtain the data visibility range each operation requires for its input and the data visibility range of its output;
resolving mismatches in data visibility range between operations that have a dependency relationship; and
merging operations whose data visibility ranges match into the same GPU kernel.
5. The GPU graph neural network optimization method of claim 4, wherein resolving the mismatch in data visibility range between dependent operations comprises:
using a data visibility range adapter to resolve the mismatch in data visibility range between interdependent operations, wherein the data visibility range adapter is operable to detect the data visibility ranges of the preceding and following operations and obtain the minimum data visibility range the operations require, after which a code segment is inserted which, when executed, shares the data through inter-thread communication or shared memory to change the data visibility range, so that the range satisfies the requirement for merging the preceding and following operations into one kernel function.
6. The GPU graph neural network optimization method of claim 5, wherein if the required minimum data visibility range is sharing within a warp, the data visibility range adapter shares the thread-private data using the GPU warp shuffle primitives; and if the required minimum data visibility range is sharing within a thread block, the data visibility range adapter uses GPU shared memory, storing the thread-private data into shared memory to share it within the thread block.
7. A GPU graph neural network optimization method, comprising:
generating, for a GPU graph neural network model definition, a computation graph comprising tensors and operations;
analyzing the computation graph to obtain the data visibility range each operation requires for its input and the data visibility range of its output;
resolving mismatches in data visibility range between operations that have a dependency relationship;
merging operations whose data visibility ranges match into the same GPU kernel function; and
generating corresponding GPU code for the modified computation graph.
8. The GPU graph neural network optimization method of claim 7, wherein resolving the mismatch in data visibility range between dependent operations comprises:
using a data visibility range adapter to resolve the mismatch in data visibility range between interdependent operations, wherein the data visibility range adapter is a code segment that detects the data visibility ranges of the preceding and following operations, obtains the minimum data visibility range the operations require, and then inserts a code segment which, when executed, shares the data through inter-thread communication or shared memory to change the data visibility range, so that the range satisfies the requirement for merging the preceding and following operations into one kernel function.
9. A GPU graph neural network optimization device, comprising a processor and a memory, the memory having computer-executable code stored thereon, the code, when executed by the processor, being operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
10. A computer readable medium having computer executable code stored thereon, the code, when executed by a processor, being operable to perform the GPU graph neural network optimization method of any of claims 1 to 8.
CN202110222831.3A 2021-02-26 2021-02-26 GPU graph neural network optimization method and device Pending CN112767230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110222831.3A CN112767230A (en) 2021-02-26 2021-02-26 GPU graph neural network optimization method and device

Publications (1)

Publication Number Publication Date
CN112767230A true CN112767230A (en) 2021-05-07

Family

ID=75704331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110222831.3A Pending CN112767230A (en) 2021-02-26 2021-02-26 GPU graph neural network optimization method and device

Country Status (1)

Country Link
CN (1) CN112767230A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237918A (en) * 2022-02-28 2022-03-25 之江实验室 Graph execution method and device for neural network model calculation
CN115268877A (en) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and device for parallel execution of graph computation
WO2023071149A1 (en) * 2021-10-27 2023-05-04 上海商汤智能科技有限公司 Video memory optimization method and apparatus, device, storage medium and program product
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160284095A1 (en) * 2015-03-27 2016-09-29 Edmond Chalom Machine learning of real-time image capture parameters
US10074206B1 (en) * 2017-05-23 2018-09-11 Amazon Technologies, Inc. Network-optimized graphics library for virtualized graphics processing
US20190311214A1 (en) * 2018-04-05 2019-10-10 Imagination Technologies Limited Matching Local Image Feature Descriptors in Image Analysis
CN111338635A (en) * 2020-02-20 2020-06-26 腾讯科技(深圳)有限公司 Graph compiling method, device and equipment for calculation graph and storage medium
CN111401538A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160284095A1 (en) * 2015-03-27 2016-09-29 Edmond Chalom Machine learning of real-time image capture parameters
US10074206B1 (en) * 2017-05-23 2018-09-11 Amazon Technologies, Inc. Network-optimized graphics library for virtualized graphics processing
US20190311214A1 (en) * 2018-04-05 2019-10-10 Imagination Technologies Limited Matching Local Image Feature Descriptors in Image Analysis
CN111401538A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111338635A (en) * 2020-02-20 2020-06-26 腾讯科技(深圳)有限公司 Graph compiling method, device and equipment for calculation graph and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KEZHAO HUANG et al.: "Understanding and Bridging the Gaps in Current GNN Performance Optimizations", 《PPOPP '21: PROCEEDINGS OF THE 26TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023071149A1 (en) * 2021-10-27 2023-05-04 上海商汤智能科技有限公司 Video memory optimization method and apparatus, device, storage medium and program product
CN114237918A (en) * 2022-02-28 2022-03-25 之江实验室 Graph execution method and device for neural network model calculation
US11941514B2 (en) 2022-02-28 2024-03-26 Zhejiang Lab Method for execution of computational graph in neural network model and apparatus thereof
CN115268877A (en) * 2022-09-27 2022-11-01 之江实验室 Intermediate representation method and device for parallel execution of graph computation
US11782723B1 (en) 2022-09-27 2023-10-10 Zhejiang Lab Intermediate representation method and apparatus for parallel execution of graph computation

Similar Documents

Publication Publication Date Title
US11809993B2 (en) Systems and methods for determining graph similarity
CN112767230A (en) GPU graph neural network optimization method and device
CN113449857B (en) Data processing method and data processing equipment
US11507846B2 (en) Representing a neural network utilizing paths within the network to improve a performance of the neural network
JP6954049B2 (en) Methods and equipment to complete the knowledge graph
US20200394459A1 (en) Cell image synthesis using one or more neural networks
Sun et al. Feature selection using rough entropy-based uncertainty measures in incomplete decision systems
Lust et al. Two-phase Pareto local search for the biobjective traveling salesman problem
Wang et al. Clustering aggregation by probability accumulation
Wu et al. Enabling on-device cnn training by self-supervised instance filtering and error map pruning
Sheth Transforming big data into smart data: Deriving value via harnessing volume, variety, and velocity using semantic techniques and technologies
CN110383247A (en) Method, computer-readable medium and heterogeneous computing system performed by computer
US20210209270A1 (en) Distributed tensor network contraction scheme with splitting based on dynamic ordering
Gerber et al. Data analysis with the morse-smale complex: The msr package for r
CN112836787A (en) Reducing deep neural network training times through efficient hybrid parallelization
Shao et al. Deep multi-center learning for face alignment
Chai et al. A model-agnostic approach to mitigate gradient interference for multi-task learning
Zhao et al. APUNet: Attention-guided upsampling network for sparse and non-uniform point cloud
US20220129755A1 (en) Incorporating a ternary matrix into a neural network
CN115860061A (en) Graph neural network optimization method and graph neural network inference system
US20220343146A1 (en) Method and system for temporal graph neural network acceleration
Tran et al. A distributed data mining framework accelerated with graphics processing units
Lai et al. Efficient guided hypothesis generation for multi-structure epipolar geometry estimation
CN111723247A (en) Graph-based hypothetical computation
Cai et al. The multi-task learning with an application of Pareto improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210507)