WO2022198713A1 - Graph sampling and random walk acceleration method and system based on a graphics processor - Google Patents

Graph sampling and random walk acceleration method and system based on a graphics processor Download PDF

Info

Publication number
WO2022198713A1
WO2022198713A1 (application PCT/CN2021/084847, CN2021084847W)
Authority
WO
WIPO (PCT)
Prior art keywords
thread
sampling
vertex
gpu
task queue
Prior art date
Application number
PCT/CN2021/084847
Other languages
English (en)
French (fr)
Inventor
李超
王鹏宇
王靖
朱浩瑾
过敏意
Original Assignee
上海交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学
Priority to US17/661,663 priority Critical patent/US11875426B2/en
Publication of WO2022198713A1 publication Critical patent/WO2022198713A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Definitions

  • The invention relates to a technology in the field of data processing, specifically a graphics-processor-based graph sampling and random walk acceleration method and system for artificial intelligence applications.
  • Graph sampling and random walk select subgraphs of the original graph data according to certain evaluation criteria.
  • Graph sampling and random walk are common processing techniques for graph data that can significantly reduce the cost of processing large graphs while maintaining high accuracy and other metrics in artificial intelligence applications; however, the sampling and walk computations themselves consume considerable time, so accelerating them helps improve overall performance.
  • Biased graph sampling and random walk refer to randomly selecting among the neighbor vertices of a vertex according to the weights of the edges connecting to those neighbors, and usually require computing the transition probabilities of the neighbor vertices to guarantee the randomness of the selection.
  • The alias method (Alias Method) is one way to compute transition probabilities, but it is generally considered difficult to parallelize efficiently, so no existing system processes the alias method on a GPU.
  • The present invention proposes a graphics-processor-based graph sampling and random walk acceleration method and system that executes the alias method efficiently and in parallel, and can significantly improve the performance of graph data processing on the same hardware platform, including higher sampling throughput and lower overall running time.
  • The invention relates to a graphics-processor-based graph sampling and random walk acceleration method: a central processing unit (CPU) reads graph data from a storage medium, converts it into Compressed Sparse Row (CSR) format, and outputs it to the GPU.
  • According to the configured working mode, the GPU either generates an alias table (Alias Table) in real time and performs sampling, or offline checks whether a pre-generated alias table exists and performs sampling.
  • The central processing unit launches a kernel function executed on the GPU; during its lifetime, each thread block of the kernel continuously joins variable-sized thread work groups to process the tasks in the task queue, implementing the random walk execution model.
  • The threads' continuous participation in variable-sized thread work groups means: sampling tasks are stored in the global task queue, the threads in a thread block keep processing tasks from that queue, and depending on the thread block's state they join subwarps, warps, or the whole thread block to sample cooperatively.
  • the different states of the thread block include: thread sub-warp coordination, thread warp coordination and thread block coordination.
  • The sampling is realized by a load-balancing strategy that assigns the vertices to be processed to GPU thread groups of different sizes according to the degree of the vertex to be sampled. Specifically, vertices are classified by their out-degree in the out-degree array against the thresholds and are then processed with thread subwarps, thread warps, or thread blocks respectively.
  • The vertex classification includes: ① when the out-degree of the vertex to be sampled satisfies d < T1, the vertex is processed by a thread subwarp; ② when T1 < d < T2, the vertex is processed by a thread warp; ③ when T2 < d, the vertex is processed by a thread block, where the out-degree of a vertex is the number d of its edges pointing to other vertices in the directed graph, and the two thresholds T1 and T2 are 8 and 32 respectively.
  • the thread subwarps are usually 4 or 8 threads in size; the thread warps are usually 32 threads in size; the thread blocks are usually 256, 512 or 1024 threads in size.
  • the specified warp processing threshold is smaller than the thread block processing threshold.
  • The invention also relates to a system implementing the above method, comprising: a CPU master-control module, a data management module, a GPU execution module and a GPU scheduling module, where: the data management module is connected to the CPU master-control module and manages data movement according to control instructions; the CPU master-control module is connected to the GPU execution module and transmits task-execution information; and the GPU scheduling module is connected to the GPU execution module and transmits scheduling information.
  • The present invention solves prior-art problems such as low parallelism, low accelerator execution efficiency, and long overall running time; compared with the prior art it offers low runtime overhead, a load-balanced execution scheduling strategy, and support for both real-time and offline workloads.
  • Fig. 1 is the system structure diagram of the present invention
  • Fig. 2 is the flow chart of the present invention
  • Fig. 3 is a schematic diagram of GPU iteration execution
  • FIG. 4 is a schematic diagram of sampling in real-time working mode for task groups of different sizes.
  • the present embodiment relates to a graph sampling and random walk acceleration method based on a graphics processor.
  • the graph data is read from a storage medium by the CPU, converted into a CSR format, and then output to the GPU.
  • According to the configured working mode, the GPU either generates the alias table in real time and samples, or offline checks whether a pre-generated alias table exists and samples, where: in the initial phase the graph structure data is stored in the memory of the graphics processor and the vertices to be processed are stored in the global task queue; in the iterative execution phase, the thread groups in the kernel function independently process tasks from the global task queue until it is empty.
  • The offline check means: when no pre-generated alias table exists, first generate the alias table for the full graph and then sample; otherwise sample using the existing alias table.
  • the generation of the alias table for the full graph specifically includes:
  • Step 1.1 Each thread subwarp of every GPU kernel requests a task from the global task queue. When the out-degree of the obtained vertex satisfies d < T1, the threads of this subwarp process the vertex cooperatively, i.e. multiple GPU threads jointly compute the alias table for one vertex at the same time; when T1 < d < T2, the vertex is temporarily added to the local task queue for later processing; when T2 < d, the vertex is added to the global high-degree vertex task queue, where T1 and T2 are two adjustable thresholds, 32 and 256 in practice, and d is the out-degree of the vertex, i.e. the number of edges it emits.
  • The global task queue is generated by the master CPU thread in the initial phase and includes D sub-task queues, where D is the sampling depth; each sub-queue contains vertices to be processed, and the queue at sampling depth 0 is initialized with the root vertices of the current sampling task.
  • The vertices to be processed in the global high-degree vertex task queue have degree higher than the adjustable threshold T2.
  • kernel functions can add tasks to subqueues.
  • the local task queue is generated and maintained by each thread block, which stores the vertices obtained by the thread sub-warps that satisfy T 1 ⁇ d ⁇ T 2 .
  • the thread subwarp refers to a set of multiple adjacent GPU threads at runtime, and the size is usually 4, 8 or 16.
  • Step 1.2 When the global task queue for the current iteration is empty, threads in a warp that have finished wait for the other threads of the same warp, and then join the processing of the local task queue.
  • Step 1.3 When the local task queue for the current iteration is empty, threads in a thread block that have finished wait for the other threads of the same block, and then join the processing of the global high-degree vertex task queue.
  • Step 1.4 Each thread warp checks whether the current sampling depth equals the target sampling depth; when it does not, steps 1.1-1.4 are repeated, otherwise processing ends.
  • Generating the alias table in real time and sampling specifically includes:
  • Step 2.1 The work group of each GPU kernel function requests a task from the corresponding task queue to obtain a vertex to be sampled.
  • Step 2.2 Each work group checks the GPU's memory: if an alias table (Alias Table) for the vertex to be sampled already exists, it proceeds directly to the next step; otherwise multiple GPU threads cooperatively build the alias table for the vertex at the same time.
  • Step 2.3 Each work group samples with min(|WG|, k) threads, where min takes the minimum, |WG| is the number of threads in the work group, and k is the number of samples drawn per vertex at the current depth.
  • Step 2.4 Each workgroup stores the constructed alias table to the GPU's memory for future use.
  • Compared with the CPU-based KnightKing system, real-time and offline random walk throughput is improved by factors of 499 and 33; compared with the GPU-based C-SAW system, real-time and offline sampling throughput is improved by factors of 83 and 65, and real-time and offline random walk throughput by factors of 18 and 13.
  • the method makes full use of the high parallelism of the GPU, ensures load balancing, and reduces the running time overhead.
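The alias method at the heart of the above can be illustrated with a short CPU-side sketch (Vose's construction; the patent's contribution is parallelizing this per-vertex construction across cooperating GPU threads — the function names below are illustrative, not from the patent):

```python
import random

def build_alias_table(weights):
    """Build an alias table (Vose's method) for O(1) weighted sampling."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]   # scale so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # excess of slot l tops up slot s
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # clean up floating-point residue
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    """Draw one neighbor index in O(1): pick a slot, keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

# Edge weights of one vertex's neighbors; draws follow the 1:2:3:2 ratio.
rng = random.Random(42)
prob, alias = build_alias_table([1.0, 2.0, 3.0, 2.0])
counts = [0] * 4
for _ in range(80_000):
    counts[alias_sample(prob, alias, rng)] += 1
```

Once built, each draw costs one random slot choice plus one coin flip, which is what makes storing the table (offline mode) or caching it (real-time mode, step 2.4) worthwhile.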

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

A graphics-processor-based graph sampling and random walk acceleration method and system: a CPU reads graph data from a storage medium, converts it into CSR format, and outputs it to the GPU. According to the configured working mode, the GPU either generates an alias table in real time and performs sampling, or offline checks whether a pre-generated alias table exists and performs sampling. The invention executes the alias method efficiently and in parallel, and can significantly improve the performance of graph data processing on the same hardware platform, including higher sampling throughput and lower overall running time.

Description

Graph Sampling and Random Walk Acceleration Method and System Based on a Graphics Processor

Technical Field
The invention relates to a technology in the field of data processing, specifically a graphics-processor-based graph sampling and random walk acceleration method and system for artificial intelligence applications.
Background Art
Graph sampling and random walk select subgraphs of the original graph data according to certain evaluation criteria. They are common processing techniques for graph data that can significantly reduce the cost of processing large graphs while maintaining high accuracy and other metrics in artificial intelligence applications; however, the sampling and walk computations themselves consume considerable time, so accelerating them helps improve overall performance.
Biased graph sampling and random walk refer to randomly selecting among the neighbor vertices of a vertex according to the weights of the edges connecting to those neighbors, and usually require computing the transition probabilities of the neighbor vertices to guarantee the randomness of the selection. The alias method (Alias Method) is one way to compute transition probabilities, but it is generally considered difficult to parallelize efficiently, so no existing system processes the alias method on a GPU.
Summary of the Invention
Addressing the high complexity, low accelerator execution efficiency, and long overall running time of existing biased graph sampling and random walk systems, the invention proposes a graphics-processor-based graph sampling and random walk acceleration method and system that executes the alias method efficiently and in parallel, and can significantly improve the performance of graph data processing on the same hardware platform, including higher sampling throughput and lower overall running time.
The invention is realized by the following technical solution:
The invention relates to a graphics-processor-based graph sampling and random walk acceleration method: a central processing unit (CPU) reads graph data from a storage medium, converts it into Compressed Sparse Row (CSR) format, and outputs it to the GPU. According to the configured working mode, the GPU either generates an alias table (Alias Table) in real time and performs sampling, or offline checks whether a pre-generated alias table exists and performs sampling.
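As a hypothetical sketch of the CSR conversion step performed on the CPU (standard CSR layout with offset, neighbor, and weight arrays; the helper name and the weighted-edge-list input format are assumptions, not prescribed by the patent):

```python
def to_csr(num_vertices, edges):
    """Pack a weighted edge list of (src, dst, w) triples into CSR arrays."""
    out_deg = [0] * num_vertices
    for s, _, _ in edges:
        out_deg[s] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + out_deg[v]   # prefix sum of out-degrees
    cursor = list(offsets[:-1])
    neighbors = [0] * len(edges)
    weights = [0.0] * len(edges)
    for s, d, w in edges:
        neighbors[cursor[s]] = d
        weights[cursor[s]] = w
        cursor[s] += 1
    return offsets, neighbors, weights

# Tiny directed graph: 0→1 (w=1), 0→2 (w=2), 2→0 (w=1).
offsets, neighbors, weights = to_csr(3, [(0, 1, 1.0), (0, 2, 2.0), (2, 0, 1.0)])
# offsets == [0, 2, 2, 3]: vertex 0's edges live in neighbors[0:2],
# and the out-degree of v is offsets[v + 1] - offsets[v].
```

This layout is what makes the out-degree of a vertex an O(1) lookup, which the method's degree-based scheduling relies on.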
The CPU launches a kernel function executed on the GPU; during its lifetime, each thread block of the kernel continuously joins variable-sized thread work groups to process the tasks in the task queue, implementing the random walk execution model.
The threads' continuous participation in variable-sized thread work groups means: sampling tasks are stored in the global task queue, the threads in a thread block keep processing tasks from that queue, and depending on the thread block's state they join thread subwarps, thread warps, or the whole thread block to sample cooperatively.
The states of a thread block include: subwarp cooperation, warp cooperation, and thread-block cooperation.
The sampling is realized by a load-balancing strategy that assigns the vertices to be processed to GPU thread groups of different sizes according to the degree of the vertex to be sampled. Specifically, vertices are classified by their out-degree in the out-degree array against the thresholds, and are then processed with thread subwarps, thread warps, or thread blocks respectively.
The vertex classification includes: ① when the out-degree of the vertex to be sampled satisfies d < T1, the vertex is processed by a thread subwarp; ② when T1 < d < T2, the vertex is processed by a thread warp; ③ when T2 < d, the vertex is processed by a thread block, where the out-degree of a vertex is the number d of its edges pointing to other vertices in the directed graph, and the two thresholds T1 and T2 are 8 and 32 respectively.
A thread subwarp is usually 4 or 8 threads; a thread warp is usually 32 threads; a thread block is usually 256, 512, or 1024 threads.
The specified warp-processing threshold is smaller than the block-processing threshold.
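A minimal sketch of the degree-based classification described above, assuming the stated thresholds T1 = 8 and T2 = 32 (a sequential Python routine for illustration; the actual system routes vertices to real GPU thread groups, and the handling of the boundary cases d == T1 and d == T2 is a design choice the patent leaves open):

```python
T1, T2 = 8, 32  # thresholds from the preferred embodiment above

def assign_work_groups(out_degrees):
    """Route each vertex to a subwarp, warp, or block queue by its out-degree."""
    subwarp, warp, block = [], [], []
    for v, d in enumerate(out_degrees):
        if d < T1:
            subwarp.append(v)   # low degree: a 4-8 thread subwarp suffices
        elif d < T2:
            warp.append(v)      # medium degree: one 32-thread warp
        else:
            block.append(v)     # high degree: a whole thread block
    return subwarp, warp, block

sub, w, b = assign_work_groups([3, 10, 100, 7, 40])
# sub == [0, 3], w == [1], b == [2, 4]
```

Matching group size to degree keeps low-degree vertices from wasting a whole warp while giving high-degree vertices enough threads to build their alias tables in parallel.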
The invention also relates to a system implementing the above method, comprising: a CPU master-control module, a data management module, a GPU execution module, and a GPU scheduling module, where: the data management module is connected to the CPU master-control module and manages data movement according to control instructions; the CPU master-control module is connected to the GPU execution module and transmits task-execution information; and the GPU scheduling module is connected to the GPU execution module and transmits scheduling information.
Technical Effects
The invention as a whole overcomes the low parallelism, low accelerator execution efficiency, and long overall running time of the prior art; compared with the prior art, it offers low runtime overhead, a load-balanced execution scheduling strategy, and support for both real-time and offline workloads.
Brief Description of the Drawings
Fig. 1 is the system structure diagram of the invention;
Fig. 2 is the flow chart of the invention;
Fig. 3 is a schematic diagram of GPU iterative execution;
Fig. 4 is a schematic diagram of sampling by task groups of different sizes in real-time working mode.
Detailed Description
As shown in Fig. 2, this embodiment relates to a graphics-processor-based graph sampling and random walk acceleration method: the CPU reads graph data from a storage medium, converts it into CSR format, and outputs it to the GPU; according to the configured working mode, the GPU either generates the alias table in real time and samples, or offline checks whether a pre-generated alias table exists and samples, where: in the initial phase the graph structure data is stored in the memory of the graphics processor and the vertices to be processed are stored in the global task queue; in the iterative execution phase, the thread groups in the kernel function independently process tasks from the global task queue until it is empty.
The offline check means: when no pre-generated alias table exists, first generate the alias table for the full graph and then sample; otherwise sample using the existing alias table.
Generating the alias table for the full graph specifically includes:
Step 1.1: Each thread subwarp of every GPU kernel requests a task from the global task queue. When the out-degree of the obtained vertex satisfies d < T1, the threads of this subwarp process the vertex cooperatively, i.e. multiple GPU threads jointly compute the alias table for one vertex at the same time; when T1 < d < T2, the vertex is temporarily added to the local task queue for later processing; when T2 < d, the vertex is added to the global high-degree vertex task queue, where T1 and T2 are two adjustable thresholds, 32 and 256 in practice, and d is the out-degree of the vertex, i.e. the number of edges it emits.
The global task queue is generated by the master CPU thread in the initial phase and contains D sub-task queues, where D is the sampling depth; each sub-queue contains vertices to be processed, and the queue at sampling depth 0 is initialized with the root vertices of the current sampling task.
The vertices to be processed in the global high-degree vertex task queue have degree higher than the adjustable threshold T2.
During iterative processing, the kernel function can add tasks to the sub-queues.
The local task queue is generated and maintained by each thread block; it stores the vertices obtained by the thread subwarps that satisfy T1 < d < T2.
A thread subwarp is a set of multiple adjacent GPU threads at runtime, usually of size 4, 8, or 16.
Step 1.2: When the global task queue for the current iteration is empty, threads in a warp that have finished wait for the other threads of the same warp, and then join the processing of the local task queue.
Step 1.3: When the local task queue for the current iteration is empty, threads in a thread block that have finished wait for the other threads of the same block, and then join the processing of the global high-degree vertex task queue.
Step 1.4: Each thread warp checks whether the current sampling depth equals the target sampling depth; when it does not, steps 1.1-1.4 are repeated, otherwise processing ends.
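Once per-vertex alias tables exist (the offline mode), each hop of a walk is an O(1) table lookup. A toy sequential sketch, assuming the tables were built beforehand (here with uniform weights, so every table is trivially all-ones and aliases are never taken; the names are illustrative, and the real system runs many such walkers on GPU thread groups rather than in a Python loop):

```python
import random

def biased_walk(offsets, neighbors, alias_tables, start, depth, rng):
    """Offline mode: one walker; each hop is an O(1) alias-table draw."""
    path, v = [start], start
    for _ in range(depth):
        lo, hi = offsets[v], offsets[v + 1]
        if lo == hi:                      # dead end: vertex has no out-edges
            break
        prob, alias = alias_tables[v]
        slot = rng.randrange(hi - lo)     # pick a slot uniformly...
        pick = slot if rng.random() < prob[slot] else alias[slot]
        v = neighbors[lo + pick]          # ...then keep it or take its alias
        path.append(v)
    return path

# Toy graph 0→{1,2}, 1→{2}, 2→{0}; uniform weights make every table trivial.
offsets, neighbors = [0, 2, 3, 4], [1, 2, 2, 0]
tables = {v: ([1.0] * (offsets[v + 1] - offsets[v]),
              [0] * (offsets[v + 1] - offsets[v])) for v in range(3)}
walk = biased_walk(offsets, neighbors, tables, 0, 5, random.Random(7))
```

The depth loop here corresponds to the per-depth iteration of steps 1.1-1.4: each level of the global task queue holds the frontier vertices whose tables the next hop will consult.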
As shown in Fig. 4, generating the alias table in real time and sampling specifically includes:
Step 2.1: Each work group of every GPU kernel requests a task from its corresponding task queue and obtains a vertex to be sampled.
Step 2.2: Each work group checks the memory of the graphics processor: if an alias table (Alias Table) for the vertex to be sampled already exists, it proceeds directly to the next step; otherwise multiple GPU threads cooperatively build the alias table for the vertex at the same time.
Step 2.3: Each work group samples with min(|WG|, k) threads, where min takes the minimum, |WG| is the number of threads in the work group, and k is the number of samples drawn per vertex at the current depth.
Step 2.4: Each work group stores the constructed alias table in the memory of the graphics processor for future use.
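Step 2.3's min(|WG|, k) rule caps the number of active threads at the per-vertex sample count k, so no thread draws redundant samples. A sequential sketch of the resulting per-thread workload (hypothetical helper name; on the GPU each "thread" below is a real lane of the work group):

```python
def plan_sampling_threads(wg_size, k):
    """Return, per active thread, how many of the k samples it will draw."""
    active = min(wg_size, k)       # step 2.3: min(|WG|, k) threads participate
    # Samples are dealt round-robin, so per-thread loads differ by at most one.
    return [len(range(t, k, active)) for t in range(active)]

# A 32-thread work group asked for k = 5 samples activates only 5 threads...
assert plan_sampling_threads(32, 5) == [1, 1, 1, 1, 1]
# ...while k = 80 samples keep all 32 threads busy with 2 or 3 samples each.
loads = plan_sampling_threads(32, 80)
```

The round-robin deal is one simple way to realize the rule; the patent only fixes the thread count, not the exact assignment of samples to threads.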
In concrete experiments on a test server equipped with two 2.40 GHz Intel Xeon 6148 CPUs, 256 GB of memory, and one NVIDIA RTX 2080Ti GPU, the above method was run with T1 = 8, T2 = 256, a thread subwarp size of 4, a thread warp size of 32, and a thread block size of 256. On the Arabic-2005 dataset, the sampling throughput in the real-time and offline working modes was 536 and 2545 million sampled edges per second respectively, and the random walk throughput was 65, 3399 and 8175 million edges per second. Compared with the CPU-based KnightKing system, real-time and offline random walk throughput improved by factors of 499 and 33; compared with the GPU-based C-SAW system, real-time and offline sampling throughput improved by factors of 83 and 65, and real-time and offline random walk throughput by factors of 18 and 13.
Compared with the prior art, the method fully exploits the high parallelism of the GPU, ensures load balancing, and reduces runtime overhead.
Those skilled in the art can locally adjust the above embodiments in different ways without departing from the principle and spirit of the invention; the scope of protection of the invention is determined by the claims and is not limited by the above specific implementations, and each implementation scheme within that scope is bound by the invention.

Claims (9)

  1. A graphics-processor-based graph sampling and random walk acceleration method, characterized in that a CPU reads graph data from a storage medium, converts it into CSR format, and outputs it to a GPU; according to the configured working mode, the GPU generates an alias table in real time and performs sampling, or offline checks whether a pre-generated alias table exists and performs sampling;
    the central processing unit launches a kernel function executed on the GPU, and during its lifetime each thread block of the kernel continuously joins variable-sized thread work groups to process the tasks in the task queue, implementing the random walk execution model;
    the offline check means: when no pre-generated alias table exists, first generating the alias table for the full graph and then sampling; otherwise sampling with the existing alias table.
  2. The graphics-processor-based graph sampling and random walk acceleration method according to claim 1, characterized in that the threads' continuous participation in variable-sized thread work groups means: sampling tasks are stored in a global task queue, the threads in a thread block keep processing the tasks in the task queue, and depending on the thread block's state, its threads join thread subwarps, thread warps, or thread blocks to sample cooperatively.
  3. The graphics-processor-based graph sampling and random walk acceleration method according to claim 2, characterized in that the states of the thread block include: subwarp cooperation, warp cooperation, and thread-block cooperation.
  4. The graphics-processor-based graph sampling and random walk acceleration method according to claim 1, characterized in that the sampling is realized by a load-balancing strategy that assigns the vertices to be processed to GPU thread groups of different sizes according to the degree of the vertex to be sampled, specifically: classifying the vertices by their out-degree in the out-degree array against the thresholds, and then processing them with thread subwarps, thread warps, or thread blocks respectively.
  5. The graphics-processor-based graph sampling and random walk acceleration method according to any one of claims 1 to 4, characterized in that the vertex classification includes: ① when the out-degree of the vertex to be sampled satisfies d < T1, the vertex is processed by one thread subwarp; ② when T1 < d < T2, the vertex is processed by one thread warp; ③ when T2 < d, the vertex is processed by one thread block, where the out-degree of a vertex is the number d of its edges pointing to other vertices in the directed graph, and T1 and T2 are the two thresholds.
  6. The graphics-processor-based graph sampling and random walk acceleration method according to claim 5, characterized in that the global task queue directs the GPU kernel threads to switch among the three states of subwarp cooperation, warp cooperation, and thread-block cooperation without launching the kernel multiple times.
  7. The graphics-processor-based graph sampling and random walk acceleration method according to claim 1, characterized in that generating the alias table for the full graph specifically includes:
    step 1: each thread subwarp of every GPU kernel requests a task from the global task queue; when the out-degree of the obtained vertex satisfies d < T1, the threads of this subwarp process the vertex cooperatively, i.e. multiple GPU threads jointly compute the alias table for one vertex at the same time; when T1 < d < T2, the vertex is temporarily added to the local task queue for later processing; when T2 < d, the vertex is added to the global high-degree vertex task queue, where T1 and T2 are two adjustable thresholds, 32 and 256 in practice, and d is the out-degree of the vertex, i.e. the number of edges it emits;
    step 2: when the global task queue of the current iteration is empty, finished threads in a warp wait for the other threads of the same warp and join the processing of the local task queue;
    step 3: when the local task queue of the current iteration is empty, finished threads in a thread block wait for the other threads of the same block and join the processing of the global high-degree vertex task queue;
    step 4: each thread warp checks whether the current sampling depth equals the target sampling depth; when it does not, steps 1-4 are repeated; otherwise processing ends.
  8. The graphics-processor-based graph sampling and random walk acceleration method according to claim 1, characterized in that generating the alias table in real time and sampling specifically includes:
    step ①: each work group of every GPU kernel requests a task from the corresponding task queue and obtains a vertex to be sampled;
    step ②: each work group checks the memory of the graphics processor; when an alias table for the vertex to be sampled already exists, it proceeds directly to the next step, otherwise multiple GPU threads cooperatively build the alias table for the vertex at the same time;
    step ③: each work group samples with min(|WG|, k) threads, where min takes the minimum, |WG| is the number of threads in the work group, and k is the number of samples drawn per vertex at the current depth;
    step ④: each work group stores the constructed alias table in the memory of the graphics processor for future use.
  9. A system implementing the method of any one of claims 1 to 8, characterized by comprising: a CPU master-control module, a data management module, a GPU execution module, and a GPU scheduling module, where: the data management module is connected to the CPU master-control module and manages data movement according to control instructions; the CPU master-control module is connected to the GPU execution module and transmits task-execution information; and the GPU scheduling module is connected to the GPU execution module and transmits scheduling information.
PCT/CN2021/084847 2021-03-25 2021-04-01 Graph sampling and random walk acceleration method and system based on a graphics processor WO2022198713A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/661,663 US11875426B2 (en) 2021-03-25 2022-05-02 Graph sampling and random walk acceleration method and system on GPU

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110318094.7 2021-03-25
CN202110318094.7A CN112925627B (zh) 2021-03-25 Graph sampling and random walk acceleration method and system based on a graphics processor

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/661,663 Continuation US11875426B2 (en) 2021-03-25 2022-05-02 Graph sampling and random walk acceleration method and system on GPU

Publications (1)

Publication Number Publication Date
WO2022198713A1 true WO2022198713A1 (zh) 2022-09-29

Family

ID=76175981

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084847 WO2022198713A1 (zh) 2021-03-25 2021-04-01 Graph sampling and random walk acceleration method and system based on a graphics processor

Country Status (2)

Country Link
CN (1) CN112925627B (zh)
WO (1) WO2022198713A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875426B2 (en) * 2021-03-25 2024-01-16 Shanghai Jiao Tong University Graph sampling and random walk acceleration method and system on GPU
CN116188239B (zh) * 2022-12-02 2023-09-12 上海交通大学 Optimized implementation method and system for multi-request concurrent GPU graph random walks

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110187968A (zh) * 2019-05-22 2019-08-30 上海交通大学 Graph data processing acceleration method in a heterogeneous computing environment
CN110196995A (zh) * 2019-04-30 2019-09-03 西安电子科技大学 Complex network feature extraction method based on biased random walk
US20210019558A1 (en) * 2019-07-15 2021-01-21 Microsoft Technology Licensing, Llc Modeling higher-level metrics from graph data derived from already-collected but not yet connected data
CN112417224A (zh) * 2020-11-27 2021-02-26 华中科技大学 Entropy-driven random-walk-based graph embedding method and system

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20090322781A1 (en) * 2008-06-30 2009-12-31 Mikhail Letavin Anti-aliasing techniques for image processing
US8767734B1 (en) * 2008-10-07 2014-07-01 BCK Networks, Inc. Stream basis set division multiplexing
CN104268021A (zh) * 2014-09-15 2015-01-07 西安电子科技大学 RS decoding method based on a graphics processor
CN108009933B (zh) * 2016-10-27 2021-06-11 中国科学技术大学先进技术研究院 Graph centrality computing method and apparatus
CN111950594B (zh) * 2020-07-14 2023-05-05 北京大学 Unsupervised graph representation learning method and apparatus for large-scale attribute graphs based on subgraph sampling
CN112258378A (zh) * 2020-10-15 2021-01-22 武汉易维晟医疗科技有限公司 Real-time three-dimensional measurement system and method based on GPU acceleration

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN110196995A (zh) * 2019-04-30 2019-09-03 西安电子科技大学 Complex network feature extraction method based on biased random walk
CN110187968A (zh) * 2019-05-22 2019-08-30 上海交通大学 Graph data processing acceleration method in a heterogeneous computing environment
US20210019558A1 (en) * 2019-07-15 2021-01-21 Microsoft Technology Licensing, Llc Modeling higher-level metrics from graph data derived from already-collected but not yet connected data
CN112417224A (zh) * 2020-11-27 2021-02-26 华中科技大学 Entropy-driven random-walk-based graph embedding method and system

Also Published As

Publication number Publication date
CN112925627A (zh) 2021-06-08
CN112925627B (zh) 2022-03-29

Similar Documents

Publication Publication Date Title
WO2022198713A1 (zh) Graph sampling and random walk acceleration method and system based on a graphics processor
US11875426B2 (en) Graph sampling and random walk acceleration method and system on GPU
CN108182115B (zh) Virtual machine load balancing method in a cloud environment
Shi et al. A quantitative survey of communication optimizations in distributed deep learning
CN110659278A (zh) Distributed graph data processing system based on a CPU-GPU heterogeneous architecture
WO2020211717A1 (zh) Data processing method, apparatus and device
Kunz et al. Multi-level parallelism for time-and cost-efficient parallel discrete event simulation on gpus
CN110413776A (zh) High-performance computing method for the LDA text topic model based on CPU-GPU cooperative parallelism
Cheng A high efficient task scheduling algorithm based on heterogeneous multi-core processor
CN110119317B (zh) Cloud computing task scheduling method and system based on a genetic algorithm
CN110502337B (zh) Optimization system for the shuffle phase in Hadoop MapReduce
Li et al. A parallel immune algorithm based on fine-grained model with GPU-acceleration
Baudis et al. Performance evaluation of priority queues for fine-grained parallel tasks on GPUs
Almasri et al. Hykernel: A hybrid selection of one/two-phase kernels for triangle counting on gpus
CN109949202A (zh) Parallel graph computation accelerator architecture
CN113010316B (zh) Parallel optimization method for multi-objective swarm intelligence algorithms based on cloud computing
Wang et al. Energy-efficient task scheduling algorithms with human intelligence based task shuffling and task relocation
Falcao et al. Heterogeneous implementation of a voronoi cell-based svp solver
CN116188239B (zh) Optimized implementation method and system for multi-request concurrent GPU graph random walks
Ganesh et al. A Hyper Heuristics Technique for Data Partitioning and Scheduling to Heterogeneous Systems Using Genetic Algorithm and Improved Particle Swarm Optimisation
Li et al. A hybrid genetic algorithm for task scheduling in internet of things
Kun-lun et al. Improved GEP Algorithm for Task Scheduling in Cloud Computing
Li et al. Research of Data Mining Algorithms Based on Hadoop Cloud Platform
Jingmei et al. A CMP thread scheduling strategy based on improved firework algorithm
Tian Based on Hybrid Particle Swarm Optimization Algorithm Respectively Research on Multiprocessor Task Scheduling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932331

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932331

Country of ref document: EP

Kind code of ref document: A1