CN104965689A - Hybrid parallel computing method and device for CPUs/GPUs - Google Patents

Hybrid parallel computing method and device for CPUs/GPUs

Info

Publication number: CN104965689A
Application number: CN201510264320.2A
Authority: CN (China)
Prior art keywords: task, waiting, gpu, computing node, waiting task
Other languages: Chinese (zh)
Inventor: 李清玉
Assignee: Inspur Electronic Information Industry Co Ltd
Legal status: Pending
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201510264320.2A (priority date 2015-05-22)
Publication of CN104965689A

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a CPU/GPU hybrid parallel computing method and device. The method comprises the following steps: building a computing cluster from more than one computing node, each computing node comprising a CPU and a GPU, and determining a scheduling policy; acquiring more than one pending task; caching the acquired pending tasks in a task queue; scheduling the pending tasks in the task queue to more than one computing node; in each computing node to which tasks are scheduled, having the CPU preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of the GPU; and having the GPU compute the tasks mapped into video memory and return the computation results. The scheme increases the computing efficiency of the computing nodes.

Description

A CPU/GPU hybrid parallel computing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a CPU/GPU hybrid parallel computing method and device.
Background technology
With the rapid development of computer technology, the scale of data to be processed keeps growing. To address the increasingly pressing problem of big-data processing, the MapReduce programming model has been proposed. MapReduce is a distributed programming model that conveniently distributes massive data sets across the nodes of a computing cluster so that multiple nodes process them cooperatively, thereby enabling fast processing of large data sets.
To further improve the computing performance of MapReduce, both academia and industry have carried out much related research. The emergence of the single GPU (Graphics Processing Unit) has brought a huge boost to system performance: a GPU contains up to hundreds of stream processing cores, and its computing performance exceeds the TFLOPS level, equivalent to a high-performance computing cluster, so it can realize fast computation over massive data.
However, a MapReduce programming model implemented with a single GPU alone still suffers from low computing efficiency.
Summary of the invention
In view of this, the invention provides a CPU/GPU hybrid parallel computing method and device to solve the problem of low computing efficiency in the prior art.
An embodiment of the invention provides a CPU/GPU hybrid parallel computing method, wherein more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined; the method further comprises:
acquiring more than one pending task;
caching the acquired pending tasks in a task queue;
scheduling, according to the scheduling policy, the pending tasks in the task queue to more than one computing node;
in each computing node to which pending tasks are scheduled, having the CPU preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of the GPU;
having the GPU compute the tasks mapped into video memory and return the computation results.
Preferably, before the pending tasks in the task queue are scheduled to the computing nodes, the method further comprises:
traversing the pending tasks in the task queue; each time a pending task is visited, obtaining and recording its operation attribute; after the traversal of the task queue ends, merging pending tasks with the same operation attribute into a single task; grouping the merged tasks, and creating a hash index area according to the grouped tasks so that the grouped tasks are kept in the hash index area.
Preferably, the GPU computing the tasks mapped into video memory comprises:
dividing the task mapped into video memory into more than one task block, assigning a corresponding Map task to each task block, and distributing the Map task of each task block onto the SM (streaming multiprocessor) processors of the GPU, so that each SM processor performs the Map operation on its task block;
relocating the intermediate tasks within GPU video memory through a Shuffle operation, and aggregating the results of the Map stage in the Reduce stage.
Preferably, the method further comprises: presetting an access control list (ACL), the ACL comprising the correspondence between tasks and the users holding the right to operate them;
and, before acquiring the pending tasks, further comprises: determining, according to the ACL, whether the user submitting a pending task holds the operation right for that task and, if so, performing the operation of acquiring the task.
An embodiment of the invention further provides a CPU/GPU hybrid parallel computing device, wherein more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined; the device comprises:
a task cache module, configured to acquire more than one pending task and cache the acquired tasks in a task queue;
a task scheduling module, configured to schedule, according to the scheduling policy, the pending tasks in the task queue to more than one computing node;
a computing node, configured to, when pending tasks are scheduled to it, use its CPU to preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of its GPU, and to use the GPU to compute the tasks mapped into video memory and return the computation results.
Preferably, the task cache module is configured to traverse the pending tasks in the task queue; each time a pending task is visited, obtain and record its operation attribute; after the traversal of the task queue ends, merge pending tasks with the same operation attribute into a single task; group the merged tasks, and create a hash index area according to the grouped tasks so that the grouped tasks are kept in the hash index area.
Preferably, the computing node is configured to divide the task mapped into video memory into more than one task block, assign a corresponding Map task to each task block, distribute the Map task of each task block onto the SM processors of the GPU so that each SM processor performs the Map operation on its task block, relocate the intermediate tasks within GPU video memory through a Shuffle operation, and aggregate the results of the Map stage in the Reduce stage.
Preferably, the device further comprises:
a security module, configured to determine, according to a preset access control list, whether the user submitting a pending task holds the operation right for that task and, if so, perform the operation of acquiring the task, the access control list comprising the correspondence between tasks and the users holding the right to operate them.
The embodiments of the invention provide a CPU/GPU hybrid parallel computing method and device in which the CPU and GPU are used in combination: the CPU preprocesses pending tasks and the GPU computes the preprocessed tasks, and while the GPU is computing, the CPU can continue to acquire and preprocess further pending tasks. CPU/GPU parallel computation is thus achieved, which not only expands the computing capability of the GPU but also improves the computing efficiency of the computing nodes.
Accompanying drawing explanation
Fig. 1 is a flowchart of the method provided by an embodiment of the invention;
Fig. 2 is a flowchart of the method provided by another embodiment of the invention;
Fig. 3 is a schematic structural diagram of the device provided by an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
An embodiment of the invention provides a CPU/GPU hybrid parallel computing method; referring to Fig. 1, more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined. The method may comprise the following steps:
Step 101: acquire more than one pending task.
Step 102: cache the acquired pending tasks in a task queue.
Step 103: schedule, according to the scheduling policy, the pending tasks in the task queue to more than one computing node.
Step 104: in each computing node to which pending tasks are scheduled, the CPU preprocesses the scheduled tasks one by one and, each time a task is preprocessed, maps it into the video memory of the GPU.
Step 105: the GPU computes the tasks mapped into video memory and returns the computation results.
In this scheme, the CPU and GPU are used in combination: the CPU preprocesses pending tasks and the GPU computes the preprocessed tasks, and while the GPU is computing, the CPU can continue to acquire and preprocess further tasks. CPU/GPU parallel computation is thus achieved, which not only expands the computing capability of the GPU but also improves the computing efficiency of the computing nodes.
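As an illustration of the overlap claimed in steps 101-105, the following is a minimal Java sketch; the Task type, preprocess() and gpuCompute() are hypothetical stand-ins (the GPU stage is stubbed on the CPU), and a hand-off queue stands in for the video-memory mapping. The point it shows is that the CPU thread can preprocess task i+1 while the GPU thread computes task i.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // CPU thread preprocesses tasks one by one and hands each result to the
    // GPU thread, so preprocessing of the next task overlaps GPU computation.
    public class HybridPipeline {
        record Task(String payload) {}

        private final BlockingQueue<Task> taskQueue = new ArrayBlockingQueue<>(64); // step 102
        private final BlockingQueue<Task> gpuQueue = new ArrayBlockingQueue<>(64);  // stands in for the video-memory hand-off

        private Task preprocess(Task t) { return new Task(t.payload().trim()); }    // CPU stage (step 104), placeholder work

        private String gpuCompute(Task t) { return "result(" + t.payload() + ")"; } // GPU stage (step 105), stubbed on the CPU

        public void run() throws InterruptedException {
            Thread cpu = new Thread(() -> {
                try { while (true) gpuQueue.put(preprocess(taskQueue.take())); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread gpu = new Thread(() -> {
                try { while (true) System.out.println(gpuCompute(gpuQueue.take())); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            cpu.start();
            gpu.start();
            for (int i = 0; i < 4; i++) taskQueue.put(new Task(" task-" + i + " ")); // step 101
            Thread.sleep(200);                                                       // let the pipeline drain
            cpu.interrupt();
            gpu.interrupt();
        }

        public static void main(String[] args) throws InterruptedException {
            new HybridPipeline().run();
        }
    }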
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
An embodiment of the invention provides a CPU/GPU hybrid parallel computing method; referring to Fig. 2, the method may comprise the following steps:
201: build a computing cluster from more than one computing node, each computing node comprising a CPU and a GPU.
In this embodiment, because the computing efficiency of a single GPU or a single CPU processing massive data is low, hybrid parallel computation using both the CPU and the GPU of each computing node can be considered to improve the computing efficiency over massive data.
Referring to Fig. 2, the CPU/GPU MapReduce hybrid parallel computing cluster provided by this embodiment comprises a task cache module, a task scheduling module, a security module, and more than one computing node, namely computing node 1, computing node 2, ..., computing node N, where each computing node contains a CPU, a GPU, main memory, and a local disk.
202: a user submits more than one pending task to the hybrid parallel computing cluster.
In this embodiment, the security module presets an ACL (Access Control List), which may contain the correspondence between tasks and the users holding the right to operate them. According to the settings in the ACL, the security module can allow or restrict a user submitting a pending task. For example, suppose the pending task submitted by user A is read task a: if the security module determines that user A is not among the users with the read right for task a in the ACL, it can send user A a prompt refusing the read of task a; if it determines that user A holds the operation right for reading task a in the ACL, it allows the pending task submitted by user A.
In a preferred embodiment of the invention, the security module can also be configured to allow or forbid users to view and modify tasks submitted by other users. The security module provides communication security protection, ensuring secure communication between users and the modules of the hybrid parallel computing cluster and preventing the leakage of sensitive data.
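A minimal sketch of the access check the security module could perform is shown below; the ACL structure (a map from a task right to the set of users holding it) and all names are illustrative assumptions, with the user and task mirroring the example above.

    import java.util.Map;
    import java.util.Set;

    // ACL maps a task right to the users holding it; submission of a task
    // is accepted only when the submitting user appears in that set.
    public class AclGate {
        private final Map<String, Set<String>> acl = Map.of(
                "read:a", Set.of("userB", "userC"));   // user A deliberately absent

        public boolean mayOperate(String user, String taskRight) {
            return acl.getOrDefault(taskRight, Set.of()).contains(user);
        }

        public static void main(String[] args) {
            AclGate gate = new AclGate();
            // Mirrors the example above: user A submits read task a and is refused.
            System.out.println(gate.mayOperate("userA", "read:a") ? "accept task" : "refuse read of task a");
            System.out.println(gate.mayOperate("userB", "read:a") ? "accept task" : "refuse read of task a");
        }
    }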
203: when the security module allows the pending tasks submitted by the user, the task cache module caches them in the task queue.
In this embodiment, the task cache module is added at the service access end to make full use of the network bandwidth and improve system efficiency. Tasks submitted by users are first kept in the task queue. Besides storing tasks, the task queue can also be traversed: each time a pending task is visited, its operation attribute is obtained and recorded; after the traversal of the task queue ends, similar or identical tasks are merged dynamically, adapting to scenarios in which users submit large numbers of similar or repeated tasks. Similar or identical tasks are tasks whose operation attributes are similar or the same, for example tasks that are all read operations, all reads of task a, all write operations, all access operations, or all open operations.
In this embodiment, a hash index area can also be created for the merged tasks: the merged tasks are grouped, and a hash index area is created according to the grouped tasks so that the grouped tasks are kept in the hash index area, as shown in the sketch below. Furthermore, through a cached-task compression design, the multiple subtasks contained in each pending task can be compressed into one pending task, reducing the computation load and improving bandwidth transfer efficiency.
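The merging and hash-index grouping of step 203 might look like the following minimal Java sketch; the Task type and the attribute strings are illustrative assumptions, and a HashMap stands in for the hash index area.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One traversal of the task queue: record each task's operation attribute
    // and group tasks sharing an attribute; the resulting map is the hash
    // index area, keyed by that attribute.
    public class TaskMerger {
        record Task(String opAttribute, String data) {}

        static Map<String, List<Task>> mergeByAttribute(List<Task> queue) {
            Map<String, List<Task>> hashIndex = new HashMap<>();
            for (Task t : queue) {                               // visit pending tasks one by one
                hashIndex.computeIfAbsent(t.opAttribute(), k -> new ArrayList<>()).add(t);
            }
            return hashIndex;
        }

        public static void main(String[] args) {
            List<Task> queue = List.of(
                    new Task("read:a", "u1"), new Task("write", "u2"), new Task("read:a", "u3"));
            // Both read:a submissions land in one group and can be executed as one task.
            System.out.println(mergeByAttribute(queue));
        }
    }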
204: the task scheduling module schedules the pending tasks to the computing nodes according to the task scheduling policy.
In this embodiment, the task scheduling module can schedule the pending tasks taking into account factors such as memory surplus, task size, the current load of each computing node, and task priority; the task scheduling policy of this embodiment can be any strategy in the prior art (one example is sketched at the end of this step).
For example, the task scheduling module schedules the pending tasks to computing node 1, computing node 2, and computing node N respectively; refer to Fig. 2.
The message-mechanism-based task scheduling module supports real-time scheduling policies and dynamically computes the resources of every node to achieve optimal task scheduling; it is modeled on the MapReduce framework and provides a series of external interfaces so that developers can implement and deploy applications quickly; and it uses the basic Hadoop communication protocol to enhance overall extensibility.
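As one hypothetical example of such a strategy (the patent leaves the policy open), the following Java sketch picks, for each pending task, a node with enough free memory and the lightest current load; the Node and Task types and the scoring rule are assumptions for illustration only.

    import java.util.Comparator;
    import java.util.List;

    // Choose a target node: filter out nodes whose memory surplus cannot hold
    // the task, then prefer the node running the fewest tasks.
    public class Scheduler {
        record Node(String name, long freeMemoryMb, int runningTasks) {}
        record Task(String name, long sizeMb) {}

        static Node pickNode(List<Node> nodes, Task task) {
            return nodes.stream()
                    .filter(n -> n.freeMemoryMb() >= task.sizeMb())       // memory surplus check
                    .min(Comparator.comparingInt(Node::runningTasks))     // then lightest load
                    .orElseThrow(() -> new IllegalStateException("no node can hold " + task.name()));
        }

        public static void main(String[] args) {
            List<Node> cluster = List.of(
                    new Node("node1", 4096, 3), new Node("node2", 8192, 1), new Node("nodeN", 2048, 0));
            System.out.println(pickNode(cluster, new Task("t1", 3000)).name()); // -> node2
        }
    }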
205: in each computing node to which pending tasks are scheduled, the CPU preprocesses the scheduled tasks one by one and, each time a task is preprocessed, maps it into the video memory of the GPU.
In this embodiment, the CPU preprocesses each pending task in order to map the required data into GPU video memory.
206: the GPU computes the tasks mapped into video memory and returns the computation results.
In this embodiment, the GPU uses the MapReduce computation module to compute the tasks mapped into video memory as follows:
A: divide the task mapped into video memory into more than one task block, assign a corresponding Map task to each task block, and distribute the Map task of each task block onto the SM processors of the GPU, so that each SM processor performs the Map operation on its task block.
Each SM processor performs the Map operation on its task block as follows:
First, a set of Java annotation comments (similar to OpenMP directives) of the form "// #gmp parallel for" is designed and added in Hadoop. These annotations are used inside Map functions so that programmers can mark the code they wish to run on the GPU.
Then, the source code containing the Java annotations is compiled, yielding Java bytecode that still carries the annotations.
Next, a new Java class loader, named GPUClassLoader, is designed on the basis of the traditional Java class loader. The GPUClassLoader can identify the annotated parts of the Java bytecode (i.e. the parts that need to run on the GPU); it is deployed on each computing node.
Then, the GPUClassLoader automatically detects the local computing environment and checks whether it is available; this environment can be CUDA (Compute Unified Device Architecture). If it is unavailable, computation proceeds directly on the CPU; if it is available, the concrete CUDA version is detected and the annotated code sections (i.e. the parts that need to run on the GPU) are identified.
For the annotated parts of the identified Java bytecode, the GPUClassLoader generates corresponding CUDA code, comprising a kernel function section and a runtime section, and compiles both. The compiled CUDA code is invoked through JNI, the related data is copied to GPU video memory, and the CUDA code runs on the GPU. With JNI, the call succeeds only when the code section satisfies certain independence conditions and GPU resources are available in the computing environment; otherwise an error prompt is issued.
After the GPU computation ends, the results of the CUDA code are copied back to local main memory, where the Map function obtains them. The unmarked code sections of the Map function run on the CPU.
Afterwards, the scheduling node tracks the running state of all Map tasks and reruns any failed Map task; once all Map tasks complete, the Map process ends.
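The annotate-and-fall-back mechanism described above might be sketched as follows. The "// #gmp parallel for" marker is taken from the text, but the gpuAvailable flag and the native entry point are reconstructions, not a real Hadoop or CUDA API: a real implementation would generate and compile CUDA for the annotated region and invoke it through JNI, while unannotated code stays on the CPU.

    // Minimal sketch of a Map function carrying the GPU annotation.
    public class AnnotatedMapper {

        // Hypothetical JNI bridge to the CUDA code the class loader would generate.
        private static native int[] squareOnGpu(int[] values);

        static boolean gpuAvailable = false;   // would be set by GPUClassLoader after probing CUDA

        public static int[] map(int[] values) {
            if (gpuAvailable) {
                return squareOnGpu(values);    // compiled CUDA code runs on the GPU via JNI
            }
            // #gmp parallel for              <- region the loader would offload when CUDA exists
            int[] out = new int[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] * values[i];
            }
            return out;                        // unannotated path: runs on the CPU
        }

        public static void main(String[] args) {
            int[] r = map(new int[]{1, 2, 3});
            System.out.println(r[0] + "," + r[1] + "," + r[2]);   // 1,4,9
        }
    }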
B: relocate the intermediate tasks within GPU video memory through a Shuffle operation, providing intermediate results for the ensuing Reduce operation.
C: aggregate the results of the Map stage in the Reduce stage; the aggregated Map-stage results are returned to main memory as streaming data, and the CPU issues the instructions that write them to network I/O.
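A minimal Java sketch of steps B and C is given below, with an in-memory list standing in for GPU video memory and illustrative key/value types: Shuffle groups and orders the Map output by key, and Reduce aggregates each group before the result streams back.

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Shuffle: relocate/sort Map output by key; Reduce: aggregate each group.
    public class ShuffleReduce {
        record KV(String key, long value) {}

        static Map<String, Long> shuffleAndReduce(List<KV> mapOutput) {
            Map<String, Long> grouped = new TreeMap<>();          // key-ordered, as after Shuffle
            for (KV kv : mapOutput) {
                grouped.merge(kv.key(), kv.value(), Long::sum);   // Reduce-stage aggregation
            }
            return grouped;
        }

        public static void main(String[] args) {
            List<KV> mapOutput = List.of(new KV("a", 1), new KV("b", 2), new KV("a", 3));
            System.out.println(shuffleAndReduce(mapOutput));      // {a=4, b=2}
        }
    }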
207: the computing cluster returns the computation results to the client.
An embodiment of the invention provides a CPU/GPU hybrid parallel computing device; referring to Fig. 3, more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined. The device comprises:
a task cache module 301, configured to acquire more than one pending task and cache the acquired tasks in a task queue;
a task scheduling module 302, configured to schedule, according to the scheduling policy, the pending tasks in the task queue to more than one computing node;
a computing node 303, configured to, when pending tasks are scheduled to it, use its CPU to preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of its GPU, and to use the GPU to compute the tasks mapped into video memory and return the computation results.
Further, the task cache module 301 is configured to traverse the pending tasks in the task queue; each time a pending task is visited, obtain and record its operation attribute; after the traversal of the task queue ends, merge pending tasks with the same operation attribute into a single task; group the merged tasks, and create a hash index area according to the grouped tasks so that the grouped tasks are kept in the hash index area.
Further, the computing node 303 is configured to divide the task mapped into video memory into more than one task block, assign a corresponding Map task to each task block, distribute the Map task of each task block onto the SM processors of the GPU so that each SM processor performs the Map operation on its task block, relocate the intermediate tasks within GPU video memory through a Shuffle operation, and aggregate the results of the Map stage in the Reduce stage.
The device further comprises:
a security module 304, configured to determine, according to a preset access control list, whether the user submitting a pending task holds the operation right for that task and, if so, perform the operation of acquiring the task, the access control list comprising the correspondence between tasks and the users holding the right to operate them.
In summary, the embodiments of the invention can achieve at least the following beneficial effects:
1. By using the CPU and GPU in combination, the CPU preprocesses pending tasks and the GPU computes the preprocessed tasks, and while the GPU is computing, the CPU can continue to acquire and preprocess further tasks. CPU/GPU parallel computation is thus achieved, which not only expands the computing capability of the GPU but also improves the computing efficiency of the computing nodes.
2. The task cache module is added to make full use of the network bandwidth and improve system efficiency. Tasks submitted by users are first kept in the task queue. Besides storing tasks, the task queue dynamically merges similar tasks, adapting to scenarios in which users submit large numbers of similar or repeated tasks. Similar tasks are grouped by building a hash index area, and a cached-task compression design reduces the computation load and improves bandwidth transfer efficiency.
3. By integrating and transforming Mars, the single-machine MapReduce computing framework that supports only GPUs, and having the CPU preprocess and map the required data into GPU video memory, the computing capability of a single node is expanded, realizing a MapReduce hybrid parallel computing framework that supports both GPU and CPU. This framework combines the advantages of complex CPU scheduling and GPU parallel computing, suits compute-intensive applications, exploits the strengths of large-scale GPU clusters to significantly improve the computing performance and efficiency of MapReduce, and enables the transparent and efficient development of parallel applications.
As for the information interaction and implementation details between the units of the above device, since they are based on the same concept as the method embodiments of the invention, the specific content can be found in the description of the method embodiments and is not repeated here.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relation or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or device that comprises it.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be implemented by hardware instructed by a program; the program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disk, or optical disk.
Finally, it should be noted that the above are only preferred embodiments of the invention, intended solely to illustrate its technical solutions, not to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is included in the protection scope of the invention.

Claims (8)

1. A CPU/GPU hybrid parallel computing method, characterized in that more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined; the method further comprises:
acquiring more than one pending task;
caching the acquired pending tasks in a task queue;
scheduling, according to the scheduling policy, the pending tasks in the task queue to more than one computing node;
in each computing node to which pending tasks are scheduled, having the CPU preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of the GPU;
having the GPU compute the tasks mapped into video memory and return the computation results.
2. The method according to claim 1, characterized in that, before the pending tasks in the task queue are scheduled to the computing nodes, the method further comprises:
traversing the pending tasks in the task queue; each time a pending task is visited, obtaining and recording its operation attribute; after the traversal of the task queue ends, merging pending tasks with the same operation attribute into a single task; grouping the merged tasks, and creating a hash index area according to the grouped tasks so that the grouped tasks are kept in the hash index area.
3. The method according to claim 1, characterized in that the GPU computing the tasks mapped into video memory comprises:
dividing the task mapped into video memory into more than one task block, assigning a corresponding Map task to each task block, and distributing the Map task of each task block onto the SM processors of the GPU, so that each SM processor performs the Map operation on its task block;
relocating the intermediate tasks within GPU video memory through a Shuffle operation, and aggregating the results of the Map stage in the Reduce stage.
4. The method according to any one of claims 1-3, characterized in that
it further comprises: presetting an access control list, the access control list comprising the correspondence between tasks and the users holding the right to operate them;
and, before acquiring the pending tasks, further comprises: determining, according to the access control list, whether the user submitting a pending task holds the operation right for that task and, if so, performing the operation of acquiring the task.
5. A CPU/GPU hybrid parallel computing device, characterized in that more than one computing node is used to build a computing cluster, each computing node comprising a CPU and a GPU, and a scheduling policy is determined; the device comprises:
a task cache module, configured to acquire more than one pending task and cache the acquired tasks in a task queue;
a task scheduling module, configured to schedule, according to the scheduling policy, the pending tasks in the task queue to more than one computing node;
a computing node, configured to, when pending tasks are scheduled to it, use its CPU to preprocess the scheduled tasks one by one and, each time a task is preprocessed, map it into the video memory of its GPU, and to use the GPU to compute the tasks mapped into video memory and return the computation results.
6. The device according to claim 5, characterized in that
the task cache module is configured to traverse the pending tasks in the task queue; each time a pending task is visited, obtain and record its operation attribute; after the traversal of the task queue ends, merge pending tasks with the same operation attribute into a single task; group the merged tasks, and create a hash index area according to the grouped tasks so that the grouped tasks are kept in the hash index area.
7. The device according to claim 5, characterized in that the computing node is configured to divide the task mapped into video memory into more than one task block, assign a corresponding Map task to each task block, distribute the Map task of each task block onto the SM processors of the GPU so that each SM processor performs the Map operation on its task block, relocate the intermediate tasks within GPU video memory through a Shuffle operation, and aggregate the results of the Map stage in the Reduce stage.
8. The device according to any one of claims 5-7, characterized in that it further comprises:
a security module, configured to determine, according to a preset access control list, whether the user submitting a pending task holds the operation right for that task and, if so, perform the operation of acquiring the task, the access control list comprising the correspondence between tasks and the users holding the right to operate them.
CN201510264320.2A 2015-05-22 2015-05-22 Hybrid parallel computing method and device for CPUs/GPUs Pending CN104965689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510264320.2A CN104965689A (en) 2015-05-22 2015-05-22 Hybrid parallel computing method and device for CPUs/GPUs

Publications (1)

Publication Number Publication Date
CN104965689A true CN104965689A (en) 2015-10-07

Family

ID=54219724

Country Status (1)

Country Link
CN (1) CN104965689A (en)



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20151007)