CN102708088A - CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation - Google Patents

Info

Publication number
CN102708088A
CN102708088A (application CN2012101407459A)
Authority
CN
China
Prior art keywords
gpu
code
cpu
node
mass data
Prior art date
Legal status
Pending
Application number
CN2012101407459A
Other languages
Chinese (zh)
Inventor
翟岩龙
刘培志
罗壮
黄河燕
宿红毅
郭琨毅
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN2012101407459A priority Critical patent/CN102708088A/en
Publication of CN102708088A publication Critical patent/CN102708088A/en
Pending legal-status Critical Current

Abstract

The invention provides a CPU/GPU (Central Processing Unit / Graphics Processing Unit) cooperative processing method oriented to mass data high-performance computation, used to solve the low operating efficiency of mass-data computation. A set of Java comment-code conventions is designed; a computer cluster composed of multiple computers is built; an improved Hadoop platform is deployed on the cluster; and the designed Java comment-code conventions together with a GPU Class loader are added to the improved platform. A specific version of CUDA (Compute Unified Device Architecture) is installed on each computing node, so that when programming, the user can conveniently use the GPU computing resources in the Map function of MapReduce through the comment codes. The method realizes unified scheduling and utilization of the CPU and GPU computing power on the computer cluster, so that applications that are both data-intensive and computation-intensive can be realized efficiently; moreover, the source code written is portable and convenient for programmers to develop.

Description

CPU/GPU cooperative processing method oriented to mass data high-performance computation
Technical field
The present invention relates to a kind of use and set up the method for the collaborative computing platform of CPU/GPU, belong to mass data processing and high-performance calculation processing technology field.
Background art
In the computer field today, many applications need to process mass data. At present, the most widely adopted mass-data processing method is the MapReduce computation model. MapReduce is a programming model proposed by Google for implementing distributed parallel computing tasks: it distributes mass data across a large-scale cluster for parallel processing. The MapReduce programming model divides the computation into a Map stage and a Reduce stage. Its principle is that the data are cut into blocks of a specific size and stored across the cluster in distributed form as <Key, Value> pairs. Every node in the cluster runs a number of Map and Reduce tasks. A Map task processes the input <Key, Value> pairs and generates new <Key, Value> pairs; a Reduce task collects and processes all <Key, Value> data sharing the same Key. Through this simple model, MapReduce handles mass data. However, one very important class of mass-data applications is difficult to solve with the MapReduce computation model: applications that are simultaneously data-intensive and computation-intensive, such as data imaging in the energy-exploration industry. The oil and gas industry is a high-risk, high-investment, high-technology industry, and its upstream business, oil and gas exploration and development, depends heavily on the integrated application of various new and high technologies, particularly information technology. Among these, imaging and modeling techniques are the core features of oil-gas exploration software systems. Seismic exploration data processing involves high-volume (several PB) and high-density computation, and has always been the biggest bottleneck in the whole survey-data analysis workflow. Taking the imaging of 1 TB of data as an example, computing with a high-performance CPU cluster alone takes several weeks. Under the MapReduce model, the computing capability of a single node cannot satisfy the computation-intensive demands of these applications.
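The Map/Reduce decomposition described above can be sketched in plain Java without Hadoop; the word-count job and the in-memory collections here are illustrative assumptions, not part of the invention:

```java
import java.util.*;
import java.util.stream.*;

// Minimal in-memory sketch of the MapReduce model described above:
// Map emits <Key, Value> pairs, Reduce aggregates all values sharing a Key.
class MiniMapReduce {
    // Map task: turn one input line into <word, 1> pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .map(w -> Map.entry(w, 1))
                     .collect(Collectors.toList());
    }

    // Reduce task: sum all values that share the same key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs)
            out.merge(e.getKey(), e.getValue(), Integer::sum);
        return out;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // The Map stage runs independently per input split...
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) emitted.addAll(map(line));
        // ...and the Reduce stage gathers the emitted pairs by key.
        return reduce(emitted);
    }
}
```

In a real cluster the emitted pairs are shuffled between nodes rather than held in one list, but the per-key aggregation is the same.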
The development of GPU technology has made solving this difficult problem feasible. The GPU is a graphics processor, but today's GPUs are no longer confined to 3D graphics processing: in floating-point computation, parallel computation, and similar workloads, a GPU can deliver tens to hundreds of times the performance of a CPU. The computing industry is currently evolving from "central processing", which uses only the CPU, to "co-processing", which uses the CPU and GPU together. To enable this new computing model, NVIDIA invented CUDA (Compute Unified Device Architecture), a programming model that lets an application exploit the respective advantages of both CPU and GPU. CUDA is a complete GPU solution: it provides a direct hardware access interface, without having to go through a graphics API to reach the GPU as in the traditional approach. Architecturally it adopts a new computing structure for using the hardware resources the GPU provides, thereby offering large-scale data computation far more computing power than the CPU alone. CUDA uses the C language as its programming language and provides a large set of high-performance computing primitives, enabling developers to build efficient high-density data computation solutions on top of the GPU's computing power.
If the functionality of MapReduce can be extended with calls to the GPU, that is, if a CPU/GPU cooperative computing method can be designed, then applications that are both data-intensive and computation-intensive can be realized efficiently.
Chinese patent application 200910020566.X, "Construction method of a combined GPU and CPU processor", proposes coupling a CPU and a GPU into a combined processor so that they can work cooperatively. However, on the one hand that method targets a single computer and cannot integrate the CPU and GPU computing resources of a cluster containing many machines; on the other hand, its CPU is only responsible for general processing tasks involving complicated instruction scheduling, loops, branches, and logic decisions, such as the operating system, system software, and general-purpose applications, and cannot participate in the parallel computation of large-scale data.
Summary of the invention
The objective of the present invention is to overcome the defects of the prior art. To solve the low operating efficiency of mass-data computation faced in fields such as energy-exploration data imaging, fast radar imaging, and financial data analysis, a CPU/GPU cooperative processing method oriented to mass data high-performance computation is proposed, so that applications that are both data-intensive and computation-intensive can be realized efficiently.
To achieve the above objective, the technical scheme adopted by the present invention is as follows:
A CPU/GPU cooperative processing method oriented to mass data high-performance computation, comprising the following steps:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster.
The cluster contains one scheduling node, responsible for the scheduling control of all tasks; the remaining nodes serve as computing nodes.
Each node has its own independent CPU, GPU, memory, and local disk. Regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation.
Step 4: each computing node executes the Map process. The Map process is as follows:
First, design a set of Java comment codes and apply them in the Map function, used by the programmer to mark the code sections to be parallelized, similar to the directive style of OpenMP. For example, the comment code "// #gmp parallel for" indicates that the immediately following loop or function needs to be parallelized.
Then, compile the source code containing these comment codes to obtain Java bytecode containing the comment codes.
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), named GPU Class loader. The GPU Class loader can identify the annotated parts of the Java bytecode, i.e. the parts that need to run on the GPU. The GPU Class loader is deployed on each computing node.
Then, the GPU Class loader automatically detects the local computing environment and judges whether the local GPU resources are available. If unavailable, the CPU is used directly for the computation; if available, the concrete version of the current CUDA installation is recorded, so that CUDA code adapted to that version can be generated.
Subsequently, the GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode and compiles it. The CUDA code comprises one section of kernel-function code and one section of invocation code. The compiled CUDA code is then called, so that this part of the code runs on the GPU; the call can be made via JNI. When generating the CUDA code, generation can only be completed if the code section satisfies certain independence conditions and GPU computing resources are available in the computing environment; otherwise an error prompt is issued.
At this point, the operation results of the GPU are obtained. The code sections that are not annotated run normally on the CPU until the operation completes.
Finally, the scheduling node reruns the Mappers that failed, thereby completing the Map stage.
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
Beneficial effects
The method of the invention has the following advantages:
(1) The method realizes unified scheduling and utilization of the CPU and GPU computing power on a computer cluster, so that applications that are both data-intensive and computation-intensive can be realized efficiently.
(2) The platform can automatically detect the local GPU computing environment and select accordingly, so that the source code written is portable.
(3) The present invention introduces GPU resources by means of comment codes, so that programmers only need to learn the corresponding comment codes to develop with it; this is easy to pick up.
Description of drawings
Fig. 1 is a schematic flowchart of the method of the invention;
Fig. 2 is a schematic diagram of the implementation framework of the method;
Fig. 3 is a schematic diagram of the implementation process of utilizing the GPU computing power.
Embodiment
Specific embodiments of the invention are explained further below in conjunction with the accompanying drawings.
A CPU/GPU cooperative processing method oriented to mass data high-performance computation. Its basic principle is as follows: first, design a set of Java comment-code conventions, and on the basis of the traditional Java class loader design a new Java class loader that can identify the annotated parts of the Java bytecode, named GPU Class loader. Build a computer cluster composed of multiple computers and deploy the improved Hadoop platform on it. Add the designed Java comment-code conventions and the GPU Class loader to the improved platform. Install a specific version of CUDA on each computing node, so that when coding, users can conveniently use GPU computing resources in the Map function of MapReduce through the comment codes, as shown in Fig. 2 and Fig. 3.
The concrete implementation steps of the method are shown in Fig. 1, specifically as follows:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster. The cluster contains one scheduling node, responsible for the scheduling control of all tasks; the remaining nodes serve as computing nodes. Each node has its own independent CPU, GPU, memory, and local disk. Regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks.
Install Hadoop on each node of the computer cluster, and set the HDFS data-block size block as well as the numbers of Map and Reduce tasks that may run simultaneously on each node, denoted m and r respectively, so that MapReduce can run normally on the computing cluster.
Meanwhile, denote the task scale as K. When the job runs, the scheduling node divides the task into K/block Map tasks according to the configured HDFS data-block size block, and assigns these Map tasks to the computing nodes for computation. When K/block does not divide evenly, the result is rounded up. Meanwhile, r Reduce tasks are started; the value of r is set by the user, and is set to 1 here.
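The task-block arithmetic above (K/block Map tasks, rounded up when the division is not exact) is a ceiling division; a small sketch, treating K and block as byte counts for illustration:

```java
// Number of Map tasks for a job of scale K split into HDFS blocks of size
// `block`, rounding up when K is not a multiple of the block size.
class TaskSplit {
    static long mapTaskCount(long k, long block) {
        if (block <= 0) throw new IllegalArgumentException("block size must be positive");
        return (k + block - 1) / block;   // ceil(K / block) in integer arithmetic
    }
}
```

For example, a 1 TB job with a 64 MB block size yields 16384 Map tasks.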
Step 4: each computing node executes the Map process; the concrete implementation is as follows:
First, design a set of Java comment codes (similar to the directives in OpenMP) and add support for them to Hadoop, of the form "// #gmp parallel for". These comment codes are used inside the Map function for the programmer to mark the code to be run on the GPU.
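As a sketch of how such an annotated Map function might look to the programmer: the class name, method signature, and loop body below are illustrative assumptions; only the "// #gmp parallel for" marker is taken from the invention.

```java
// Hypothetical user code: the comment code marks the loop that the
// GPU Class loader should offload; everything else stays on the CPU.
class ScaleMapper {
    // Stand-in for a Map function body: scale every sample in the record.
    static float[] map(float[] samples, float factor) {
        float[] out = new float[samples.length];
        // #gmp parallel for   <-- marks the following loop for GPU execution
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i] * factor;  // data-parallel, no loop-carried dependence
        }
        return out;
    }
}
```

The loop qualifies for offload precisely because each iteration is independent, matching the "independence conditions" required later when CUDA code is generated.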
Then, compile the source code containing the Java comment codes to obtain Java bytecode containing the comment codes.
Next, design a new Java class loader on the basis of the traditional Java class loader, named GPU Class loader. The GPU Class loader can identify the annotated parts of the Java bytecode (i.e. the parts that need to run on the GPU) and is deployed on each computing node.
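A minimal sketch of a class loader in the spirit of the GPU Class loader; here the annotation is detected by scanning the class bytes for a marker string, which is an assumption of this sketch — the patent does not specify how the annotated regions are represented in, or located within, the bytecode.

```java
// Sketch: a class loader that inspects class bytes before defining the class,
// mirroring the GPU Class loader's role of spotting annotated (GPU-bound) code.
class GpuAwareClassLoader extends ClassLoader {
    static final byte[] MARKER = "#gmp parallel for".getBytes();

    // Returns true if the marker occurs anywhere in the class-file bytes.
    static boolean containsMarker(byte[] classBytes) {
        outer:
        for (int i = 0; i + MARKER.length <= classBytes.length; i++) {
            for (int j = 0; j < MARKER.length; j++)
                if (classBytes[i + j] != MARKER[j]) continue outer;
            return true;
        }
        return false;
    }

    // Define the class, branching on whether it carries GPU-bound regions.
    Class<?> defineFrom(String name, byte[] classBytes) {
        if (containsMarker(classBytes)) {
            // Here the real loader would translate the marked region to CUDA code.
        }
        return defineClass(name, classBytes, 0, classBytes.length);
    }
}
```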
Then, the GPU Class loader automatically detects the local computing environment and checks whether CUDA is available. If unavailable, the computation runs directly on the CPU; if available, the concrete CUDA version is detected, and the annotated code sections (i.e. the parts that need to run on the GPU) are identified.
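The availability check can be sketched as an attempt to load the CUDA driver library, falling back to the CPU when it is absent; the library name is an assumption of this sketch, as the patent does not name the detection mechanism.

```java
// Sketch of the environment probe: try to load the CUDA driver library;
// if that fails, report the GPU as unavailable so computation stays on the CPU.
class CudaProbe {
    static boolean cudaAvailable() {
        try {
            System.loadLibrary("cuda");   // assumed library name for the CUDA driver
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;                 // no CUDA on this node: use the CPU path
        }
    }

    // Select the execution path for an annotated code section.
    static String choosePath() {
        return cudaAvailable() ? "GPU" : "CPU";
    }
}
```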
The GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode, comprising one section of kernel-function code and one section of invocation code, and compiles both sections. The compiled CUDA code is called via JNI: the relevant data is copied to the GPU video memory, and the CUDA code runs on the GPU. When JNI is used, the call can only succeed if the code section satisfies certain independence conditions and GPU resources are available in the computing environment; otherwise an error prompt is issued.
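The JNI hand-off can be sketched as a native method declaration plus a CPU fallback used when the compiled CUDA library cannot be loaded; the library name and the native method name are illustrative assumptions of this sketch.

```java
// Sketch of the JNI bridge: the invocation side of the generated CUDA pair is
// exposed as a native method; if its library is missing, fall back to the CPU,
// matching the method's behavior when GPU resources are unavailable.
class GpuVectorScale {
    // Native entry point into the compiled CUDA invocation code (hypothetical).
    static native float[] scaleOnGpu(float[] samples, float factor);

    static final boolean NATIVE_LOADED = tryLoad();

    static boolean tryLoad() {
        try {
            System.loadLibrary("gmpcuda");  // assumed name of the generated library
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Dispatch: GPU when the native library loaded, otherwise a plain CPU loop.
    static float[] scale(float[] samples, float factor) {
        if (NATIVE_LOADED) return scaleOnGpu(samples, factor);
        float[] out = new float[samples.length];
        for (int i = 0; i < samples.length; i++) out[i] = samples[i] * factor;
        return out;
    }
}
```

In the real system the native side would also copy `samples` to GPU video memory and copy the result back to main memory, as the surrounding text describes.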
After the GPU computation finishes, the operation results of the CUDA code are copied back to local main memory, and the Map function obtains these results. The code sections of the Map function that are not marked run on the CPU.
Afterwards, the scheduling node tracks the running state of all Map tasks and reruns any Map task that failed, until all Map tasks are completed and the Map process ends.
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the computation.

Claims (5)

1. A CPU/GPU cooperative processing method oriented to mass data high-performance computation, characterized by comprising the following steps:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster; set one scheduling node on the cluster, responsible for the scheduling control of all tasks, with the remaining nodes serving as computing nodes;
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources;
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation;
Step 4: each computing node executes the Map process;
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
2. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 1, characterized in that each node in said cluster has its own independent CPU, GPU, memory, and local disk.
3. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 2, characterized in that, regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
4. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 1, characterized in that the Map process executed by each computing node in said step 4 is as follows:
First, design a set of Java comment codes and apply them in the Map function;
Then, compile the source code containing these comment codes to obtain Java bytecode containing the comment codes;
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), named GPU Class loader; meanwhile, deploy the GPU Class loader on each computing node;
Then, the GPU Class loader automatically detects the local computing environment and judges whether the local GPU resources are available; if unavailable, the CPU is used directly for the computation; if available, the concrete version of the current CUDA installation is recorded, so that CUDA code adapted to that version can be generated;
Subsequently, the GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode and compiles it; the compiled CUDA code is called so that this part of the code runs on the GPU; when generating the CUDA code, generation can only be completed if the code section satisfies certain independence conditions and GPU computing resources are available in the computing environment, otherwise an error prompt is issued;
At this point, the operation results of the GPU are obtained; the code sections that are not annotated run normally on the CPU until the operation completes;
Finally, the scheduling node reruns the Mappers that failed.
5. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 4, characterized in that the generated CUDA code comprises one section of kernel-function code and one section of invocation code.
CN2012101407459A 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation Pending CN102708088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101407459A CN102708088A (en) 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation


Publications (1)

Publication Number Publication Date
CN102708088A true CN102708088A (en) 2012-10-03

Family

ID=46900884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101407459A Pending CN102708088A (en) 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation

Country Status (1)

Country Link
CN (1) CN102708088A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184254A1 (en) * 2007-01-25 2008-07-31 Bernard Guy S Systems, methods and apparatus for load balancing across computer nodes of heathcare imaging devices
US20110074791A1 (en) * 2009-09-30 2011-03-31 Greg Scantlen Gpgpu systems and services
CN102004670A (en) * 2009-12-17 2011-04-06 华中科技大学 Self-adaptive job scheduling method based on MapReduce


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOMI AARNIO: "Parallel data processing with MapReduce", TKK T-110.5190 Seminar on Internetworking, 27 April 2009 (2009-04-27) *
CHEN Huaping et al.: "Task Scheduling and Its Classification in Parallel and Distributed Computing" (并行分布计算中的任务调度及其分类), Computer Science (《计算机科学》), vol. 28, no. 1, 31 December 2001 (2001-12-31), pages 45-48 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309645B (en) * 2013-04-27 2015-09-16 李朝波 A kind of method of additional turn function in computer digital animation instruction and CPU module
CN103309645A (en) * 2013-04-27 2013-09-18 李朝波 Method of appending skip function in computer data processing instruction and CPU (Central Processing Unit) module
CN103324538A (en) * 2013-05-23 2013-09-25 国家电网公司 Method for designing dislocated scattered cluster environment distributed concurrent processes
CN103324538B (en) * 2013-05-23 2016-08-10 国家电网公司 A kind of method for designing of the dystopy dispersion distributed concurrent process of cluster environment
CN103324505A (en) * 2013-06-24 2013-09-25 曙光信息产业(北京)有限公司 Method for deploying GPU (graphic processor unit) development environments in cluster system and could computing system
CN103324505B (en) * 2013-06-24 2016-12-28 曙光信息产业(北京)有限公司 The method disposing GPU development environment in group system and cloud computing system
CN103399787A (en) * 2013-08-06 2013-11-20 北京华胜天成科技股份有限公司 Map Reduce task streaming scheduling method and scheduling system based on Hadoop cloud computing platform
CN103399787B (en) * 2013-08-06 2016-09-14 北京华胜天成科技股份有限公司 A kind of MapReduce operation streaming dispatching method and dispatching patcher calculating platform based on Hadoop cloud
CN104570081B (en) * 2013-10-29 2017-12-26 中国石油化工股份有限公司 A kind of integration method pre-stack time migration Processing Seismic Data and system
CN104570081A (en) * 2013-10-29 2015-04-29 中国石油化工股份有限公司 Pre-stack reverse time migration seismic data processing method and system by integral method
WO2015096649A1 (en) * 2013-12-23 2015-07-02 华为技术有限公司 Data processing method and related device
CN104731569A (en) * 2013-12-23 2015-06-24 华为技术有限公司 Data processing method and relevant equipment
CN104731569B (en) * 2013-12-23 2018-04-10 华为技术有限公司 A kind of data processing method and relevant device
CN103699656A (en) * 2013-12-27 2014-04-02 同济大学 GPU-based mass-multimedia-data-oriented MapReduce platform
CN105094981A (en) * 2014-05-23 2015-11-25 华为技术有限公司 Method and device for processing data
WO2015176689A1 (en) * 2014-05-23 2015-11-26 华为技术有限公司 Data processing method and device
CN105094981B (en) * 2014-05-23 2019-02-12 华为技术有限公司 A kind of method and device of data processing
CN104679664A (en) * 2014-12-26 2015-06-03 浪潮(北京)电子信息产业有限公司 Communication method and device in cluster system
CN104536937A (en) * 2014-12-30 2015-04-22 深圳先进技术研究院 Big data appliance realizing method based on CPU-GPU heterogeneous cluster
CN104536937B (en) * 2014-12-30 2017-10-31 深圳先进技术研究院 Big data all-in-one machine realization method based on CPU GPU isomeric groups
CN106156810B (en) * 2015-04-26 2019-12-03 阿里巴巴集团控股有限公司 General-purpose machinery learning algorithm model training method, system and calculate node
CN106156810A (en) * 2015-04-26 2016-11-23 阿里巴巴集团控股有限公司 General-purpose machinery learning algorithm model training method, system and calculating node
CN104965689A (en) * 2015-05-22 2015-10-07 浪潮电子信息产业股份有限公司 Hybrid parallel computing method and device for CPUs/GPUs
CN105227669A (en) * 2015-10-15 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study
CN106227509B (en) * 2016-06-30 2019-03-19 扬州大学 A kind of class towards Java code uses example generation method
CN106227509A (en) * 2016-06-30 2016-12-14 扬州大学 A kind of class towards Java code uses example to generate method
CN106936897A (en) * 2017-02-22 2017-07-07 上海网罗电子科技有限公司 A kind of high concurrent personnel positioning method for computing data based on GPU
CN107135257A (en) * 2017-04-28 2017-09-05 东方网力科技股份有限公司 Task is distributed in a kind of node cluster method, node and system
CN107241767A (en) * 2017-06-14 2017-10-10 广东工业大学 The method and device that a kind of mobile collaboration is calculated
CN107241767B (en) * 2017-06-14 2020-10-23 广东工业大学 Mobile collaborative computing method and device
CN111507466A (en) * 2019-01-30 2020-08-07 北京沃东天骏信息技术有限公司 Data processing method and device, electronic equipment and readable medium
CN109947563A (en) * 2019-03-06 2019-06-28 北京理工大学 A kind of parallel multilevel fast multipole tree construction compound storage method
CN109947563B (en) * 2019-03-06 2020-10-27 北京理工大学 Parallel multilayer rapid multi-polar subtree structure composite storage method
CN110187970A (en) * 2019-05-30 2019-08-30 北京理工大学 A kind of distributed big data parallel calculating method based on Hadoop MapReduce
CN110569312A (en) * 2019-11-06 2019-12-13 创业慧康科技股份有限公司 big data rapid retrieval system based on GPU and use method thereof

Similar Documents

Publication Publication Date Title
CN102708088A (en) CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation
Ma et al. Rammer: Enabling holistic deep learning compiler optimizations with {rTasks}
Giorgi et al. TERAFLUX: Harnessing dataflow in next generation teradevices
Liu et al. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Krieder et al. Design and evaluation of the gemtc framework for gpu-enabled many-task computing
Rauchwerger Run-time parallelization: Its time has come
Meng et al. Preliminary experiences with the uintah framework on intel xeon phi and stampede
Wang et al. A framework for distributed data-parallel execution in the Kepler scientific workflow system
Peterson et al. Demonstrating GPU code portability and scalability for radiative heat transfer computations
Carneiro Pessoa et al. GPU‐accelerated backtracking using CUDA Dynamic Parallelism
Palmskog et al. piCoq: Parallel regression proving for large-scale verification projects
Segal et al. High level programming for heterogeneous architectures
Pöppl et al. SWE-X10: Simulating shallow water waves with lazy activation of patches using ActorX10
Mivule et al. A review of cuda, mapreduce, and pthreads parallel computing models
Kunzman et al. Towards a framework for abstracting accelerators in parallel applications: experience with cell
Dubrulle et al. A low-overhead dedicated execution support for stream applications on shared-memory CMP
Davis et al. Paradigmatic shifts for exascale supercomputing
Andon et al. Programming high-performance parallel computations: formal models and graphics processing units
Liu et al. BSPCloud: A hybrid distributed-memory and shared-memory programming model
Cao et al. Evaluating data redistribution in parsec
Ali et al. A parallel programming model for Ada
Gainaru et al. Understanding the impact of data staging for coupled scientific workflows
Tarakji et al. Os support for load scheduling on accelerator-based heterogeneous systems
Weng et al. Acceleration of a Python-based tsunami modelling application via CUDA and OpenHMPP
Guaitero et al. Automatic Asynchronous Execution of Synchronously Offloaded OpenMP Target Regions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121003