CN102708088A - CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation - Google Patents
- Publication number
- CN102708088A CN102708088A CN2012101407459A CN201210140745A CN102708088A CN 102708088 A CN102708088 A CN 102708088A CN 2012101407459 A CN2012101407459 A CN 2012101407459A CN 201210140745 A CN201210140745 A CN 201210140745A CN 102708088 A CN102708088 A CN 102708088A
- Authority
- CN
- China
- Prior art keywords
- gpu
- code
- cpu
- node
- mass data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a CPU/GPU (Central Processing Unit/Graphics Processing Unit) cooperative processing method oriented to high-performance computation on mass data, which addresses the low operating efficiency of mass-data computation. A set of Java comment-code conventions is designed; a computer cluster composed of multiple computers is built; an improved Hadoop platform is deployed on the cluster; and the designed Java comment-code conventions and a GPU class loader are added to the improved platform. A specific version of CUDA (Compute Unified Device Architecture) is installed on each computing node, so that when programming, the user can conveniently use the GPU computing resources in the Map function of MapReduce through the comment codes. The method realizes unified scheduling and utilization of the CPU and GPU computing power of a computer cluster, so that applications that are both data-intensive and computation-intensive can be executed efficiently; moreover, the resulting source code is portable and convenient for programmers to develop.
Description
Technical field
The present invention relates to a method for building a CPU/GPU cooperative computing platform, and belongs to the technical fields of mass data processing and high-performance computing.
Background technology
In the computer field today, many applications need to process mass data. At present, the most widely adopted mass-data processing method is the MapReduce computation model. MapReduce is a programming model proposed by Google for implementing distributed parallel computing tasks. It can distribute mass data across a large-scale cluster for parallel processing. The MapReduce programming model divides computation into a Map stage and a Reduce stage. Its principle is that the data are cut into blocks of a specific size and stored across the cluster in distributed fashion as <Key, Value> pairs. Each node in the cluster runs some Map and Reduce tasks. A Map task processes the input <Key, Value> pairs and generates new <Key, Value> pairs; a Reduce task gathers the <Key, Value> data that share the same Key and processes them together. Through this simple model, MapReduce handles mass data. However, one very important class of mass-data applications is difficult to solve with the MapReduce computation model: applications that are simultaneously data-intensive and computation-intensive, such as data imaging in the energy exploration industry. The oil and gas industry is a high-risk, high-investment, high-technology industry, and its upstream business, oil-gas exploration and development, depends heavily on the integrated application of various new and high technologies, particularly information technology. Among these, imaging and modeling are core functions of oil-gas exploration software systems. Seismic exploration data processing involves high-volume (several PB) and high-density computation, and has always been the biggest bottleneck in the entire survey-data analysis workflow. Taking the imaging of 1 TB of data as an example, computing with a high-performance CPU cluster alone takes several weeks. Under the MapReduce model, the arithmetic capability of each node cannot satisfy the computation-intensive demands of these applications.
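The <Key, Value> flow of the Map and Reduce stages described above can be sketched as a minimal, framework-free word-count example. This is illustrative only: it is not the Hadoop API, and all class and method names are invented for the sketch.

```java
import java.util.*;

// Minimal in-memory illustration of the <Key, Value> flow described above.
// Framework-free sketch, not the Hadoop API; all names are invented.
class MiniMapReduce {
    // Map stage: turn one input line into <word, 1> pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.trim().split("\\s+"))
            pairs.add(Map.entry(w, 1));  // emit <Key, Value>
        return pairs;
    }

    // Reduce stage: gather pairs sharing the same Key and aggregate them.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs)
            out.merge(e.getKey(), e.getValue(), Integer::sum);
        return out;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) all.addAll(map(line));  // Map tasks
        return reduce(all);                               // Reduce task
    }
}
```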
The development of GPU technology makes solving this difficult problem possible. The GPU is a graphics processor, but today's GPUs are no longer confined to 3D graphics processing. In floating-point computation, parallel computation, and similar workloads, a GPU can deliver tens or even hundreds of times the performance of a CPU. The computing industry is currently evolving from "central processing" that uses only the CPU toward "co-processing" that uses both the CPU and the GPU. To enable this new computing model, NVIDIA invented CUDA (Compute Unified Device Architecture), a programming model that lets an application exploit the respective strengths of the CPU and the GPU. CUDA is a complete GPU solution that provides direct access to the hardware, instead of relying on graphics API interfaces to reach the GPU as traditional approaches must. Architecturally, it adopts a new computing structure for using the resources a GPU provides, thereby offering large-scale data computation far more computing power than a CPU alone. CUDA uses the C language as its programming language and provides a large number of high-performance computing capabilities, enabling developers to build efficient, high-density data-computation solutions on top of the GPU's powerful computing capability.
If the MapReduce framework could be extended with the ability to call the GPU, that is, if a CPU/GPU cooperative computing method could be designed, then applications that are both data-intensive and computation-intensive could be executed efficiently.
Chinese patent application 200910020566.X, "Construction method of a GPU and CPU combined processor," proposes a method of coupling a CPU with a GPU to build a combined processor, enabling the CPU and GPU to work cooperatively. However, on the one hand, that method targets a single computer and cannot integrate the CPU and GPU computing resources of a cluster containing a large number of machines; on the other hand, the CPU in that method is only responsible for general processing tasks involving complex instruction scheduling, loops, branches, and logic decisions, such as the operating system, system software, and general-purpose applications, and cannot participate in the parallel computation of large-scale data.
Summary of the invention
The objective of the present invention is to overcome the defects of the prior art and to solve the low operating efficiency of mass-data computation faced in fields such as energy-exploration data imaging, fast radar imaging, and financial data analysis. It proposes a CPU/GPU cooperative processing method oriented to high-performance computation on mass data, so that applications that are both data-intensive and computation-intensive can be executed efficiently.
To achieve the above objective, the technical scheme adopted by the present invention is as follows:
A CPU/GPU cooperative processing method oriented to high-performance computation on mass data, comprising the following steps:
Step 1: Build a computer cluster and integrate the computing and storage resources of each node in the cluster.
The cluster contains one scheduling node, which is responsible for scheduling and controlling all tasks; the remaining nodes serve as computing nodes.
Each node has its own independent CPU, GPU, memory, and local disk. For disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: Select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: Adopt the MapReduce computation model. The master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation.
Step 4: Each computing node executes the Map process. The Map process is as follows:
First, design a set of Java comment codes and apply them in the Map function to mark the code sections that the programmer wants to parallelize, similar to the directive comments of OpenMP. For example, the comment code "// #gmp parallel for" indicates that the immediately following loop or function needs to be parallelized.
Then, compile the source code containing these comment codes to obtain Java bytecode carrying the comment information.
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), and name it the GPU Class loader. The GPU Class loader can recognize the annotated parts of the Java bytecode, i.e., the parts that need to run on the GPU. The GPU Class loader is deployed on each computing node.
Then, the GPU Class loader automatically detects the local computing environment and judges whether local GPU resources are available. If they are unavailable, the CPU is used directly for computation; if they are available, the concrete version of the installed CUDA is recorded so that CUDA code adapted to that version can be generated.
Subsequently, the GPU Class loader generates the corresponding CUDA code from the annotated parts of the Java bytecode and compiles it. The CUDA code comprises one section of kernel-function code and one section of invocation code. The compiled CUDA code is then called, for example by means of JNI, so that this part of the code runs on the GPU. CUDA code generation succeeds only when the code section satisfies certain independence conditions and GPU computing resources are available in the environment; otherwise an error prompt is issued.
At this point, the operation results of the GPU are obtained. The unannotated code sections run normally on the CPU until the operation completes.
Finally, the scheduling node reruns any Mapper whose execution failed, thereby completing the Map stage.
Step 5: Execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
Beneficial effects
The method of the present invention has the following advantages:
(1) The method realizes unified scheduling and utilization of the CPU and GPU computing power of a computer cluster, so that applications that are both data-intensive and computation-intensive can be executed efficiently.
(2) The platform can automatically detect the local GPU computing environment and make the corresponding choice, so that the source code written by the user is portable.
(3) The present invention introduces GPU resources by means of comment codes, so that programmers only need to understand the corresponding comment codes to develop applications, which is easy to learn.
Description of drawings
Fig. 1 is a schematic flowchart of the method of the present invention;
Fig. 2 is a schematic diagram of the implementation framework of the method;
Fig. 3 is a schematic diagram of the implementation process of utilizing GPU computing power.
Embodiment
Specific embodiments of the present invention are further explained below in conjunction with the accompanying drawings.
The basic principle of the CPU/GPU cooperative processing method oriented to high-performance computation on mass data is as follows: first, design a set of Java comment-code conventions, and on the basis of the traditional Java class loader design a new Java class loader that can recognize the annotated parts of Java bytecode, named the GPU Class loader. Build a computer cluster composed of multiple computers and deploy the improved Hadoop platform on the cluster, adding the designed Java comment-code conventions and the GPU Class loader to the improved platform. Install a specific version of CUDA on each computing node, so that when coding, the user can conveniently use GPU computing resources in the Map function of MapReduce through the comment codes, as shown in Fig. 2 and Fig. 3.
The concrete implementation steps of the method are shown in Fig. 1 and are as follows:
Step 1: Build a computer cluster and integrate the computing and storage resources of each node. The cluster contains one scheduling node, responsible for scheduling and controlling all tasks; the remaining nodes serve as computing nodes. Each node has its own independent CPU, GPU, memory, and local disk. For disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: Select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: Adopt the MapReduce computation model. The master control program on the scheduling node divides the task into a number of task blocks.
Install Hadoop on each node of the cluster, set the HDFS data-block size block, and set the numbers of Map and Reduce tasks that can run simultaneously on each node, denoted m and r respectively, so that MapReduce can run normally on the computing cluster.
Meanwhile, denote the task scale by K. After the task starts, the scheduling node divides the task into K/block Map tasks according to the configured HDFS data-block size block, and assigns these Map tasks to the computing nodes for computation. When K/block is not an integer, it is rounded up. Meanwhile, r Reduce tasks are started; the value of r is set by the user and is set to 1 here.
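The task-splitting arithmetic above, K/block Map tasks rounded up when the division is not exact, can be sketched as follows; the class and method names are illustrative, not part of the disclosure.

```java
// Task-planning arithmetic from the step above: a task of scale K is split
// into ceil(K / block) Map tasks. Names are hypothetical helpers.
class TaskPlanner {
    // Number of Map tasks for task scale k and HDFS block size block.
    static long numMapTasks(long k, long block) {
        if (block <= 0) throw new IllegalArgumentException("block must be positive");
        return (k + block - 1) / block;  // round up when K/block is not exact
    }
}
```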
Step 4: Each computing node executes the Map process; the concrete implementation is as follows:
First, design a set of Java comment codes (similar to the directive comments of OpenMP) and add support for them to Hadoop, of the form "// #gmp parallel for". These comment codes are used in the Map function and allow the programmer to mark the code that should run on the GPU.
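A Map function using this comment-code convention might look as follows. This is a hypothetical sketch: the mapper signature, the stand-in kernel computation, and all identifiers are assumptions; only the "// #gmp parallel for" marker itself is taken from the text.

```java
// Hypothetical shape of a Map function using the comment-code convention
// described above: "// #gmp parallel for" marks the loop that the GPU
// Class loader should offload. All identifiers here are illustrative.
class ImagingMapper {
    // Processes one data block; `value` holds the samples of this block.
    float[] map(String key, float[] value) {
        float[] out = new float[value.length];
        // #gmp parallel for  -- the following loop is marked for GPU execution
        for (int i = 0; i < value.length; i++) {
            out[i] = value[i] * value[i];  // stand-in for the imaging kernel
        }
        return out;
    }
}
```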
Then, compile the source code containing the Java comment codes to obtain Java bytecode carrying the comment information.
Next, design a new Java class loader on the basis of the traditional Java class loader, named the GPU Class loader. The GPU Class loader can recognize the annotated parts of the Java bytecode (i.e., the parts that need to run on the GPU) and is deployed on each computing node.
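A skeleton of such a class loader is sketched below. The text does not specify how the annotated regions are encoded in the compiled bytecode, so this sketch simply scans the raw class bytes for the marker string; that detection strategy, and all identifiers, are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Skeleton in the spirit of the "GPU Class loader": it loads class bytes
// itself and checks for an offload marker before defining the class. How
// annotated regions survive compilation is not specified in the text;
// scanning for a marker string here is purely illustrative.
class GpuClassLoader extends ClassLoader {
    private final Path classDir;

    GpuClassLoader(Path classDir, ClassLoader parent) {
        super(parent);
        this.classDir = classDir;
    }

    // Returns true if the class bytes contain the (assumed) offload marker.
    static boolean hasGpuMarker(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.ISO_8859_1);
        return s.contains("#gmp parallel");
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            byte[] bytes = Files.readAllBytes(
                    classDir.resolve(name.replace('.', '/') + ".class"));
            if (hasGpuMarker(bytes)) {
                // A real loader would generate and compile CUDA code for the
                // marked region here before defining the class.
            }
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```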
Then, the GPU Class loader automatically detects the local computing environment and checks whether CUDA is available. If it is unavailable, computation is carried out directly on the CPU; if it is available, the concrete version of CUDA is detected, and the annotated code sections (i.e., the parts that need to run on the GPU) are identified by the class loader.
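One plausible way to probe the local environment is to invoke the CUDA compiler and parse its version banner. The command name, the parsing, and the empty-result fallback below are assumptions rather than the patented mechanism.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Optional;

// Plausible sketch of the environment probe described above: try to run
// `nvcc --version` and report the CUDA toolkit version, or empty if CUDA
// is unavailable (in which case the loader falls back to the CPU path).
class CudaProbe {
    static Optional<String> cudaVersion() {
        try {
            Process p = new ProcessBuilder("nvcc", "--version")
                    .redirectErrorStream(true).start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    int i = line.indexOf("release ");
                    if (i >= 0)  // e.g. "... release 11.4, V11.4.152"
                        return Optional.of(
                                line.substring(i + 8).split(",")[0].trim());
                }
            }
        } catch (IOException e) {
            // nvcc not on PATH: treat as "GPU resources unavailable".
        }
        return Optional.empty();
    }
}
```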
The GPU Class loader generates the corresponding CUDA code, comprising one section of kernel-function code and one section of invocation code, from the annotated parts of the Java bytecode, and compiles both sections. The compiled CUDA code is called via JNI: the relevant data are copied to GPU video memory, and the CUDA code runs on the GPU. When JNI is used, the call succeeds only when the code section satisfies certain independence conditions and GPU resources are available in the environment; otherwise an error prompt is issued.
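The JNI bridge described above might be declared as follows. The library name, the native-method signature, and the stand-in CPU fallback (mirroring the "GPU unavailable, compute on CPU" behaviour of step 4) are illustrative assumptions.

```java
// Sketch of the JNI bridge described above: the Java side declares a native
// method, and the generated, compiled CUDA code would be loaded as a shared
// library. Library and method names are illustrative assumptions.
class GpuBridge {
    private static boolean nativeAvailable;
    static {
        try {
            System.loadLibrary("mapkernel"); // hypothetical lib built from generated CUDA
            nativeAvailable = true;
        } catch (UnsatisfiedLinkError e) {
            nativeAvailable = false;         // fall back to the CPU path, as in step 4
        }
    }

    // Would copy `in` to GPU memory, run the generated kernel, copy results back.
    private static native float[] runOnGpu(float[] in);

    // CPU fallback: same stand-in computation (element-wise square) on the CPU.
    static float[] runOnCpu(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i] * in[i];
        return out;
    }

    static float[] run(float[] in) {
        return nativeAvailable ? runOnGpu(in) : runOnCpu(in);
    }
}
```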
After the GPU computation finishes, the operation results of the CUDA code are copied back to local main memory, where the Map function obtains them. The unmarked code sections in the Map function run on the CPU.
Afterwards, the scheduling node tracks the running state of all Map tasks and reruns any Map task that failed, until all Map tasks are completed and the Map process ends.
Step 5: Execute the Reduce stage, gather the operation results of the Map stage, and complete the computation.
Claims (5)
1. A CPU/GPU cooperative processing method oriented to high-performance computation on mass data, characterized by comprising the following steps:
Step 1: Build a computer cluster and integrate the computing and storage resources of each node in the cluster; set one scheduling node in the cluster, responsible for scheduling and controlling all tasks, with the remaining nodes serving as computing nodes;
Step 2: Select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources;
Step 3: Adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation;
Step 4: Each computing node executes the Map process;
Step 5: Execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
2. The CPU/GPU cooperative processing method oriented to high-performance computation on mass data according to claim 1, characterized in that each node in said cluster has its own independent CPU, GPU, memory, and local disk.
3. The CPU/GPU cooperative processing method oriented to high-performance computation on mass data according to claim 2, characterized in that, for disk access, each node can only access its local disk and cannot access the disks of other nodes.
4. The CPU/GPU cooperative processing method oriented to high-performance computation on mass data according to claim 1, characterized in that the Map process executed by each computing node in said step 4 is as follows:
First, design a set of Java comment codes and apply them in the Map function;
Then, compile the source code containing these comment codes to obtain Java bytecode carrying the comment information;
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), and name it the GPU Class loader; meanwhile, deploy the GPU Class loader on each computing node;
Then, the GPU Class loader automatically detects the local computing environment and judges whether local GPU resources are available: if unavailable, the CPU is used directly for computation; if available, the concrete version of the installed CUDA is recorded so that CUDA code adapted to that version can be generated;
Subsequently, the GPU Class loader generates the corresponding CUDA code from the annotated parts of the Java bytecode and compiles it; the compiled CUDA code is called so that this part of the code runs on the GPU; CUDA code generation succeeds only when the code section satisfies certain independence conditions and GPU computing resources are available in the environment, otherwise an error prompt is issued;
At this point, the operation results of the GPU are obtained; the unannotated code sections run normally on the CPU until the operation completes;
Finally, the scheduling node reruns any Mapper whose execution failed.
5. The CPU/GPU cooperative processing method oriented to high-performance computation on mass data according to claim 4, characterized in that the generated CUDA code comprises one section of kernel-function code and one section of invocation code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101407459A CN102708088A (en) | 2012-05-08 | 2012-05-08 | CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102708088A true CN102708088A (en) | 2012-10-03 |
Family
ID=46900884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012101407459A Pending CN102708088A (en) | 2012-05-08 | 2012-05-08 | CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102708088A (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103309645A (en) * | 2013-04-27 | 2013-09-18 | 李朝波 | Method of appending skip function in computer data processing instruction and CPU (Central Processing Unit) module |
CN103324505A (en) * | 2013-06-24 | 2013-09-25 | 曙光信息产业(北京)有限公司 | Method for deploying GPU (graphic processor unit) development environments in cluster system and could computing system |
CN103324538A (en) * | 2013-05-23 | 2013-09-25 | 国家电网公司 | Method for designing dislocated scattered cluster environment distributed concurrent processes |
CN103399787A (en) * | 2013-08-06 | 2013-11-20 | 北京华胜天成科技股份有限公司 | Map Reduce task streaming scheduling method and scheduling system based on Hadoop cloud computing platform |
CN103699656A (en) * | 2013-12-27 | 2014-04-02 | 同济大学 | GPU-based mass-multimedia-data-oriented MapReduce platform |
CN104536937A (en) * | 2014-12-30 | 2015-04-22 | 深圳先进技术研究院 | Big data appliance realizing method based on CPU-GPU heterogeneous cluster |
CN104570081A (en) * | 2013-10-29 | 2015-04-29 | 中国石油化工股份有限公司 | Pre-stack reverse time migration seismic data processing method and system by integral method |
CN104679664A (en) * | 2014-12-26 | 2015-06-03 | 浪潮(北京)电子信息产业有限公司 | Communication method and device in cluster system |
CN104731569A (en) * | 2013-12-23 | 2015-06-24 | 华为技术有限公司 | Data processing method and relevant equipment |
CN104965689A (en) * | 2015-05-22 | 2015-10-07 | 浪潮电子信息产业股份有限公司 | Hybrid parallel computing method and device for CPUs/GPUs |
CN105094981A (en) * | 2014-05-23 | 2015-11-25 | 华为技术有限公司 | Method and device for processing data |
CN105227669A (en) * | 2015-10-15 | 2016-01-06 | 浪潮(北京)电子信息产业有限公司 | A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study |
CN106156810A (en) * | 2015-04-26 | 2016-11-23 | 阿里巴巴集团控股有限公司 | General-purpose machinery learning algorithm model training method, system and calculating node |
CN106227509A (en) * | 2016-06-30 | 2016-12-14 | 扬州大学 | A kind of class towards Java code uses example to generate method |
CN106936897A (en) * | 2017-02-22 | 2017-07-07 | 上海网罗电子科技有限公司 | A kind of high concurrent personnel positioning method for computing data based on GPU |
CN107135257A (en) * | 2017-04-28 | 2017-09-05 | 东方网力科技股份有限公司 | Task is distributed in a kind of node cluster method, node and system |
CN107241767A (en) * | 2017-06-14 | 2017-10-10 | 广东工业大学 | The method and device that a kind of mobile collaboration is calculated |
CN109947563A (en) * | 2019-03-06 | 2019-06-28 | 北京理工大学 | A kind of parallel multilevel fast multipole tree construction compound storage method |
CN110187970A (en) * | 2019-05-30 | 2019-08-30 | 北京理工大学 | A kind of distributed big data parallel calculating method based on Hadoop MapReduce |
CN110569312A (en) * | 2019-11-06 | 2019-12-13 | 创业慧康科技股份有限公司 | big data rapid retrieval system based on GPU and use method thereof |
CN111507466A (en) * | 2019-01-30 | 2020-08-07 | 北京沃东天骏信息技术有限公司 | Data processing method and device, electronic equipment and readable medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080184254A1 (en) * | 2007-01-25 | 2008-07-31 | Bernard Guy S | Systems, methods and apparatus for load balancing across computer nodes of heathcare imaging devices |
US20110074791A1 (en) * | 2009-09-30 | 2011-03-31 | Greg Scantlen | Gpgpu systems and services |
CN102004670A (en) * | 2009-12-17 | 2011-04-06 | 华中科技大学 | Self-adaptive job scheduling method based on MapReduce |
Non-Patent Citations (2)
Title |
---|
TOMI AARNIO: "Parallel data processing with MapReduce", TKK T-110.5190 Seminar on Internetworking, 27 April 2009 (2009-04-27) * |
CHEN Huaping et al.: "Task scheduling in parallel and distributed computing and its classification", Computer Science (《计算机科学》), vol. 28, no. 1, 31 December 2001 (2001-12-31), pages 45-48 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102708088A (en) | CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation | |
Ma et al. | Rammer: Enabling holistic deep learning compiler optimizations with rTasks | |
Giorgi et al. | TERAFLUX: Harnessing dataflow in next generation teradevices | |
Liu et al. | Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors | |
Krieder et al. | Design and evaluation of the gemtc framework for gpu-enabled many-task computing | |
Rauchwerger | Run-time parallelization: Its time has come | |
Meng et al. | Preliminary experiences with the uintah framework on intel xeon phi and stampede | |
Wang et al. | A framework for distributed data-parallel execution in the Kepler scientific workflow system | |
Peterson et al. | Demonstrating GPU code portability and scalability for radiative heat transfer computations | |
Carneiro Pessoa et al. | GPU‐accelerated backtracking using CUDA Dynamic Parallelism | |
Palmskog et al. | piCoq: Parallel regression proving for large-scale verification projects | |
Segal et al. | High level programming for heterogeneous architectures | |
Pöppl et al. | SWE-X10: Simulating shallow water waves with lazy activation of patches using ActorX10 | |
Mivule et al. | A review of cuda, mapreduce, and pthreads parallel computing models | |
Kunzman et al. | Towards a framework for abstracting accelerators in parallel applications: experience with cell | |
Dubrulle et al. | A low-overhead dedicated execution support for stream applications on shared-memory CMP | |
Davis et al. | Paradigmatic shifts for exascale supercomputing | |
Andon et al. | Programming high-performance parallel computations: formal models and graphics processing units | |
Liu et al. | BSPCloud: A hybrid distributed-memory and shared-memory programming model | |
Cao et al. | Evaluating data redistribution in parsec | |
Ali et al. | A parallel programming model for Ada | |
Gainaru et al. | Understanding the impact of data staging for coupled scientific workflows | |
Tarakji et al. | Os support for load scheduling on accelerator-based heterogeneous systems | |
Weng et al. | Acceleration of a Python-based tsunami modelling application via CUDA and OpenHMPP | |
Guaitero et al. | Automatic Asynchronous Execution of Synchronously Offloaded OpenMP Target Regions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20121003 |