CN102708088A - CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation - Google Patents

Info

Publication number
CN102708088A
CN102708088A (application CN2012101407459A)
Authority
CN
China
Prior art keywords
gpu
code
cpu
node
mass data
Prior art date
Legal status
Pending
Application number
CN2012101407459A
Other languages
Chinese (zh)
Inventor
翟岩龙
刘培志
罗壮
黄河燕
宿红毅
郭琨毅
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN2012101407459A priority Critical patent/CN102708088A/en
Publication of CN102708088A publication Critical patent/CN102708088A/en
Pending legal-status Critical Current

Abstract

The invention provides a CPU/GPU (Central Processing Unit / Graphics Processing Unit) cooperative processing method oriented to mass data high-performance computation, used to solve the low operating efficiency of mass-data computation. A set of Java comment-code conventions is designed; a computer cluster composed of multiple computers is built; an improved Hadoop platform is deployed on the cluster; and the designed Java comment-code conventions together with a GPU Class loader are added to the improved platform. A specific version of CUDA (Compute Unified Device Architecture) is installed on each computing node, so that when programming, the user can conveniently use the GPU computing resources in the Map function of MapReduce through the comment codes. The method realizes unified scheduling and utilization of the CPU and GPU computing power on the computer cluster, so that applications that are both data-intensive and computation-intensive can be realized efficiently; moreover, the source code written is portable and convenient for programmers to develop.

Description

CPU/GPU cooperative processing method oriented to mass data high-performance computation
Technical field
The present invention relates to a kind of use and set up the method for the collaborative computing platform of CPU/GPU, belong to mass data processing and high-performance calculation processing technology field.
Background art
In the computer field today, many applications need to process mass data. At present, the most widely adopted mass-data processing method is the MapReduce computation model. MapReduce is a programming model proposed by Google for implementing distributed parallel computing tasks: it distributes mass data across a large-scale cluster for parallel processing. The MapReduce programming model divides the computation into a Map stage and a Reduce stage. Its principle is that the data are cut into blocks of a specific size and stored across the cluster in distributed form as <Key, Value> pairs. Every node in the cluster runs a number of Map and Reduce tasks. A Map task processes the input <Key, Value> pairs and generates new <Key, Value> pairs; a Reduce task collects and processes all <Key, Value> data sharing the same Key. Through this simple model, MapReduce handles mass data. However, one very important class of mass-data applications is difficult to solve with the MapReduce computation model: applications that are simultaneously data-intensive and computation-intensive, such as data imaging in the energy-exploration industry. The oil and gas industry is a high-risk, high-investment, high-technology industry, and its upstream business, oil and gas exploration and development, depends heavily on the integrated application of various new and high technologies, particularly information technology. Among these, imaging and modeling techniques are the core features of oil-gas exploration software systems. Seismic exploration data processing involves high-volume (several PB) and high-density computation, and has always been the biggest bottleneck in the whole survey-data analysis workflow. Taking the imaging of 1 TB of data as an example, computing with a high-performance CPU cluster alone takes several weeks. Under the MapReduce model, the computing capability of a single node cannot satisfy the computation-intensive demands of these applications.
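The Map/Reduce decomposition described above can be sketched in plain Java without Hadoop; the word-count job and the in-memory collections here are illustrative assumptions, not part of the invention:

```java
import java.util.*;
import java.util.stream.*;

// Minimal in-memory sketch of the MapReduce model described above:
// Map emits <Key, Value> pairs, Reduce aggregates all values sharing a Key.
class MiniMapReduce {
    // Map task: turn one input line into <word, 1> pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .map(w -> Map.entry(w, 1))
                     .collect(Collectors.toList());
    }

    // Reduce task: sum all values that share the same key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs)
            out.merge(e.getKey(), e.getValue(), Integer::sum);
        return out;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // The Map stage runs independently per input split...
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) emitted.addAll(map(line));
        // ...and the Reduce stage gathers the emitted pairs by key.
        return reduce(emitted);
    }
}
```

In a real cluster the emitted pairs are shuffled between nodes rather than held in one list, but the per-key aggregation is the same.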
The development of GPU technology has made solving this difficult problem feasible. The GPU is a graphics processor, but today's GPUs are no longer confined to 3D graphics processing: in floating-point computation, parallel computation, and similar workloads, a GPU can deliver tens to hundreds of times the performance of a CPU. The computing industry is currently evolving from "central processing", which uses only the CPU, to "co-processing", which uses the CPU and GPU together. To enable this new computing model, NVIDIA invented CUDA (Compute Unified Device Architecture), a programming model that lets an application exploit the respective advantages of both CPU and GPU. CUDA is a complete GPU solution: it provides a direct hardware access interface, without having to go through a graphics API to reach the GPU as in the traditional approach. Architecturally it adopts a new computing structure for using the hardware resources the GPU provides, thereby offering large-scale data computation far more computing power than the CPU alone. CUDA uses the C language as its programming language and provides a large set of high-performance computing primitives, enabling developers to build efficient high-density data computation solutions on top of the GPU's computing power.
If the functionality of MapReduce can be extended with calls to the GPU, that is, if a CPU/GPU cooperative computing method can be designed, then applications that are both data-intensive and computation-intensive can be realized efficiently.
Chinese patent application 200910020566.X, "Construction method of a combined GPU and CPU processor", proposes coupling a CPU and a GPU into a combined processor so that they can work cooperatively. However, on the one hand that method targets a single computer and cannot integrate the CPU and GPU computing resources of a cluster containing many machines; on the other hand, its CPU is only responsible for general processing tasks involving complicated instruction scheduling, loops, branches, and logic decisions, such as the operating system, system software, and general-purpose applications, and cannot participate in the parallel computation of large-scale data.
Summary of the invention
The objective of the present invention is to overcome the defects of the prior art. To solve the low operating efficiency of mass-data computation faced in fields such as energy-exploration data imaging, fast radar imaging, and financial data analysis, a CPU/GPU cooperative processing method oriented to mass data high-performance computation is proposed, so that applications that are both data-intensive and computation-intensive can be realized efficiently.
To achieve the above objective, the technical scheme adopted by the present invention is as follows:
A CPU/GPU cooperative processing method oriented to mass data high-performance computation, comprising the following steps:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster.
The cluster contains one scheduling node, responsible for the scheduling control of all tasks; the remaining nodes serve as computing nodes.
Each node has its own independent CPU, GPU, memory, and local disk. Regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation.
Step 4: each computing node executes the Map process. The Map process is as follows:
First, design a set of Java comment codes and apply them in the Map function, used by the programmer to mark the code sections to be parallelized, similar to the directive style of OpenMP. For example, the comment code "// #gmp parallel for" indicates that the immediately following loop or function needs to be parallelized.
Then, compile the source code containing these comment codes to obtain Java bytecode containing the comment codes.
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), named GPU Class loader. The GPU Class loader can identify the annotated parts of the Java bytecode, i.e. the parts that need to run on the GPU. The GPU Class loader is deployed on each computing node.
Then, the GPU Class loader automatically detects the local computing environment and judges whether the local GPU resources are available. If unavailable, the CPU is used directly for the computation; if available, the concrete version of the current CUDA installation is recorded, so that CUDA code adapted to that version can be generated.
Subsequently, the GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode and compiles it. The CUDA code comprises one section of kernel-function code and one section of invocation code. The compiled CUDA code is then called, so that this part of the code runs on the GPU; the call can be made via JNI. When generating the CUDA code, generation can only be completed if the code section satisfies certain independence conditions and GPU computing resources are available in the computing environment; otherwise an error prompt is issued.
At this point, the operation results of the GPU are obtained. The code sections that are not annotated run normally on the CPU until the operation completes.
Finally, the scheduling node reruns the Mappers that failed, thereby completing the Map stage.
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
Beneficial effects
The method of the invention has the following advantages:
(1) The method realizes unified scheduling and utilization of the CPU and GPU computing power on a computer cluster, so that applications that are both data-intensive and computation-intensive can be realized efficiently.
(2) The platform can automatically detect the local GPU computing environment and select accordingly, so that the source code written is portable.
(3) The present invention introduces GPU resources by means of comment codes, so that programmers only need to learn the corresponding comment codes to develop with it; this is easy to pick up.
Description of drawings
Fig. 1 is a schematic flowchart of the method of the invention;
Fig. 2 is a schematic diagram of the implementation framework of the method;
Fig. 3 is a schematic diagram of the implementation process of utilizing the GPU computing power.
Embodiment
Specific embodiments of the invention are explained further below in conjunction with the accompanying drawings.
A CPU/GPU cooperative processing method oriented to mass data high-performance computation. Its basic principle is as follows: first, design a set of Java comment-code conventions, and on the basis of the traditional Java class loader design a new Java class loader that can identify the annotated parts of the Java bytecode, named GPU Class loader. Build a computer cluster composed of multiple computers and deploy the improved Hadoop platform on it. Add the designed Java comment-code conventions and the GPU Class loader to the improved platform. Install a specific version of CUDA on each computing node, so that when coding, users can conveniently use GPU computing resources in the Map function of MapReduce through the comment codes, as shown in Fig. 2 and Fig. 3.
The concrete implementation steps of the method are shown in Fig. 1, specifically as follows:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster. The cluster contains one scheduling node, responsible for the scheduling control of all tasks; the remaining nodes serve as computing nodes. Each node has its own independent CPU, GPU, memory, and local disk. Regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources.
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks.
Install Hadoop on each node of the computer cluster, and set the HDFS data-block size block as well as the numbers of Map and Reduce tasks that may run simultaneously on each node, denoted m and r respectively, so that MapReduce can run normally on the computing cluster.
Meanwhile, denote the task scale as K. When the job runs, the scheduling node divides the task into K/block Map tasks according to the configured HDFS data-block size block, and assigns these Map tasks to the computing nodes for computation. When K/block does not divide evenly, the result is rounded up. Meanwhile, r Reduce tasks are started; the value of r is set by the user, and is set to 1 here.
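The task-block arithmetic above (K/block Map tasks, rounded up when the division is not exact) is a ceiling division; a small sketch, treating K and block as byte counts for illustration:

```java
// Number of Map tasks for a job of scale K split into HDFS blocks of size
// `block`, rounding up when K is not a multiple of the block size.
class TaskSplit {
    static long mapTaskCount(long k, long block) {
        if (block <= 0) throw new IllegalArgumentException("block size must be positive");
        return (k + block - 1) / block;   // ceil(K / block) in integer arithmetic
    }
}
```

For example, a 1 TB job with a 64 MB block size yields 16384 Map tasks.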
Step 4: each computing node executes the Map process; the concrete implementation is as follows:
First, design a set of Java comment codes (similar to the directives in OpenMP) and add support for them to Hadoop, of the form "// #gmp parallel for". These comment codes are used inside the Map function for the programmer to mark the code to be run on the GPU.
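As a sketch of how such an annotated Map function might look to the programmer: the class name, method signature, and loop body below are illustrative assumptions; only the "// #gmp parallel for" marker is taken from the invention.

```java
// Hypothetical user code: the comment code marks the loop that the
// GPU Class loader should offload; everything else stays on the CPU.
class ScaleMapper {
    // Stand-in for a Map function body: scale every sample in the record.
    static float[] map(float[] samples, float factor) {
        float[] out = new float[samples.length];
        // #gmp parallel for   <-- marks the following loop for GPU execution
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i] * factor;  // data-parallel, no loop-carried dependence
        }
        return out;
    }
}
```

The loop qualifies for offload precisely because each iteration is independent, matching the "independence conditions" required later when CUDA code is generated.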
Then, compile the source code containing the Java comment codes to obtain Java bytecode containing the comment codes.
Next, design a new Java class loader on the basis of the traditional Java class loader, named GPU Class loader. The GPU Class loader can identify the annotated parts of the Java bytecode (i.e. the parts that need to run on the GPU) and is deployed on each computing node.
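A minimal sketch of a class loader in the spirit of the GPU Class loader; here the annotation is detected by scanning the class bytes for a marker string, which is an assumption of this sketch — the patent does not specify how the annotated regions are represented in, or located within, the bytecode.

```java
// Sketch: a class loader that inspects class bytes before defining the class,
// mirroring the GPU Class loader's role of spotting annotated (GPU-bound) code.
class GpuAwareClassLoader extends ClassLoader {
    static final byte[] MARKER = "#gmp parallel for".getBytes();

    // Returns true if the marker occurs anywhere in the class-file bytes.
    static boolean containsMarker(byte[] classBytes) {
        outer:
        for (int i = 0; i + MARKER.length <= classBytes.length; i++) {
            for (int j = 0; j < MARKER.length; j++)
                if (classBytes[i + j] != MARKER[j]) continue outer;
            return true;
        }
        return false;
    }

    // Define the class, branching on whether it carries GPU-bound regions.
    Class<?> defineFrom(String name, byte[] classBytes) {
        if (containsMarker(classBytes)) {
            // Here the real loader would translate the marked region to CUDA code.
        }
        return defineClass(name, classBytes, 0, classBytes.length);
    }
}
```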
Then, the GPU Class loader automatically detects the local computing environment and checks whether CUDA is available. If unavailable, the computation runs directly on the CPU; if available, the concrete CUDA version is detected, and the annotated code sections (i.e. the parts that need to run on the GPU) are identified.
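The availability check can be sketched as an attempt to load the CUDA driver library, falling back to the CPU when it is absent; the library name is an assumption of this sketch, as the patent does not name the detection mechanism.

```java
// Sketch of the environment probe: try to load the CUDA driver library;
// if that fails, report the GPU as unavailable so computation stays on the CPU.
class CudaProbe {
    static boolean cudaAvailable() {
        try {
            System.loadLibrary("cuda");   // assumed library name for the CUDA driver
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;                 // no CUDA on this node: use the CPU path
        }
    }

    // Select the execution path for an annotated code section.
    static String choosePath() {
        return cudaAvailable() ? "GPU" : "CPU";
    }
}
```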
The GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode, comprising one section of kernel-function code and one section of invocation code, and compiles both sections. The compiled CUDA code is called via JNI: the relevant data is copied to the GPU video memory, and the CUDA code runs on the GPU. When JNI is used, the call can only succeed if the code section satisfies certain independence conditions and GPU resources are available in the computing environment; otherwise an error prompt is issued.
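The JNI hand-off can be sketched as a native method declaration plus a CPU fallback used when the compiled CUDA library cannot be loaded; the library name and the native method name are illustrative assumptions of this sketch.

```java
// Sketch of the JNI bridge: the invocation side of the generated CUDA pair is
// exposed as a native method; if its library is missing, fall back to the CPU,
// matching the method's behavior when GPU resources are unavailable.
class GpuVectorScale {
    // Native entry point into the compiled CUDA invocation code (hypothetical).
    static native float[] scaleOnGpu(float[] samples, float factor);

    static final boolean NATIVE_LOADED = tryLoad();

    static boolean tryLoad() {
        try {
            System.loadLibrary("gmpcuda");  // assumed name of the generated library
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Dispatch: GPU when the native library loaded, otherwise a plain CPU loop.
    static float[] scale(float[] samples, float factor) {
        if (NATIVE_LOADED) return scaleOnGpu(samples, factor);
        float[] out = new float[samples.length];
        for (int i = 0; i < samples.length; i++) out[i] = samples[i] * factor;
        return out;
    }
}
```

In the real system the native side would also copy `samples` to GPU video memory and copy the result back to main memory, as the surrounding text describes.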
After the GPU computation finishes, the operation results of the CUDA code are copied back to local main memory, and the Map function obtains these results. The code sections of the Map function that are not marked run on the CPU.
Afterwards, the scheduling node tracks the running state of all Map tasks and reruns any Map task that failed, until all Map tasks are completed and the Map process ends.
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the computation.

Claims (5)

1. A CPU/GPU cooperative processing method oriented to mass data high-performance computation, characterized by comprising the following steps:
Step 1: set up a computer cluster and integrate the computing and storage resources of each node on the cluster; set one scheduling node on the cluster, responsible for the scheduling control of all tasks, with the remaining nodes serving as computing nodes;
Step 2: select CUDA as the GPU computation model and install it on each computing node of the cluster, as the basis for using GPU computing resources;
Step 3: adopt the MapReduce computation model; the master control program on the scheduling node divides the task into a number of task blocks, starts one Map task for each task block, and distributes these Map tasks to the computing nodes for computation;
Step 4: each computing node executes the Map process;
Step 5: execute the Reduce stage, gather the operation results of the Map stage, and complete the whole computation.
2. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 1, characterized in that each node in said cluster has its own independent CPU, GPU, memory, and local disk.
3. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 2, characterized in that, regarding disk access, each node can only access its local disk and cannot access the disks of other nodes.
4. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 1, characterized in that the Map process executed by each computing node in said step 4 is as follows:
First, design a set of Java comment codes and apply them in the Map function;
Then, compile the source code containing these comment codes to obtain Java bytecode containing the comment codes;
Next, design a new Java class loader on the basis of the traditional Java class loader (class loader), named GPU Class loader; meanwhile, deploy the GPU Class loader on each computing node;
Then, the GPU Class loader automatically detects the local computing environment and judges whether the local GPU resources are available; if unavailable, the CPU is used directly for the computation; if available, the concrete version of the current CUDA installation is recorded, so that CUDA code adapted to that version can be generated;
Subsequently, the GPU Class loader generates the corresponding CUDA code for the annotated parts of the identified Java bytecode and compiles it; the compiled CUDA code is called so that this part of the code runs on the GPU; when generating the CUDA code, generation can only be completed if the code section satisfies certain independence conditions and GPU computing resources are available in the computing environment, otherwise an error prompt is issued;
At this point, the operation results of the GPU are obtained; the code sections that are not annotated run normally on the CPU until the operation completes;
Finally, the scheduling node reruns the Mappers that failed.
5. The CPU/GPU cooperative processing method oriented to mass data high-performance computation according to claim 4, characterized in that the generated CUDA code comprises one section of kernel-function code and one section of invocation code.
CN2012101407459A 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation Pending CN102708088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101407459A CN102708088A (en) 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation


Publications (1)

Publication Number Publication Date
CN102708088A true CN102708088A (en) 2012-10-03

Family

ID=46900884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101407459A Pending CN102708088A (en) 2012-05-08 2012-05-08 CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation

Country Status (1)

Country Link
CN (1) CN102708088A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184254A1 (en) * 2007-01-25 2008-07-31 Bernard Guy S Systems, methods and apparatus for load balancing across computer nodes of heathcare imaging devices
US20110074791A1 (en) * 2009-09-30 2011-03-31 Greg Scantlen Gpgpu systems and services
CN102004670A (en) * 2009-12-17 2011-04-06 华中科技大学 Self-adaptive job scheduling method based on MapReduce


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOMI AARNIO: "Parallel data processing with MapReduce", TKK T-110.5190 Seminar on Internetworking, 27 April 2009 (2009-04-27) *
CHEN Huaping et al.: "Task Scheduling and Its Classification in Parallel and Distributed Computing" (并行分布计算中的任务调度及其分类), Computer Science (《计算机科学》), vol. 28, no. 1, 31 December 2001 (2001-12-31), pages 45-48 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309645B (en) * 2013-04-27 2015-09-16 李朝波 A kind of method of additional turn function in computer digital animation instruction and CPU module
CN103309645A (en) * 2013-04-27 2013-09-18 李朝波 Method of appending skip function in computer data processing instruction and CPU (Central Processing Unit) module
CN103324538A (en) * 2013-05-23 2013-09-25 国家电网公司 Method for designing dislocated scattered cluster environment distributed concurrent processes
CN103324538B (en) * 2013-05-23 2016-08-10 国家电网公司 A kind of method for designing of the dystopy dispersion distributed concurrent process of cluster environment
CN103324505A (en) * 2013-06-24 2013-09-25 曙光信息产业(北京)有限公司 Method for deploying GPU (graphic processor unit) development environments in cluster system and could computing system
CN103324505B (en) * 2013-06-24 2016-12-28 曙光信息产业(北京)有限公司 The method disposing GPU development environment in group system and cloud computing system
CN103399787A (en) * 2013-08-06 2013-11-20 北京华胜天成科技股份有限公司 Map Reduce task streaming scheduling method and scheduling system based on Hadoop cloud computing platform
CN103399787B (en) * 2013-08-06 2016-09-14 北京华胜天成科技股份有限公司 A kind of MapReduce operation streaming dispatching method and dispatching patcher calculating platform based on Hadoop cloud
CN104570081B (en) * 2013-10-29 2017-12-26 中国石油化工股份有限公司 A kind of integration method pre-stack time migration Processing Seismic Data and system
CN104570081A (en) * 2013-10-29 2015-04-29 中国石油化工股份有限公司 Pre-stack reverse time migration seismic data processing method and system by integral method
WO2015096649A1 (en) * 2013-12-23 2015-07-02 华为技术有限公司 Data processing method and related device
CN104731569A (en) * 2013-12-23 2015-06-24 华为技术有限公司 Data processing method and relevant equipment
CN104731569B (en) * 2013-12-23 2018-04-10 华为技术有限公司 A kind of data processing method and relevant device
CN103699656A (en) * 2013-12-27 2014-04-02 同济大学 GPU-based mass-multimedia-data-oriented MapReduce platform
CN105094981A (en) * 2014-05-23 2015-11-25 华为技术有限公司 Method and device for processing data
WO2015176689A1 (en) * 2014-05-23 2015-11-26 华为技术有限公司 Data processing method and device
CN105094981B (en) * 2014-05-23 2019-02-12 华为技术有限公司 A kind of method and device of data processing
CN104679664A (en) * 2014-12-26 2015-06-03 浪潮(北京)电子信息产业有限公司 Communication method and device in cluster system
CN104536937A (en) * 2014-12-30 2015-04-22 深圳先进技术研究院 Big data appliance realizing method based on CPU-GPU heterogeneous cluster
CN104536937B (en) * 2014-12-30 2017-10-31 深圳先进技术研究院 Big data all-in-one machine realization method based on CPU GPU isomeric groups
CN106156810B (en) * 2015-04-26 2019-12-03 阿里巴巴集团控股有限公司 General-purpose machinery learning algorithm model training method, system and calculate node
CN106156810A (en) * 2015-04-26 2016-11-23 阿里巴巴集团控股有限公司 General-purpose machinery learning algorithm model training method, system and calculating node
CN104965689A (en) * 2015-05-22 2015-10-07 浪潮电子信息产业股份有限公司 Hybrid parallel computing method and device for CPUs/GPUs
CN105227669A (en) * 2015-10-15 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study
CN106227509B (en) * 2016-06-30 2019-03-19 扬州大学 A kind of class towards Java code uses example generation method
CN106227509A (en) * 2016-06-30 2016-12-14 扬州大学 A kind of class towards Java code uses example to generate method
CN106936897A (en) * 2017-02-22 2017-07-07 上海网罗电子科技有限公司 A kind of high concurrent personnel positioning method for computing data based on GPU
CN107135257A (en) * 2017-04-28 2017-09-05 东方网力科技股份有限公司 Task is distributed in a kind of node cluster method, node and system
CN107241767A (en) * 2017-06-14 2017-10-10 广东工业大学 The method and device that a kind of mobile collaboration is calculated
CN107241767B (en) * 2017-06-14 2020-10-23 广东工业大学 Mobile collaborative computing method and device
CN111507466A (en) * 2019-01-30 2020-08-07 北京沃东天骏信息技术有限公司 Data processing method and device, electronic equipment and readable medium
CN109947563A (en) * 2019-03-06 2019-06-28 北京理工大学 A kind of parallel multilevel fast multipole tree construction compound storage method
CN109947563B (en) * 2019-03-06 2020-10-27 北京理工大学 Parallel multilayer rapid multi-polar subtree structure composite storage method
CN110187970A (en) * 2019-05-30 2019-08-30 北京理工大学 A kind of distributed big data parallel calculating method based on Hadoop MapReduce
CN110569312A (en) * 2019-11-06 2019-12-13 创业慧康科技股份有限公司 big data rapid retrieval system based on GPU and use method thereof

Similar Documents

Publication Publication Date Title
CN102708088A (en) CPU/GPU (Central Processing Unit/ Graphic Processing Unit) cooperative processing method oriented to mass data high-performance computation
Ma et al. Rammer: Enabling holistic deep learning compiler optimizations with {rTasks}
Giorgi et al. TERAFLUX: Harnessing dataflow in next generation teradevices
Liu et al. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Krieder et al. Design and evaluation of the gemtc framework for gpu-enabled many-task computing
Rauchwerger Run-time parallelization: Its time has come
Meng et al. Preliminary experiences with the uintah framework on intel xeon phi and stampede
Wang et al. A framework for distributed data-parallel execution in the Kepler scientific workflow system
Peterson et al. Demonstrating GPU code portability and scalability for radiative heat transfer computations
Carneiro Pessoa et al. GPU‐accelerated backtracking using CUDA Dynamic Parallelism
Palmskog et al. piCoq: Parallel regression proving for large-scale verification projects
Segal et al. High level programming for heterogeneous architectures
Pöppl et al. SWE-X10: Simulating shallow water waves with lazy activation of patches using ActorX10
Mivule et al. A review of cuda, mapreduce, and pthreads parallel computing models
Kunzman et al. Towards a framework for abstracting accelerators in parallel applications: experience with cell
Dubrulle et al. A low-overhead dedicated execution support for stream applications on shared-memory CMP
Davis et al. Paradigmatic shifts for exascale supercomputing
Andon et al. Programming high-performance parallel computations: formal models and graphics processing units
Liu et al. BSPCloud: A hybrid distributed-memory and shared-memory programming model
Cao et al. Evaluating data redistribution in parsec
Ali et al. A parallel programming model for Ada
Gainaru et al. Understanding the impact of data staging for coupled scientific workflows
Tarakji et al. Os support for load scheduling on accelerator-based heterogeneous systems
Weng et al. Acceleration of a Python-based tsunami modelling application via CUDA and OpenHMPP
Guaitero et al. Automatic Asynchronous Execution of Synchronously Offloaded OpenMP Target Regions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121003