CN103713938A - Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment - Google Patents


Info

Publication number
CN103713938A
Authority
CN
China
Prior art keywords
gpu
matrix
data
openmp
host side
Prior art date
Legal status
Pending
Application number
CN201310695055.4A
Other languages
Chinese (zh)
Inventor
秦谦
袁家斌
Current Assignee
Jiangsu Mingtong Tech Co Ltd
Original Assignee
Jiangsu Mingtong Tech Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Mingtong Tech Co Ltd filed Critical Jiangsu Mingtong Tech Co Ltd
Priority to CN201310695055.4A priority Critical patent/CN103713938A/en
Publication of CN103713938A publication Critical patent/CN103713938A/en

Abstract

The invention discloses a multi-graphics-processing-unit (GPU) cooperative computing method based on OpenMP in a virtualized environment. The method comprises the following steps: on the host side, OpenMP creates as many host threads as there are GPUs; each host thread controls one GPU, allocates video memory for it on the corresponding device, and launches a kernel function. Each thread holds its own private host-side and device-side data pointers; after the data computation, each GPU copies its data back to the private host side through its device-side data pointer, and the partial results are merged there. Compared with the prior art, the invention achieves multi-GPU cooperative computing in a virtualized environment and uses multiple GPUs to accelerate a single task, which has theoretical and practical significance for supercomputing, cloud computing and grid computing based on CPU+GPU heterogeneous platforms.

Description

Multi-GPU collaborative computing method based on OpenMP in a virtualized environment
Technical field
The present invention relates to the field of single-task multi-GPU computing in virtualized environments, and in particular to a multi-GPU collaborative computing method based on OpenMP in a virtualized environment.
Background technology
Existing multi-GPU collaborative computing techniques are all based on physical machines. OpenMP is generally used for CPU parallel computing; for GPUs, only the official NVIDIA SDK provides examples of its use, and no complete multi-GPU API is supported. gVirtuS is currently the most mature GPU virtualization solution: it solves the problem of CUDA programming with a GPU in a virtualized environment, but it targets only a single GPU and does not address multiple GPUs. In short, the prior art cannot use multiple GPUs in a virtualized environment to accelerate a single task over large-scale data.
Summary of the invention
The present invention overcomes the deficiencies of the prior art and provides a multi-GPU collaborative computing method based on OpenMP in a virtualized environment.
To solve the above technical problems, the invention adopts the following technical solution:
A multi-GPU collaborative computing method based on OpenMP in a virtualized environment comprises the following steps.
Step S01: deploy the GPU virtualization (gVirtuS) service-side component on the server side, so that parameters intercepted at the client, which would otherwise be executed locally, are executed on the physical machine. On the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread is responsible for controlling one GPU, and an interface function maps each host-side thread ID to a GPU device number. The object of the GPU computation is defined as an N*N matrix. The interface function is the cudaSetDevice(cpu_thread_id) function.
In the prior art there is no GPU driver in the virtualized environment. In step S01, after the GPU virtualization (gVirtuS) service-side component is deployed on the server side, the actual execution is passed to the server side; when the server side finishes, the result is passed back to the virtual machine.
Step S02: allocate video memory for each thread on its device, and each thread launches its own kernel function to compute the compound matrix operation. The video memory size is allocated according to the size of the data to be computed, and the kernel function is a kernel function for computing matrix multiplication.
Step S03: data decomposition. Each thread sets its own private host-side and device-side data pointers; each private host-side thread pointer points to a different starting position within the original pointer, and each thread copies a block of N/n data items starting from its own private host-side position using the CUDA copy function, namely the cudaMemcpy() function, thereby dividing the problem by scale, where N is the number of rows or columns of the matrix and n is the number of GPUs.
Step S04: data computation. The matrices on the n GPUs are computed according to the matrix multiplication rule; the OpenMP synchronization module controls the output timing of the GPU computation results, and the data are output synchronously via the cudaDeviceSynchronize() function.
Step S05: data merging. The data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side. After the computation is complete, the server side passes the computation result to the client through socket communication; the client is the virtual machine.
In step S03 the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3, and the matrices are A, B, C and D. The data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed, copies all the data to GPU0 via the cudaMemcpy() function, and the correctness of the result is checked on the host side.
The data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each of size N/4*N), each of which is multiplied by matrix B to obtain 4 AB/4 matrices; combining the 4 AB/4 matrices yields the result matrix AB. Matrix multiplications are computed in GPU0, GPU1, GPU2 and GPU3 in this way, and the AB matrix is an N*N matrix.
Compared with the prior art, the beneficial effects of the present invention are: the invention achieves multi-GPU collaborative computing in a virtualized environment and uses multiple GPUs to accelerate a single task, which has great theoretical and practical significance for supercomputing, cloud computing and grid computing based on CPU+GPU heterogeneous platforms.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the data decomposition algorithm of the present invention.
Fig. 3 is a schematic diagram of the data computation algorithm of the present invention.
Fig. 4 is a comparison chart of the computation time of A*B+C*D under multi-GPU and single-GPU environments.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
As shown in Figure 1, the multi-GPU collaborative computing method based on OpenMP in a virtualized environment comprises the following steps.
Step S01: deploy the GPU virtualization (gVirtuS) service-side component on the server side, so that parameters intercepted at the client, which would otherwise be executed locally, are executed on the physical machine. On the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread is responsible for controlling one GPU, and the cudaSetDevice(cpu_thread_id) function maps each host-side thread ID to a GPU device number. The object of the GPU computation is defined as an N*N matrix.
Step S02: allocate video memory for each thread on its device, and each thread launches its own kernel function to compute the compound matrix operation. The video memory size is allocated according to the size of the data to be computed, and the kernel function is a kernel function for computing matrix multiplication.
Step S03: data decomposition. Each thread sets its own private host-side and device-side data pointers; each private host-side thread pointer points to a different starting position within the original pointer, and each thread copies a block of N/n data items starting from its own private host-side position via the cudaMemcpy() function, thereby dividing the problem by scale, where N is the number of rows of the matrix and n is the number of GPUs.
Step S04: data computation. The matrices on the n GPUs are computed according to the matrix multiplication rule; the OpenMP synchronization module controls the output timing of the GPU computation results, and the data are output synchronously via the cudaDeviceSynchronize() function.
Step S05: data merging. The data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side. After the computation is complete, the server side passes the computation result to the client (the virtual machine) through socket communication.
As shown in Figure 2, in step S03 the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3, and the matrices are A, B, C and D. The data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed, copies all the data to GPU0 via the cudaMemcpy() function, and the correctness of the result is checked on the host side.
As shown in Figure 3, the data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each of size N/4*N), each of which is multiplied by matrix B to obtain 4 AB/4 matrices; combining the 4 AB/4 matrices yields the result matrix AB. Matrix multiplications are computed in GPU0, GPU1, GPU2 and GPU3 in this way, and the AB matrix is an N*N matrix.
Fig. 4 compares the computation time of A*B+C*D under the virtualized multi-GPU environment and the single-GPU environment. As the figure shows, when the matrix order increases, the running time on a single GPU rises exponentially and becomes very long, whereas with the OpenMP-based multi-GPU collaborative computing in a virtualized environment the time consumed grows approximately linearly with the matrix order, so the efficiency is high.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principles of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (10)

1. A multi-GPU collaborative computing method based on OpenMP in a virtualized environment, characterized in that it comprises the following steps:
Step S01: deploying a GPU virtualization service-side component on the server side, so that parameters intercepted at the client for local execution are executed on the physical machine instead; on the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread is responsible for controlling one GPU; an interface function maps each host-side thread ID to a GPU device number; and the object of the GPU computation is defined as an N*N matrix computation;
Step S02: allocating video memory for each thread, and launching a kernel function on each device respectively;
Step S03: data decomposition, wherein each thread sets its own private host-side and device-side data pointers, each private host-side thread pointer points to a different starting position within the original pointer, and each thread copies N/n data items starting from its own private host-side position via a CUDA copy function, thereby dividing the problem by scale, where N is the number of rows of the matrix and n is the number of GPUs;
Step S04: data computation, wherein the matrices on the n GPUs are computed according to the matrix multiplication rule, and OpenMP controls the output timing of the GPU computation results and outputs the data synchronously;
Step S05: data merging, wherein the data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side; after completing the computation, the server side passes the computation result to the client through socket communication.
2. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that in step S03 the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3, the matrices are A, B, C and D, and the data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed, then copies all the data to GPU0, and the correctness of the result is checked on the host side.
3. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 2, characterized in that the data computed in GPU1, GPU2 and GPU3 are copied to GPU0 via the cudaMemcpy() function.
4. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1 or 2, characterized in that the data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each of size N/4*N), each of which is multiplied by matrix B to obtain 4 AB/4 matrices; combining the 4 AB/4 matrices yields the result matrix AB; matrix multiplications are computed in GPU0, GPU1, GPU2 and GPU3 in this way.
5. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the interface function is the cudaSetDevice(cpu_thread_id) function.
6. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the synchronization in step S04 is performed via the cudaDeviceSynchronize() function.
7. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the CUDA copy function is the cudaMemcpy() function.
8. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the client is a virtual machine.
9. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the video memory size is allocated according to the size of the data to be computed.
10. The multi-GPU collaborative computing method based on OpenMP in a virtualized environment according to claim 1, characterized in that the kernel function is a kernel function for computing matrix multiplication.
CN201310695055.4A 2013-12-17 2013-12-17 Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment Pending CN103713938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310695055.4A CN103713938A (en) 2013-12-17 2013-12-17 Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310695055.4A CN103713938A (en) 2013-12-17 2013-12-17 Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment

Publications (1)

Publication Number Publication Date
CN103713938A true CN103713938A (en) 2014-04-09

Family

ID=50406941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310695055.4A Pending CN103713938A (en) 2013-12-17 2013-12-17 Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment

Country Status (1)

Country Link
CN (1) CN103713938A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216783A (en) * 2014-08-20 2014-12-17 上海交通大学 Method for automatically managing and controlling virtual GPU (Graphics Processing Unit) resource in cloud gaming
WO2016093428A1 (en) * 2014-12-11 2016-06-16 한화테크윈 주식회사 Mini integrated control device
WO2016093427A1 (en) * 2014-12-11 2016-06-16 한화테크윈 주식회사 Mini integrated control device
CN107797843A (en) * 2016-09-02 2018-03-13 华为技术有限公司 A kind of method and apparatus of container function enhancing
CN110546642A (en) * 2018-10-17 2019-12-06 阿里巴巴集团控股有限公司 secure multi-party computing without using trusted initializer
CN110543711A (en) * 2019-08-26 2019-12-06 中国原子能科学研究院 parallel implementation and optimization method for numerical reactor thermal hydraulic sub-channel simulation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120160A1 (en) * 2003-08-20 2005-06-02 Jerry Plouffe System and method for managing virtual servers
CN102609990A (en) * 2012-01-05 2012-07-25 中国海洋大学 Massive-scene gradually-updating algorithm facing complex three dimensional CAD (Computer-Aided Design) model
CN102650950A (en) * 2012-04-10 2012-08-29 南京航空航天大学 Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture
CN103136035A (en) * 2011-11-30 2013-06-05 国际商业机器公司 Method and device for thread management of hybrid threading mode program
CN103279330A (en) * 2013-05-14 2013-09-04 江苏名通信息科技有限公司 MapReduce multiple programming model based on virtual machine GPU computation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120160A1 (en) * 2003-08-20 2005-06-02 Jerry Plouffe System and method for managing virtual servers
CN103136035A (en) * 2011-11-30 2013-06-05 国际商业机器公司 Method and device for thread management of hybrid threading mode program
CN102609990A (en) * 2012-01-05 2012-07-25 中国海洋大学 Massive-scene gradually-updating algorithm facing complex three dimensional CAD (Computer-Aided Design) model
CN102650950A (en) * 2012-04-10 2012-08-29 南京航空航天大学 Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture
CN103279330A (en) * 2013-05-14 2013-09-04 江苏名通信息科技有限公司 MapReduce multiple programming model based on virtual machine GPU computation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
石林 (Shi Lin): "GPU通用计算虚拟化方法研究" (Research on Virtualization Methods for General-Purpose GPU Computing), 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216783A (en) * 2014-08-20 2014-12-17 上海交通大学 Method for automatically managing and controlling virtual GPU (Graphics Processing Unit) resource in cloud gaming
CN104216783B (en) * 2014-08-20 2017-07-11 上海交通大学 Virtual GPU resource autonomous management and control method in cloud game
WO2016093428A1 (en) * 2014-12-11 2016-06-16 한화테크윈 주식회사 Mini integrated control device
WO2016093427A1 (en) * 2014-12-11 2016-06-16 한화테크윈 주식회사 Mini integrated control device
CN107797843A (en) * 2016-09-02 2018-03-13 华为技术有限公司 A kind of method and apparatus of container function enhancing
CN107797843B (en) * 2016-09-02 2021-04-20 华为技术有限公司 Method and device for enhancing function of container
CN110546642A (en) * 2018-10-17 2019-12-06 阿里巴巴集团控股有限公司 secure multi-party computing without using trusted initializer
WO2020077959A1 (en) * 2018-10-17 2020-04-23 Alibaba Group Holding Limited Secure multi-party computation with no trusted initializer
US11386212B2 (en) 2018-10-17 2022-07-12 Advanced New Technologies Co., Ltd. Secure multi-party computation with no trusted initializer
CN110543711A (en) * 2019-08-26 2019-12-06 中国原子能科学研究院 parallel implementation and optimization method for numerical reactor thermal hydraulic sub-channel simulation
CN110543711B (en) * 2019-08-26 2021-07-20 中国原子能科学研究院 Parallel implementation and optimization method for numerical reactor thermal hydraulic sub-channel simulation

Similar Documents

Publication Publication Date Title
US10223762B2 (en) Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units
CN103713938A (en) Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment
Bo et al. Accelerating FDTD algorithm using GPU computing
Liu et al. A GPU accelerated red-black SOR algorithm for computational fluid dynamics problems
CN103632336B (en) Based on the remote sensing image CPU/GPU cooperative processing method that load distributes
Xia et al. OpenACC acceleration of an unstructured CFD solver based on a reconstructed discontinuous Galerkin method for compressible flows
Granat et al. Parallel solvers for Sylvester-type matrix equations with applications in condition estimation, Part I: theory and algorithms
Wang et al. A survey of statistical methods and computing for big data
Song et al. A fine-grained parallel EMTP algorithm compatible to graphic processing units
Esfahanian et al. An efficient GPU implementation of cyclic reduction solver for high-order compressible viscous flow simulations
CN105183562A (en) Method for conducting degree drawing on grid data on basis of CUDA technology
Nguyen et al. GPU parallelization of multigrid RANS solver for three-dimensional aerodynamic simulations on multiblock grids
Peng et al. Cloud computing model based on MPI and OpenMP
Stojanović et al. Solving Gross Pitaevskii equation using dataflow paradigm
Mittal et al. Machine Learning computation on multiple GPU's using CUDA and message passing interface
Delgado et al. Embarrassingly easy embarrassingly parallel processing in R
CN104793922B (en) A kind of Parallel Implementation method of large integer multiplication Comba algorithms based on OpenMP
Shah et al. An efficient sparse matrix multiplication for skewed matrix on gpu
Lastovetsky et al. Distributed data partitioning for heterogeneous processors based on partial estimation of their functional performance models
Demchik et al. QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems
Satake et al. Optimizations of a GPU accelerated heat conduction equation by a programming of CUDA Fortran from an analysis of a PTX file
CN104915187A (en) Graph model calculation method and device
Hu et al. Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems
CN104615583A (en) Data processing method and device based on GPU platform
US20130106887A1 (en) Texture generation using a transformation matrix

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140409