CN109857543A - Streamline simulation acceleration method based on multi-node multi-GPU computing - Google Patents

Streamline simulation acceleration method based on multi-node multi-GPU computing

Info

Publication number
CN109857543A
CN109857543A (application CN201811574392.7A)
Authority
CN
China
Prior art keywords
gpu
formula
multinode
streamline
parallel computation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811574392.7A
Other languages
Chinese (zh)
Inventor
季晓慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences Beijing
Original Assignee
China University of Geosciences Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences Beijing filed Critical China University of Geosciences Beijing
Priority to CN201811574392.7A priority Critical patent/CN109857543A/en
Publication of CN109857543A publication Critical patent/CN109857543A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses an acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm, belonging to the field of streamline numerical simulation. The method runs on several computers or servers, each equipped with multiple GPUs. The method comprises: discretizing the computational region into a number of grid cells; the host process initializing the basic model information, the aquifer parameters, the coefficient matrix and the coefficients of the velocity vector formula, partitioning the grid model, and broadcasting these to the other processes; each process calling one GPU, creating a stream on that GPU for acceleration and allocating GPU memory to store the data; each GPU starting a number of threads, each thread using the particle tracking algorithm to compute, from the current position, the next position forward and the previous position backward, iterating this process to obtain one complete streamline, and then copying the GPU data back into the CPU memory assigned to the current process. Finally, the results of all processes are gathered with an MPI collective communication function, the invalid data are removed, and the output is written to the result file. The invention makes full use of the cluster resources and the GPU resources on each node and realizes deeply parallel computation of large-scale streamlines, with the advantages of a significant acceleration effect and fast streamline generation.

Description

Streamline simulation acceleration method based on multi-node multi-GPU computing
Technical field
The present invention relates to the field of streamline numerical simulation, and in particular to an acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm.
Background art
Streamline simulation technology not only provides vivid and intuitive information on groundwater movement for groundwater research and for studying the hydrological characteristics of the surrounding area, but also provides reservoir-property and production-performance information for inter-well tracer tests in reservoir engineering, facilitating better development. In his research on groundwater flow, Pollock proposed a semi-analytical streamline simulation method, the particle tracking algorithm, which is widely used because of its flexibility and generality. In practical engineering applications, when the study region is very large and the number of streamlines reaches the millions, the running time on ordinary machines is very long and cannot meet the real-time requirements of engineering applications.
The traditional CPU parallel acceleration framework is limited by the number of CPU computing cores, and the computing power of a CPU is lower than that of a GPU. Moreover, the number of GPUs in a single computer is limited by the number of slots in the machine, so the achievable acceleration is limited.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm, so as to solve the problem that traditional CPU simulation computes large-scale streamlines slowly. The embodiments of the present invention have the clear advantages of a small footprint, low cost and a strong acceleration effect.
To solve the above technical problems, the present invention provides the following technical solution:
The present invention provides an acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm, characterized in that the method runs simultaneously on multiple computers or servers, each computer or server having several GPUs, and the method comprises:
Step 1: the host process (process No. 0) is responsible for discretizing the regional model into a number of grid cells, initializing the basic model information, the aquifer parameters and the coefficients of the velocity vector formula, and completing the grid partition so that the load is balanced among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the submodel sizes are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory;
Step 4: each GPU calls multiple GPU threads to compute in parallel the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and starts the particle tracking process;
Step 5: the particle velocity Vp(Vx, Vy, Vz) is computed according to the velocity vector formula;
Step 6: if a boundary condition is met, for example a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise the particle travel time is computed from the velocity and the next position of the particle is obtained. One forward or backward tracking step is thus completed; the new position becomes the current particle coordinate and step 5 is executed again;
Step 7: connecting in order the coordinate points computed by a single GPU thread yields one complete streamline; when all threads of a GPU have finished, the GPU results are transferred from the GPU into the CPU memory of that process;
Step 8: after the GPU transfer has succeeded, the data are transmitted between processes with the MPI collective communication mechanism and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
Further, in step 1, the computational region is given with length Lx and width Ly projected onto the x-y plane, the important aquifer parameters ε_x, ε_y, α and β, and the coefficient matrix b_mn of the velocity vector formula; the regional model is discretized into a grid model, the model partition is then completed, and the load is balanced among the processes.
Further, in step 2, the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the sub-grid sizes are broadcast to the other processes;
Further, in step 3, each process distributed over multiple servers calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory.
Further, in step 4, each GPU calls multiple GPU threads to simultaneously compute the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and then starts the particle tracking process;
Further, step 5 includes:
The velocity vector is computed by the following formulas:
In formulas (1), (2) and (3), ε_x, ε_y, α and β are the important aquifer parameters, b_mn is the coefficient matrix, and Lx and Ly are the length and width of the computational region projected onto the x-y plane; the formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y and z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution and is computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Further, step 6 includes:
The boundary conditions are: z > 0 or
If the boundary conditions are not met, one tracking step is taken forward or backward according to the following formula (8) to obtain the new position of the particle:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
Further, in step 7, the GPU results are transferred back from the GPU to the CPU with the CUDA copy function.
Further, in step 8, the data are transmitted between processes with the MPI collective communication function MPI_Gather, and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
Further, in step 1, multiple processes are set up based on the MPI parallel technology and distributed over multiple servers, realizing process-level parallelism.
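As an illustration, a minimal sketch of this process-level setup is given below, assuming a standard MPI launch in which process 0 plays the role of the host process (the variable names are illustrative, not taken from the invention):

    #include <mpi.h>

    // Sketch: the program is started as several MPI processes spread over the nodes.
    // Each process learns its rank and the total number of processes; rank 0 acts as
    // the host process of step 1, the other ranks wait for the broadcast model data.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nProcs = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
        // rank 0: discretize and partition the model; all ranks: receive the broadcast,
        // bind one GPU, run the tracking kernels, then take part in the final gather.
        MPI_Finalize();
        return 0;
    }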
Further, in step 3, based on the parallel technology of the CUDA platform, functions such as cudaSetDevice(), cudaStreamCreate(), cudaMalloc() and cudaMemcpyAsync() are called to select the GPU card responsible for the computation, create a stream, allocate GPU memory, and transfer data between CPU and GPU.
Further, in step 3, the most frequently used parameters are loaded into the registers, which have the fastest access; the largest and infrequently read data are loaded into global memory, which has the largest capacity but slow access; the parameters that never change are loaded into constant memory, which is fast and read-only during execution; and data whose read locations are logically close together and that are read frequently are loaded into texture memory.
Further, in step 4, kernel functions executed on the GPU are invoked on the basis of the CUDA parallel architecture to realize thread-level parallelism inside each process. Every process sets the number of thread blocks and the number of threads per block, i.e. the total number of GPU threads used for parallel acceleration, thereby realizing thread-level parallelism;
The invention has the following advantages:
For the particle tracking algorithm, the present invention parallelizes the large-scale application program that realizes streamline simulation, achieves multi-node multi-GPU parallel computation of the program, and obtains a large acceleration effect.
Description of the drawings
Fig. 1 is the program flow chart provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the streamline simulation and the region gridding provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the CUDA thread model and the GPU memory model provided by Embodiment 1 of the present invention;
Fig. 4 is a graph of the experimental results of Embodiment 1 of the present invention.
Specific embodiment
To make the technical problems to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
The present invention provides an acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm. As shown in Figs. 1-4, the streamline simulation acceleration method based on multi-node multi-GPU computing runs simultaneously on multiple computers or servers, each computer or server having several GPUs, and the acceleration method comprises:
Step 1: the host process (process No. 0) is responsible for discretizing the regional model into a number of grid cells, initializing the basic model information, the aquifer parameters and the coefficients of the velocity vector formula, and completing the grid partition so that the load is balanced among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the sub-grid sizes are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory;
Step 4: each GPU calls multiple GPU threads to simultaneously compute the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and then starts the particle tracking process;
Step 5: the particle velocity Vp(Vx, Vy, Vz) is computed according to the velocity vector formula;
Step 6: if a boundary condition is met, for example a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise the particle travel time is computed from the velocity and the next position of the particle is obtained. One forward or backward tracking step is thus completed; the new position becomes the current particle coordinate and step 5 is executed again;
Step 7: connecting in order the coordinate points computed by a single GPU thread yields one complete streamline; when all threads of a GPU have finished, the GPU results are transferred from the GPU into the CPU memory of that process;
Step 8: after the GPU transfer has succeeded, the data are transmitted between processes with the MPI collective communication mechanism and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
The beneficial effects of the present invention are:
For the particle tracking algorithm, the present invention parallelizes the large-scale application program that realizes streamline simulation, achieves multi-node multi-GPU parallel computation of the program, and obtains a large acceleration effect.
Further, in step 1, the computational region is given with length Lx and width Ly projected onto the x-y plane, the important aquifer parameters ε_x, ε_y, α and β, and the coefficient matrix b_mn of the velocity vector formula; the regional model is discretized into a grid model, the model partition is then completed, and the load is balanced among the processes.
Further, in step 2, the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the sub-grid sizes are broadcast to the other processes;
Preferably, in step 3, each process distributed over multiple servers calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory.
Further, in step 4, each GPU calls multiple GPU threads to simultaneously compute the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and then starts the particle tracking process;
Preferably, step 5 includes:
The velocity vector is computed by the following formulas:
In formulas (1), (2) and (3), ε_x, ε_y, α and β are the important aquifer parameters, b_mn is the coefficient matrix, and Lx and Ly are the length and width of the computational region projected onto the x-y plane; the formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y and z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution and is computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Further, step 6 includes:
The boundary conditions are: z > 0 or
If the boundary conditions are not met, one tracking step is taken forward or backward according to the following formula (8) to obtain the new position of the particle:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
Further, in step 7, the GPU results are transferred back from the GPU to the CPU with the CUDA copy function.
Further, in step 8, the data are transmitted between processes with the MPI collective communication function MPI_Gather, and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
In step 1, multiple processes are set up based on the MPI parallel technology and distributed over multiple servers, realizing process-level parallelism.
In step 3, based on the parallel technology of the CUDA platform, functions such as cudaSetDevice(), cudaStreamCreate(), cudaMalloc() and cudaMemcpyAsync() are called to select the GPU card responsible for the computation, create a stream, allocate GPU memory, and transfer data between CPU and GPU.
In step 3, the most frequently used parameters are loaded into the registers, which have the fastest access; the largest and infrequently read data are loaded into global memory, which has the largest capacity but slow access; the parameters that never change are loaded into constant memory, which is fast and read-only during execution; and data whose read locations are logically close together and that are read frequently are loaded into texture memory.
In step 4, kernel functions executed on the GPU are invoked on the basis of the CUDA parallel architecture to realize thread-level parallelism inside each process. Every process sets the number of thread blocks and the number of threads per block, i.e. the total number of GPU threads used for parallel acceleration, thereby realizing thread-level parallelism;
The CPU-GPU heterogeneous programming is realized with the MPI+CUDA hybrid technology, using multiple GPUs on multiple servers, so that multi-GPU parallel acceleration is achieved and the computation speed is further increased.
In the present invention, the characteristics of the various GPU storage structures are fully exploited to optimize the data access pattern: the most frequently used parameters are loaded into the registers, which have the fastest access; the largest and infrequently read data are loaded into global memory, which has the largest capacity but slow access; the parameters that never change are loaded into constant memory, which is fast and read-only during execution; and data whose read locations are logically close together and that are read frequently are loaded into texture memory.
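As an illustration, a minimal sketch of the constant-memory placement described above is given below; putting the aquifer parameters and the coefficient matrix into constant memory, and the symbol names, are illustrative assumptions rather than the invention's exact layout:

    #include <cuda_runtime.h>

    // Sketch: parameters that never change during the run go into fast, read-only
    // constant memory; the large streamline/particle arrays stay in global memory.
    __constant__ double cAquifer[4];   // alpha, beta, eps_x, eps_y
    __constant__ double cB[16];        // 4 x 4 coefficient matrix b, row-major

    void upload_constants(const double* aquifer, const double* b) {
        cudaMemcpyToSymbol(cAquifer, aquifer, 4 * sizeof(double));
        cudaMemcpyToSymbol(cB, b, 16 * sizeof(double));
    }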
Embodiment 1:
The present invention is further described below through one embodiment. In the streamline simulation of basin groundwater, the basin region projected onto the x-y plane has a length of 20000 meters and a width of 10000 meters, the maximum depth of the basin is 8000 meters, the aquifer parameters are 0, 0, 1 and 1 respectively, the coefficient matrix is b = {{40,0,10,0},{20,0,0,0},{10,0,0,0},{0,0,0,0}}, and the ideal step length is 6 meters. The acceleration process of the streamline simulation for this specific region is shown in Fig. 1.
Step 1: the host process (process No. 0) discretizes the three-dimensional groundwater region of Lx × Ly × Lz into an Nx × Ny × Nz grid, where Nx, Ny and Nz denote the numbers of rows, columns and layers respectively. Here Lx = 20000, Ly = 10000, Lz = 8000, and after discretization Nx = 101, Ny = 201, Nz = 81. The aquifer parameters are α = 0, β = 0, ε_x = 1, ε_y = 1, and the coefficient matrix is b = {{40,0,10,0},{20,0,0,0},{10,0,0,0},{0,0,0,0}}. The grid model is evenly divided into n parts by blocks along the y direction, where n is the total number of processes.
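As an illustration, a minimal sketch of the y-direction block decomposition of step 1 is given below; the simple remainder-spreading scheme and the variable names are illustrative assumptions:

    // Sketch: evenly divide the Ny cells of the y direction among n MPI processes.
    // localNy is the number of y-cells owned by this rank, yStart its first y index.
    void decompose_y(int Ny, int nProcs, int rank, int* localNy, int* yStart) {
        int base = Ny / nProcs;            // cells every process gets at least
        int rem  = Ny % nProcs;            // leftover cells go to the first 'rem' ranks
        *localNy = base + (rank < rem ? 1 : 0);
        *yStart  = rank * base + (rank < rem ? rank : rem);
    }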
Step 2: the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the sub-grid sizes are broadcast to the other processes with the MPI_Bcast communication function;
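As an illustration, a minimal sketch of the step 2 broadcast is given below, assuming the model description is packed into plain arrays (the packing and the function name are illustrative assumptions); the numerical values are those of this embodiment:

    #include <mpi.h>

    // Sketch: rank 0 fills the model description and broadcasts it to all processes.
    void broadcast_model(int rank) {
        int    dims[3]    = {0, 0, 0};            // Nx, Ny, Nz
        double aquifer[4] = {0.0, 0.0, 0.0, 0.0}; // alpha, beta, eps_x, eps_y
        double b[16]      = {0.0};                // 4 x 4 coefficient matrix, row-major
        if (rank == 0) {
            dims[0] = 101; dims[1] = 201; dims[2] = 81;
            aquifer[2] = 1.0; aquifer[3] = 1.0;
            b[0] = 40.0; b[2] = 10.0; b[4] = 20.0; b[8] = 10.0;
        }
        MPI_Bcast(dims, 3, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast(aquifer, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(b, 16, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }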
Step 3: each process, distributed over multiple servers, calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory.
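As an illustration, a minimal sketch of the per-process GPU setup of step 3 is given below; mapping the GPU id from the rank and the number of GPUs per node, and the variable names, are illustrative assumptions:

    #include <cuda_runtime.h>

    // Sketch: each MPI process binds to one GPU, creates a stream, allocates device
    // memory and asynchronously copies its input parameters into GPU global memory.
    void setup_gpu(int rank, int gpusPerNode, const double* hostParams, size_t nParams,
                   double** devParams, cudaStream_t* stream) {
        cudaSetDevice(rank % gpusPerNode);                       // GPU with a unique id on this node
        cudaStreamCreate(stream);                                // stream used for copies and kernels
        cudaMalloc((void**)devParams, nParams * sizeof(double));
        cudaMemcpyAsync(*devParams, hostParams, nParams * sizeof(double),
                        cudaMemcpyHostToDevice, *stream);
    }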
Step 4: each GPU calls multiple GPU threads to simultaneously compute the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and then starts the particle tracking process;
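As an illustration, a minimal sketch of the step 4 kernel is given below; the cell-centre mapping from the grid index to the physical coordinate (index plus one half, times the cell size) is an assumed convention, not the invention's exact formula:

    // Sketch: one GPU thread converts one grid cell g(ix, iy, iz) into a physical
    // starting coordinate p(x, y, z); dx, dy, dz are the cell sizes.
    __global__ void cell_to_coordinate(int Nx, int Ny, int Nz,
                                       double dx, double dy, double dz,
                                       double* px, double* py, double* pz) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= Nx * Ny * Nz) return;
        int ix =  idx % Nx;
        int iy = (idx / Nx) % Ny;
        int iz =  idx / (Nx * Ny);
        px[idx] = (ix + 0.5) * dx;                               // assumed cell-centre convention
        py[idx] = (iy + 0.5) * dy;
        pz[idx] = (iz + 0.5) * dz;
    }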
Step 5: the velocity at the current particle position p(x, y, z) is computed; the components Vx, Vy and Vz of the velocity in the different directions are obtained from the following formulas:
In formulas (1), (2) and (3), ε_x, ε_y, α and β are the important aquifer parameters, b_mn is the coefficient matrix, and Lx and Ly are the length and width of the computational region projected onto the x-y plane; the formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y and z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution and is computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Step 6: if one of the boundary conditions is encountered, i.e. z > 0 or the condition given above, the forward or backward tracking ends and step 7 is executed; otherwise, the travel time (step length / velocity) is computed from the current velocity and the next position of the particle is computed from this time:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
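As an illustration, a minimal sketch of one tracking step is given below; reading formula (8) as the explicit update p_new = p + DIR * V * Δt with Δt = step length / speed is an assumption, and only the z > 0 part of the boundary condition that is reproduced in the text is tested:

    // Sketch: one forward (dir = 1) or backward (dir = -1) tracking step.
    // Returns false when a boundary condition (stagnation point or z > 0) stops the tracking.
    __device__ bool trace_step(double* x, double* y, double* z,
                               double vx, double vy, double vz,
                               double stepLength, int dir) {
        double speed = sqrt(vx * vx + vy * vy + vz * vz);
        if (speed == 0.0 || *z > 0.0) return false;   // stagnation point or boundary reached
        double dt = stepLength / speed;               // travel time for one step
        *x += dir * vx * dt;
        *y += dir * vy * dt;
        *z += dir * vz * dt;
        return true;                                  // caller recomputes the velocity and repeats
    }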
Step 7: the GPU results are transferred from the GPU into the CPU memory of this process with the CUDA copy function.
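As an illustration, a minimal sketch of the step 7 copy-back is given below, assuming the streamline coordinates were written into one device array (the array and variable names are illustrative):

    #include <cuda_runtime.h>

    // Sketch: once all tracking threads of this GPU have finished, copy the streamline
    // coordinates from the device back into the CPU memory of this process.
    void copy_results_to_host(double* hostLines, const double* devLines,
                              size_t nPoints, cudaStream_t stream) {
        cudaStreamSynchronize(stream);                // wait for the tracking kernel to finish
        cudaMemcpy(hostLines, devLines, nPoints * 3 * sizeof(double),
                   cudaMemcpyDeviceToHost);           // x, y, z per point
    }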
Step 8: the data are transmitted between processes with the MPI collective communication mechanism, and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
The acceleration method that realizes streamline simulation with the multi-node multi-GPU parallel particle tracking algorithm runs on multiple servers, each equipped with several GPU cards. Specifically, the computational core of the invention combines the MPI technology with CUDA kernel functions to realize multi-GPU parallelism across multiple servers (computers).
When realizing thread-level parallelism of a function with CUDA, the number of thread blocks and the number of threads per block must be set, which then fixes the total number of GPU threads used in parallel and realizes thread-level parallelism. In this Embodiment 1 the number of threads per block is set to 32 and the thread blocks are arranged as a two-dimensional grid. In the kernel function executed on the designated GPU, the absolute thread index is obtained as threadIdx.x + blockIdx.x × blockDim.x + gridDim.x × blockDim.x × blockIdx.y; this thread is responsible for tracking the fluid particle in grid cell g(ix, iy, iz). The CUDA execution model and the GPU storage model are shown in Fig. 3.
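As an illustration, a minimal sketch of this launch configuration is given below; the x-extent of the two-dimensional grid of blocks (1024) is an assumed value, while the 32 threads per block and the index expression follow the description above:

    // Sketch: 32 threads per block, blocks arranged as a 2-D grid that covers all cells.
    __global__ void traceKernel(int nCells) {
        int idx = threadIdx.x + blockIdx.x * blockDim.x
                + gridDim.x * blockDim.x * blockIdx.y;  // absolute thread index, as in the text
        if (idx >= nCells) return;
        // ... this thread tracks the fluid particle of grid cell g(ix, iy, iz) ...
    }

    void launch_tracking(int Nx, int Ny, int Nz, cudaStream_t stream) {
        int nCells  = Nx * Ny * Nz;
        int threads = 32;                               // threads per block, as in Embodiment 1
        int blocksX = 1024;                             // assumed x-extent of the block grid
        int blocksY = (nCells + threads * blocksX - 1) / (threads * blocksX);
        traceKernel<<<dim3(blocksX, blocksY), dim3(threads), 0, stream>>>(nCells);
    }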
Thread blocks and threads read data from the various kinds of memory: the local memory and registers of each thread, the shared memory within a thread block, and the global memory, constant memory and texture memory within the grid. Because the threads of the same thread block can read the data in that block's shared memory, and reading and writing shared memory is very fast, some frequently used constants are stored in the shared memory of each block by declaring them with __shared__; when these constant vectors are used repeatedly, much time is saved. Data access performance is optimized by exploiting the different levels of the GPU storage hierarchy.
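As an illustration, a minimal sketch of this __shared__ staging is given below; choosing the 4 x 4 coefficient matrix b as the constant that every block caches is an illustrative assumption:

    // Sketch: the threads of one block load the frequently used coefficient matrix b
    // into shared memory once, then read it from there during the tracking loop.
    __global__ void traceWithSharedB(const double* bGlobal, int nCells) {
        __shared__ double bShared[16];                  // 4 x 4 coefficient matrix per block
        if (threadIdx.x < 16)
            bShared[threadIdx.x] = bGlobal[threadIdx.x];
        __syncthreads();                                // every thread now sees the filled copy
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= nCells) return;
        // ... read bShared[...] repeatedly while evaluating the velocity series ...
    }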
Further, with the MPI+CUDA hybrid parallel technology, multiple GPUs in multiple compute nodes execute simultaneously, so the number of GPUs is scaled up, the streamline simulation computation is accelerated, and a better acceleration effect is obtained.
Because of the characteristics of the MPI_Gather function, the amount of data sent by each process must be the same, whereas in the experiments the data volumes may differ. Therefore invalid data are used to pad the buffers so that the data volume of every process equals the same maximum value. After the main process has gathered the data, the padded invalid data are removed, and the result is finally saved or printed.
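As an illustration, a minimal sketch of this padding scheme is given below; the sentinel value used to mark padded entries and the variable names are illustrative assumptions (maxCount is assumed to be known by every process):

    #include <mpi.h>

    const double INVALID = -1.0e30;   // assumed marker for padded (invalid) entries

    // Sketch: every process pads its result buffer to the same maximum length so that
    // MPI_Gather can be used; the host process then discards the padded entries.
    void gather_streamlines(double* local, int localCount, int maxCount,
                            double* all, int nProcs, int rank) {
        for (int i = localCount; i < maxCount; ++i)
            local[i] = INVALID;                        // expand with invalid data
        MPI_Gather(local, maxCount, MPI_DOUBLE,
                   all, maxCount, MPI_DOUBLE, 0, MPI_COMM_WORLD);  // 'all' only used on rank 0
        if (rank == 0) {
            for (int i = 0; i < nProcs * maxCount; ++i)
                if (all[i] != INVALID) { /* write this valid value to the result file */ }
        }
    }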
The speedup obtained by the parallel implementation of this Embodiment 1 on the multi-node multi-GPU servers is shown in Fig. 4. Multiple processes call the same number of GPUs for parallel computation; the acceleration effect is significant, and the final acceleration effect remains stable.
To sum up, the invention has the following advantages:
For the particle tracking algorithm, the present invention parallelizes the large-scale application program that realizes streamline simulation, achieves multi-node multi-GPU parallel computation of the program, and obtains a large acceleration effect.
Although the present invention has been described in detail above with general descriptions and specific embodiments, some modifications or improvements can be made on the basis of the present invention, as will be apparent to those skilled in the art. Therefore, such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims (11)

1. An acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm, characterized in that the method runs simultaneously on multiple computers or servers, each computer or server having several GPUs, and the method comprises:
Step 1: the host process (process No. 0) is responsible for discretizing the regional model into a number of grid cells, initializing the basic model information, the aquifer parameters and the coefficients of the velocity vector formula, and completing the grid partition so that the load is balanced among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients of the velocity vector formula and the submodel sizes are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory;
Step 4: each GPU calls multiple GPU threads to compute in parallel the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz), and starts the particle tracking process;
Step 5: the particle velocity Vp(Vx, Vy, Vz) is computed according to the velocity vector formula;
Step 6: if a boundary condition is met, for example a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise the particle travel time is computed from the velocity and the next position of the particle is obtained, one forward or backward tracking step is thus completed, the new position becomes the current particle coordinate, and step 5 is executed again;
Step 7: connecting in order the coordinate points computed by a single GPU thread yields one complete streamline; when all threads of a GPU have finished, the GPU results are transferred from the GPU into the CPU memory of that process;
Step 8: after the GPU transfer has succeeded, the data are transmitted between processes with the MPI collective communication mechanism and the results of all processes are gathered into the host process; finally, the invalid result data are handled and the output is written to the result file.
2. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that, in step 1, the computational region is given with length Lx and width Ly projected onto the x-y plane, the important aquifer parameters ε_x, ε_y, α and β, and the coefficient matrix b_mn of the velocity vector formula; the regional model is discretized into a grid model, and the host process then partitions the grid model into several submodels while ensuring that the load is balanced.
3. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that, in step 2, the basic information of the grid model, the aquifer parameters, the coefficients of the velocity vector formula and the submodel sizes are broadcast to the other processes.
4. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that, in step 3, each process distributed over multiple servers calls the GPU card with a unique identifier, creates a stream on that GPU, allocates GPU memory, and copies the parameters needed by the GPU computation from the CPU into GPU global memory.
5. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that, in step 4, each GPU calls multiple GPU threads to compute in parallel the physical coordinates p(x, y, z) of a number of grid cells g(ix, iy, iz) and starts the particle tracking process.
6. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that step 5 includes:
the velocity vector is computed by the following formulas:
in formulas (1), (2) and (3), ε_x, ε_y, α and β are the important aquifer parameters, b_mn is the coefficient matrix, and Lx and Ly are the length and width of the computational region projected onto the x-y plane; the formulas for x_p, y_p and z_p are as follows:
in formula (4), x, y and z are the physical coordinates of the particle; in formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution and is computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6);
in formulas (5) and (6), λ_mn is computed from formula (7).
7. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that step 6 includes:
the boundary conditions are: z > 0 or
if the boundary conditions are not met, one tracking step is taken forward or backward according to the following formula (8):
in formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
8. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that step 7 includes: transferring the results back from the GPU to the CPU with the CUDA copy function.
9. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 3, characterized in that, with the message-passing parallel architecture, i.e. the MPI parallel technology, multiple processes distributed over multiple computers are started, each process is responsible for a part of the streamline computation process, and process-level parallelism is realized.
10. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 3, characterized in that the most frequently used parameters are loaded into the registers, which have the fastest access; the largest and infrequently read data are loaded into global memory, which has the largest capacity but slow access; the parameters that never change are loaded into constant memory, which is fast and read-only during execution; and data whose read locations are logically close together and that are read frequently are loaded into texture memory.
11. The acceleration method that realizes streamline simulation with a multi-node multi-GPU parallel particle tracking algorithm according to claim 4, characterized in that each process controls one GPU to realize the parallel computation of its task; with the CUDA parallel architecture, the number of thread blocks and the number of threads per block, i.e. the total number of GPU threads used for parallel acceleration, is set, and thread-level parallelism inside each process is realized.
CN201811574392.7A 2018-12-21 2018-12-21 Streamline simulation acceleration method based on multi-node multi-GPU computing Pending CN109857543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811574392.7A CN109857543A (en) 2018-12-21 Streamline simulation acceleration method based on multi-node multi-GPU computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811574392.7A CN109857543A (en) 2018-12-21 Streamline simulation acceleration method based on multi-node multi-GPU computing

Publications (1)

Publication Number Publication Date
CN109857543A true CN109857543A (en) 2019-06-07

Family

ID=66891995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811574392.7A Pending CN109857543A (en) Streamline simulation acceleration method based on multi-node multi-GPU computing

Country Status (1)

Country Link
CN (1) CN109857543A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425523A (en) * 2013-06-20 2013-12-04 国家电网公司 Parallel computing system and method of PMU (Phasor Measurement Unit) online application system
CN104036031A (en) * 2014-06-27 2014-09-10 北京航空航天大学 Large-scale CFD parallel computing method based on distributed Mysql cluster storage
CN104714850A (en) * 2015-03-02 2015-06-17 心医国际数字医疗系统(大连)有限公司 Heterogeneous joint account balance method based on OPENCL
CN107515987A (en) * 2017-08-25 2017-12-26 中国地质大学(北京) The simulation accelerated method of Groundwater Flow based on more relaxation Lattice Boltzmann models
CN108427605A (en) * 2018-02-09 2018-08-21 中国地质大学(北京) The accelerated method of streamline simulation is realized based on particles trace algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李丹丹: "Research on parallel computation of groundwater flow spatial data", China Doctoral Dissertations Full-text Database, Basic Sciences *
李安平: "Research on CUDA-based parallel image processing", China Master's Theses Full-text Database, Information Science and Technology *
贾永红: "Digital Image Processing Practice Tutorial", 31 January 2007 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796701B (en) * 2019-10-21 2022-06-07 深圳市瑞立视多媒体科技有限公司 Identification method, device and equipment of mark points and storage medium
CN110796701A (en) * 2019-10-21 2020-02-14 深圳市瑞立视多媒体科技有限公司 Identification method, device and equipment of mark points and storage medium
CN111186139A (en) * 2019-12-25 2020-05-22 西北工业大学 Multi-level parallel slicing method for 3D printing model
CN111186139B (en) * 2019-12-25 2022-03-15 西北工业大学 Multi-level parallel slicing method for 3D printing model
CN111552478A (en) * 2020-04-30 2020-08-18 上海商汤智能科技有限公司 Apparatus, method and storage medium for generating CUDA program
CN111552478B (en) * 2020-04-30 2024-03-22 上海商汤智能科技有限公司 Apparatus, method and storage medium for generating CUDA program
CN112148437A (en) * 2020-10-21 2020-12-29 深圳致星科技有限公司 Calculation task acceleration processing method, device and equipment for federal learning
CN112257313A (en) * 2020-10-21 2021-01-22 西安理工大学 Pollutant transport high-resolution numerical simulation method based on GPU acceleration
CN112257313B (en) * 2020-10-21 2024-05-14 西安理工大学 GPU acceleration-based high-resolution numerical simulation method for pollutant transportation
CN112148437B (en) * 2020-10-21 2022-04-01 深圳致星科技有限公司 Calculation task acceleration processing method, device and equipment for federal learning
CN114490011A (en) * 2020-11-12 2022-05-13 上海交通大学 Parallel acceleration implementation method of N-body simulation in heterogeneous architecture
CN112380793A (en) * 2020-11-18 2021-02-19 上海交通大学 Turbulence combustion numerical simulation parallel acceleration implementation method based on GPU
CN112380793B (en) * 2020-11-18 2024-02-13 上海交通大学 GPU-based turbulence combustion numerical simulation parallel acceleration implementation method
CN112947870B (en) * 2021-01-21 2022-12-30 西北工业大学 G-code parallel generation method of 3D printing model
CN112947870A (en) * 2021-01-21 2021-06-11 西北工业大学 G-code parallel generation method of 3D printing model
CN113660046B (en) * 2021-08-17 2022-11-11 东南大学 Method for accelerating generation of large-scale wireless channel coefficients
CN113660046A (en) * 2021-08-17 2021-11-16 东南大学 Method for accelerating generation of large-scale wireless channel coefficients
CN114970395A (en) * 2022-06-10 2022-08-30 青岛大学 Large-scale fluid simulation method and system based on two-dimensional Saint-Vietnam equation
CN117687779A (en) * 2023-11-30 2024-03-12 山东诚泉信息科技有限责任公司 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform
CN117687779B (en) * 2023-11-30 2024-04-26 山东诚泉信息科技有限责任公司 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform
CN118502964A (en) * 2024-07-12 2024-08-16 安徽大学 Tokamak new classical circumferential viscous torque CUDA simulation implementation method

Similar Documents

Publication Publication Date Title
CN109857543A (en) A kind of streamline simulation accelerated method calculated based on the more GPU of multinode
CN103970960B (en) The element-free Galerkin structural topological optimization method accelerated parallel based on GPU
Brodtkorb et al. Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation
CN103765376B (en) Graphic process unit with clog-free parallel architecture
CN101727653B (en) Graphics processing unit based discrete simulation computation method of multicomponent system
CN103440163B (en) Use the accelerator emulation mode based on PIC model of GPU Parallel Implementation
CN106021828A (en) Fluid simulation method based on grid-boltzmann model
US11145099B2 (en) Computerized rendering of objects having anisotropic elastoplasticity for codimensional frictional contact
CN106547627A (en) The method and system that a kind of Spark MLlib data processings accelerate
CN103345580B (en) Based on the parallel CFD method of lattice Boltzmann method
CN109146067A (en) A kind of Policy convolutional neural networks accelerator based on FPGA
CN104360896A (en) Parallel fluid simulation acceleration method based on GPU (Graphics Processing Unit) cluster
KR100914869B1 (en) System and Method for Real-Time Cloth Simulation
CN104392147A (en) Region scale soil erosion modeling-oriented terrain factor parallel computing method
CN111445003A (en) Neural network generator
US20190318533A1 (en) Realism of scenes involving water surfaces during rendering
CN107016180A (en) A kind of particle flow emulation mode
CN107025332A (en) A kind of microcosmic water diffusion process method for visualizing of fabric face based on SPH
JPH11502958A (en) Collision calculation for physical process simulation
Wang et al. FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications
CN112100939B (en) Real-time fluid simulation method and system based on computer loader
CN108427605B (en) Acceleration method for realizing streamline simulation based on particle tracking algorithm
CN106373192B (en) A kind of non-topological coherence three-dimensional grid block tracing algorithm
CN109949398A (en) Particle renders method, apparatus and electronic equipment
Amador et al. CUDA-based linear solvers for stable fluids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190607)