CN109857543A - A streamline simulation acceleration method based on multi-node multi-GPU computing - Google Patents
A streamline simulation acceleration method based on multi-node multi-GPU computing
- Publication number: CN109857543A (application CN201811574392.7A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm, belonging to the field of streamline numerical simulation. The method runs on several computers or servers, each equipped with multiple GPUs. The method comprises: discretizing the computational region into a number of grid cells; a host process initializing the basic model information, the aquifer parameters, the coefficient matrix in the velocity vector formula and the partitioned grid model, and broadcasting them to the other processes; each process invoking one GPU, creating a stream on that GPU for acceleration, and allocating GPU memory to store its data; each GPU launching a number of threads, each thread using the particle tracking algorithm to compute, from the current position, the particle's next position (forward) and previous position (backward), iterating this process to obtain one complete streamline, and then copying the GPU data back into the CPU memory allocated to the owning process. Finally, the results of all processes are gathered with an MPI collective communication function and, after invalid data is removed, written to a result file. The invention makes full use of cluster resources and the GPUs on each node to realize deeply parallel computation of large-scale streamlines, with the advantages of a significant speedup and fast streamline generation.
Description
Technical field
The present invention relates to the field of streamline numerical simulation, and in particular to an acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm.
Background technique
Streamline simulation not only provides vivid, intuitive information on groundwater movement for groundwater research and for studying the hydrological characteristics of the surrounding area, but also provides reservoir-property and production-performance information for inter-well tracer tests in reservoir engineering, facilitating better development. In his research on groundwater flow, Pollock proposed a semi-analytic streamline simulation method, the particle tracking algorithm, which is widely used for its flexibility and universality. In practical engineering applications, when the survey region is very large and the number of streamlines reaches the millions, the running time on an ordinary machine is very long and cannot satisfy the demand of engineering applications for timely results.
Traditional CPU-based parallel acceleration frameworks are limited by the number of CPU cores, and the computing capability of a CPU is lower than that of a GPU. Moreover, the number of GPUs in a single computer is limited by its slot count, so the achievable speedup is also limited.
Summary of the invention
The embodiments of the present invention aim to provide an acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm, so as to solve the problem that traditional CPU-based simulation computes large-scale streamlines slowly. The embodiments of the present invention have the clear advantages of a small footprint, low cost and a significant speedup.
To solve the above technical problem, the present invention provides the following technical solution:
The present invention provides an acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm, characterized in that the method runs simultaneously on multiple computers or servers, each with several GPUs, and comprises:
Step 1: the host process (process 0) discretizes the regional model into a number of grid cells, initializes the basic model information, the aquifer parameters and the coefficients in the velocity vector formula, completes the grid division, and balances the load among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-model size are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory;
Step 4: each GPU launches multiple GPU threads that compute in parallel the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process begins;
Step 5: the particle flow velocity Vp(Vx, Vy, Vz) is computed from the velocity vector formula;
Step 6: if a boundary condition is encountered, such as a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise, the particle travel time is computed from the flow velocity and the particle's next position is obtained, completing one forward or backward tracking step; the new position becomes the current particle coordinate, and step 5 is executed again;
Step 7: connecting in sequence the coordinate points computed by a single GPU thread yields one complete streamline; once all threads of a GPU have finished, the GPU results are copied from the GPU into the CPU memory of the owning process;
Step 8: after the GPU transfer succeeds, inter-process data transmission is realized with the MPI collective communication mechanism, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
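The per-thread tracking loop of steps 4-7 can be sketched on the CPU as follows. This is a minimal illustration, not the patented kernel: the velocity field here is a hypothetical stand-in for formulas (1)-(3), and the only boundary condition checked is z > 0 from step 6.

```cpp
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical velocity field standing in for formulas (1)-(3).
Vec3 velocity(const Vec3& p) {
    return { 1.0, 0.5, -0.1 * p.z };
}

// Steps 5-6: advance a particle from `seed` in direction `dir`
// (+1 forward, -1 backward) until a boundary or the step limit is hit.
std::vector<Vec3> trace(Vec3 seed, int dir, double dt, int max_steps) {
    std::vector<Vec3> line{ seed };
    Vec3 p = seed;
    for (int i = 0; i < max_steps; ++i) {
        Vec3 v = velocity(p);
        p = { p.x + dir * v.x * dt, p.y + dir * v.y * dt, p.z + dir * v.z * dt };
        if (p.z > 0.0) break;            // boundary condition from step 6
        line.push_back(p);
    }
    return line;
}

// Step 7: one complete streamline is the backward trace (reversed)
// joined to the forward trace from the same seed.
std::vector<Vec3> streamline(Vec3 seed, double dt, int max_steps) {
    std::vector<Vec3> back = trace(seed, -1, dt, max_steps);
    std::vector<Vec3> line(back.rbegin(), back.rend());   // reverse backward part
    std::vector<Vec3> fwd = trace(seed, +1, dt, max_steps);
    line.insert(line.end(), fwd.begin() + 1, fwd.end());  // skip duplicate seed
    return line;
}
```

In the patented method each such trace runs in one GPU thread; here the same logic is shown sequentially for clarity.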
Further, in step 1, the length Lx and width Ly of the computational region projected onto the x-y plane, the key aquifer parameters ε_x, ε_y, α, β, and the coefficient matrix b_mn in the velocity vector formula are given; the regional model is discretized into a grid model, the model is then partitioned, and the load among the processes is balanced.
Further, in step 2, the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-grid size are broadcast to the other processes.
Further, in step 3, each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory.
Further, in step 4, each GPU launches multiple GPU threads that simultaneously compute the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process then begins.
Further, step 5 comprises:
The velocity vector formula:
In formulas (1), (2) and (3), ε_x, ε_y, α, β are the key aquifer parameters, b_mn is the coefficient matrix, and Lx, Ly are the length and width of the computational region projected onto the x-y plane. The formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y, z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution, computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Further, step 6 comprises:
The boundary condition is: z > 0 or
If the boundary condition is not satisfied, one step is tracked forward or backward according to formula (8) to obtain the new particle position:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
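Since formula (8) advances the particle by its velocity times a signed time step, a single tracking step can be written as below. This is a sketch of that update rule only; the names are illustrative.

```cpp
struct Vec3 { double x, y, z; };

// One tracking step per formula (8): p_new = p + DIR * Vp * dt.
// DIR = +1 traces forward, -1 backward; vp is the velocity Vp(Vx, Vy, Vz)
// from step 5 and dt is one time step.
Vec3 track_step(const Vec3& p, const Vec3& vp, int dir, double dt) {
    return { p.x + dir * vp.x * dt,
             p.y + dir * vp.y * dt,
             p.z + dir * vp.z * dt };
}
```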
Further, in step 7, the GPU results are copied back from the GPU to the CPU using a CUDA copy function.
Further, in step 8, inter-process data transmission is realized with the MPI collective communication function MPI_Gather, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
Further, in step 1, multiple processes distributed over multiple servers are set up with MPI parallel technology, realizing process-level parallelism.
Further, in step 3, based on the CUDA platform, functions such as cudaSetDevice(), cudaStreamCreate(), cudaMalloc() and cudaMemcpyAsync() are called to designate the GPU card responsible for the computation, create a stream, allocate GPU memory, and transfer data between the CPU and the GPU.
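A common way to give each MPI rank a uniquely numbered GPU on its node is to map the rank onto the per-node device count. The sketch below assumes a `gpus_per_node` value supplied by the cluster configuration; the CUDA calls the process would then make are shown only as comments, since they require a GPU to execute.

```cpp
// Map an MPI rank to a GPU index on its node so that every process
// drives a distinct card. gpus_per_node is an assumed, cluster-specific value.
int gpu_for_rank(int rank, int gpus_per_node) {
    // After computing the index, the actual method would call, e.g.:
    //   cudaSetDevice(gpu_for_rank(rank, gpus_per_node));
    //   cudaStreamCreate(&stream);   // create a stream on that GPU
    //   cudaMalloc(&d_buf, bytes);   // allocate GPU memory
    //   cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    return rank % gpus_per_node;
}
```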
Further, in step 3, the most frequently used parameters are loaded into registers, which have the fastest read speed; the largest and infrequently read data are loaded into global memory, which has the largest capacity but a slow read speed; constant parameters are loaded into constant memory, which has a fast read speed and is read-only at run time; data whose access positions are logically close and that are read frequently are loaded into texture memory.
Further, in step 4, based on the CUDA parallel architecture, kernel functions executed on the GPU are invoked to realize thread-level parallelism within each process. Each process sets the thread-block configuration and the number of threads per block, i.e., the total number of GPU threads used for parallel acceleration, realizing thread-level parallelism.
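Choosing the launch configuration reduces to covering every grid cell with a thread. A minimal sketch, assuming the fixed 32 threads per block used in embodiment 1:

```cpp
// Given the total number of grid cells and a fixed threads-per-block
// (32 in embodiment 1), compute how many blocks are needed so that
// every cell gets its own thread (ceiling division).
int blocks_needed(long long total_cells, int threads_per_block) {
    return static_cast<int>((total_cells + threads_per_block - 1) / threads_per_block);
}
```

For the 101 x 201 x 81 grid of embodiment 1 this gives the block count for a launch such as `kernel<<<blocks, 32>>>(...)`.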
The invention has the following beneficial effects:
For the large-scale application of particle-tracking streamline simulation, the present invention parallelizes the program, realizes multi-node, multi-GPU parallel computation, and obtains a large speedup.
Brief description of the drawings
Fig. 1 is the program flow chart provided by embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the streamline simulation and regional gridding provided by embodiment 1;
Fig. 3 is a schematic diagram of the CUDA thread model and GPU memory model provided by embodiment 1;
Fig. 4 shows the experimental results of embodiment 1.
Detailed description
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, they are described in detail below with reference to the drawings and specific embodiments.
The present invention provides an acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm. As shown in Figs. 1-4, the method runs simultaneously on multiple computers or servers, each with several GPUs, and comprises:
Step 1: the host process (process 0) discretizes the regional model into a number of grid cells, initializes the basic model information, the aquifer parameters and the coefficients in the velocity vector formula, completes the grid division, and balances the load among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-grid size are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory;
Step 4: each GPU launches multiple GPU threads that simultaneously compute the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process then begins;
Step 5: the particle flow velocity Vp(Vx, Vy, Vz) is computed from the velocity vector formula;
Step 6: if a boundary condition is encountered, such as a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise, the particle travel time is computed from the flow velocity and the particle's next position is obtained, completing one forward or backward tracking step; the new position becomes the current particle coordinate, and step 5 is executed again;
Step 7: connecting in sequence the coordinate points computed by a single GPU thread yields one complete streamline; once all threads of a GPU have finished, the GPU results are copied from the GPU into the CPU memory of the owning process;
Step 8: after the GPU transfer succeeds, inter-process data transmission is realized with the MPI collective communication mechanism, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
The beneficial effects of the present invention are:
For the large-scale application of particle-tracking streamline simulation, the present invention parallelizes the program, realizes multi-node, multi-GPU parallel computation, and obtains a large speedup.
Further, in step 1, the length Lx and width Ly of the computational region projected onto the x-y plane, the key aquifer parameters ε_x, ε_y, α, β, and the coefficient matrix b_mn in the velocity vector formula are given; the regional model is discretized into a grid model, the model is then partitioned, and the load among the processes is balanced.
Further, in step 2, the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-grid size are broadcast to the other processes.
Preferably, in step 3, each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory.
Further, in step 4, each GPU launches multiple GPU threads that simultaneously compute the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process then begins.
Preferably, step 5 comprises:
The velocity vector formula:
In formulas (1), (2) and (3), ε_x, ε_y, α, β are the key aquifer parameters, b_mn is the coefficient matrix, and Lx, Ly are the length and width of the computational region projected onto the x-y plane. The formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y, z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution, computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Further, step 6 comprises:
The boundary condition is: z > 0 or
If the boundary condition is not satisfied, one step is tracked forward or backward according to formula (8) to obtain the new particle position:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
Further, in step 7, the GPU results are copied back from the GPU to the CPU using a CUDA copy function.
Further, in step 8, inter-process data transmission is realized with the MPI collective communication function MPI_Gather, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
In step 1, multiple processes distributed over multiple servers are set up with MPI parallel technology, realizing process-level parallelism.
In step 3, based on the CUDA platform, functions such as cudaSetDevice(), cudaStreamCreate(), cudaMalloc() and cudaMemcpyAsync() are called to designate the GPU card responsible for the computation, create a stream, allocate GPU memory, and transfer data between the CPU and the GPU.
In step 3, the most frequently used parameters are loaded into registers, which have the fastest read speed; the largest and infrequently read data are loaded into global memory, which has the largest capacity but a slow read speed; constant parameters are loaded into constant memory, which has a fast read speed and is read-only at run time; data whose access positions are logically close and that are read frequently are loaded into texture memory.
In step 4, based on the CUDA parallel architecture, kernel functions executed on the GPU are invoked to realize thread-level parallelism within each process. Each process sets the thread-block configuration and the number of threads per block, i.e., the total number of GPU threads used for parallel acceleration, realizing thread-level parallelism.
CPU-GPU heterogeneous programming is realized with a hybrid MPI+CUDA technique, using multiple GPUs on multiple servers, thereby realizing multi-GPU parallel acceleration and further increasing the computing speed.
In the present invention, the characteristics of the various GPU storage structures are fully exploited to optimize data access: the most frequently used parameters are loaded into registers, which have the fastest read speed; the largest and infrequently read data are loaded into global memory, which has the largest capacity but a slow read speed; constant parameters are loaded into constant memory, which has a fast read speed and is read-only at run time; data whose access positions are logically close and that are read frequently are loaded into texture memory.
Embodiment 1:
The present invention is further described below through an embodiment. In a basin groundwater streamline simulation, the basin region projected onto the x-y plane has a length of 20,000 meters and a width of 10,000 meters, the maximum basin depth is 8,000 meters, the aquifer parameters are 0, 0, 1, 1 respectively, the coefficient matrix is b = {{40, 0, 10, 0}, {20, 0, 0, 0}, {10, 0, 0, 0}, {0, 0, 0, 0}}, and the ideal step length is 6 meters. The acceleration procedure for the streamline simulation of this region is shown in Fig. 1.
Step 1: the host process (process 0) discretizes the three-dimensional groundwater region of Lx × Ly × Lz into an Nx × Ny × Nz grid, where Nx, Ny and Nz denote the numbers of rows, columns and layers respectively. Here Lx = 20000, Ly = 10000, Lz = 8000, and after discretization Nx = 101, Ny = 201, Nz = 81. The aquifer parameters are α = 0, β = 0, ε_x = 1, ε_y = 1. The coefficient matrix is b = {{40, 0, 10, 0}, {20, 0, 0, 0}, {10, 0, 0, 0}, {0, 0, 0, 0}}. The grid model is evenly divided into n blocks along the y direction, where n is the total number of processes.
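The even y-direction block division of step 1 can be sketched as follows. This is one plausible load-balanced split (ranges of near-equal width, with any remainder spread over the first ranks); the patent does not spell out the remainder handling, so that detail is an assumption.

```cpp
#include <utility>

// Evenly split Ny grid columns among n processes along the y direction,
// giving the first (Ny % n) processes one extra column for load balance.
// Returns the half-open [begin, end) column range owned by `rank`.
std::pair<int, int> y_partition(int Ny, int n, int rank) {
    int base = Ny / n, extra = Ny % n;
    int begin = rank * base + (rank < extra ? rank : extra);
    int width = base + (rank < extra ? 1 : 0);
    return { begin, begin + width };
}
```

With Ny = 201 and n = 4 processes, the ranges are [0,51), [51,101), [101,151) and [151,201), differing by at most one column.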
Step 2: the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-grid size are broadcast to the other processes using the MPI_Bcast communication function;
Step 3: each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory;
Step 4: each GPU launches multiple GPU threads that simultaneously compute the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process then begins;
Step 5: the flow velocity at the current particle position p(x, y, z) is computed; the velocity components Vx, Vy, Vz in the different directions are solved according to the following formulas:
In formulas (1), (2) and (3), ε_x, ε_y, α, β are the key aquifer parameters, b_mn is the coefficient matrix, and Lx, Ly are the length and width of the computational region projected onto the x-y plane. The formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y, z are the physical coordinates of the particle. In formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution, computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6).
In formulas (5) and (6), λ_mn is computed from formula (7).
Step 6: if the boundary condition is encountered (z > 0 or ), the forward or backward tracking ends and step 7 is executed; otherwise, the travel time (step length / velocity) is computed from the current flow velocity, and the particle's next position is computed from that time:
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
Step 7: the GPU results are copied from the GPU into the CPU memory of the owning process using a CUDA copy function.
Step 8: inter-process data transmission is realized with the MPI collective communication mechanism, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
The acceleration method that realizes streamline simulation with the multi-node, multi-GPU parallel particle tracking algorithm runs on multiple servers, each equipped with several GPU cards. Specifically, the computational core of the invention combines MPI technology with CUDA kernel functions to realize multi-GPU parallelism across multiple servers (computers).
To realize thread-level parallelism with CUDA, the thread-block configuration and the number of threads per block must be set; this determines the total number of GPU threads used in parallel. In this embodiment 1, the number of threads per block is set to 32, and the thread blocks are organized as a two-dimensional grid. In the kernel function executed on the designated GPU, the absolute thread index is obtained as threadIdx.x + blockIdx.x × blockDim.x + gridDim.x × blockDim.x × blockIdx.y; that thread is responsible for tracing the fluid particle in grid cell g(ix, iy, iz). The CUDA execution model and the GPU storage model are shown in Fig. 3.
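The index formula quoted above can be checked with a small CPU-side emulation; the built-in variables of the kernel are passed here as ordinary parameters.

```cpp
// CPU-side emulation of the absolute thread index used in the kernel:
// index = threadIdx.x + blockIdx.x*blockDim.x + gridDim.x*blockDim.x*blockIdx.y
// Each row of blocks (blockIdx.y) contributes gridDim.x * blockDim.x threads.
int absolute_index(int threadIdx_x, int blockIdx_x, int blockIdx_y,
                   int blockDim_x, int gridDim_x) {
    return threadIdx_x + blockIdx_x * blockDim_x
         + gridDim_x * blockDim_x * blockIdx_y;
}
```

With blockDim.x = 32 and gridDim.x = 4, the first block row covers indices 0-127 and the second row starts at 128, so every thread receives a distinct cell index.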
Threads read data from several kinds of memory, both within and across thread blocks: each thread's own local memory and registers, the shared memory within a thread block, and the global, constant and texture memory visible to all blocks in the grid. Since the threads of a thread block can read the data in that block's shared memory, and shared memory reads and writes are very fast, some commonly used constants are stored in each block's shared memory with the __shared__ qualifier; when these constant vectors are accessed repeatedly, much time is saved. Using the different levels of the GPU storage hierarchy in this way optimizes data access performance.
Further, the hybrid MPI+CUDA parallel technique lets the GPUs on multiple compute nodes execute synchronously, expanding the number of GPUs and thereby accelerating the streamline simulation with a better speedup.
Because MPI_Gather requires every process to send a data buffer of the same size, while in practice the result sizes may differ, invalid data are appended so that the data volume of every process equals the common maximum. After the main process has gathered the data, the appended invalid entries are removed, and the result is finally saved or printed.
The speedup obtained by the parallel implementation of this embodiment 1 on the multi-node, multi-GPU servers is shown in Fig. 4. With multiple processes each driving the same number of GPUs in parallel, the acceleration effect is significant, and the speedup eventually stabilizes.
In summary, the invention has the following advantages:
For the large-scale application of particle-tracking streamline simulation, the present invention parallelizes the program, realizes multi-node, multi-GPU parallel computation, and obtains a large speedup.
Although the present invention has been described in detail above with general explanations and specific embodiments, modifications or improvements can be made on its basis, as will be apparent to those skilled in the art. Therefore, such modifications or improvements made without departing from the spirit of the present invention all fall within the scope of the claimed invention.
Claims (11)
1. An acceleration method that realizes streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm, characterized in that the method runs simultaneously on multiple computers or servers, each with several GPUs, and comprises:
Step 1: the host process (process 0) discretizes the regional model into a number of grid cells, initializes the basic model information, the aquifer parameters and the coefficients in the velocity vector formula, completes the grid division, and balances the load among the processes;
Step 2: the basic model information, the aquifer parameters, the coefficients in the velocity vector formula and the sub-model size are broadcast to the other processes;
Step 3: each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory;
Step 4: each GPU launches multiple GPU threads that compute in parallel the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process begins;
Step 5: the particle flow velocity Vp(Vx, Vy, Vz) is computed from the velocity vector formula;
Step 6: if a boundary condition is encountered, such as a stagnation point, the forward or backward tracking ends and step 7 is executed; otherwise, the particle travel time is computed from the flow velocity and the particle's next position is obtained, completing one forward or backward tracking step; the new position becomes the current particle coordinate, and step 5 is executed again;
Step 7: connecting in sequence the coordinate points computed by a single GPU thread yields one complete streamline; once all threads of a GPU have finished, the GPU results are copied from the GPU into the CPU memory of the owning process;
Step 8: after the GPU transfer succeeds, inter-process data transmission is realized with the MPI collective communication mechanism, and the results of all processes are gathered into the host process; finally, after invalid result data is removed, the output is written to a result file.
2. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that in step 1, the length Lx and width Ly of the computational region projected onto the x-y plane, the key aquifer parameters ε_x, ε_y, α, β, and the coefficient matrix b_mn in the velocity vector formula are given; the regional model is discretized into a grid model, which the host process then divides into several sub-models while ensuring load balance.
3. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that in step 2, the basic information of the grid model, the aquifer parameters, the coefficients in the velocity vector formula and the sub-model size are broadcast to the other processes.
4. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that in step 3, each process, distributed over multiple servers, invokes the GPU card with its uniquely assigned number, creates a stream on that GPU, allocates GPU memory, and copies the parameters required for the GPU computation from the CPU into GPU global memory.
5. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that in step 4, each GPU launches multiple GPU threads that compute in parallel the physical coordinates p(x, y, z) of the grid cells g(ix, iy, iz), and the particle tracking process begins.
6. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that step 5 comprises:
The velocity vector formula:
In formulas (1), (2) and (3), ε_x, ε_y, α, β are the key aquifer parameters, b_mn is the coefficient matrix, and Lx, Ly are the length and width of the computational region projected onto the x-y plane; the formulas for x_p, y_p and z_p are as follows:
In formula (4), x, y, z are the physical coordinates of the particle; in formula (5), Z_mn denotes the characteristic function of the hydraulic head distribution, computed as follows:
W_mn denotes the characteristic function of the vertical flow velocity, as shown in formula (6);
in formulas (5) and (6), λ_mn is computed from formula (7).
7. The acceleration method for realizing streamline simulation with a multi-node, multi-GPU parallel particle tracking algorithm according to claim 1, characterized in that step 6 comprises:
The boundary condition is: z > 0 or
If the boundary condition is not satisfied, one step is tracked forward or backward according to formula (8):
In formula (8), DIR takes the value 1 or -1, where 1 denotes one forward tracking step and -1 one backward tracking step, and Δt denotes one time step.
8. according to claim 1 realize adding for streamline simulation based on the more GPU parallel computation particles trace algorithms of multinode
Fast method, which is characterized in that the step 7 includes: the copy function using CUDA, and calculated result is transferred back to CPU from GPU
On.
9. The acceleration method for realizing streamline simulation based on a multi-node multi-GPU parallel particle tracking algorithm according to claim 3, wherein a message-passing parallel architecture, namely the MPI parallel technique, is used to start multiple processes distributed across several computers; each process is responsible for a part of the streamline generation process, realizing process-level parallelism.
10. The acceleration method for realizing streamline simulation based on a multi-node multi-GPU parallel particle tracking algorithm according to claim 3, wherein the most frequently used parameters are loaded into registers, which have the fastest read speed; the largest and least frequently read data are loaded into global memory, which has the largest capacity but a slow read speed; fixed parameters are loaded into constant memory, which is read-only during kernel execution and fast to read; and data that are read frequently from logically nearby locations are loaded into texture memory.
11. The acceleration method for realizing streamline simulation based on a multi-node multi-GPU parallel particle tracking algorithm according to claim 4, wherein each process controls one GPU to realize parallel computation of its task; using the CUDA parallel architecture, the number of thread blocks and the number of threads per block are set, i.e., the total number of parallel threads on the GPU, realizing thread-level parallelism within a process.
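A small helper illustrating the grid/block arithmetic the claim describes: ceiling division so that blocks × threads-per-block covers the requested total thread count, as one would pass to a CUDA kernel launch. The default of 256 threads per block is an assumption for illustration, not a value from the patent:

```python
def launch_config(n_threads_total, threads_per_block=256):
    """Return (blocks, threads_per_block) such that
    blocks * threads_per_block >= n_threads_total, using ceiling
    division -- the usual CUDA execution-configuration arithmetic."""
    blocks = (n_threads_total + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block
```

For one thread per traced particle, n_threads_total would be the number of streamline seeds assigned to the process's GPU; surplus threads in the last block simply exit early in the kernel.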
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811574392.7A CN109857543A (en) | 2018-12-21 | 2018-12-21 | A streamline simulation acceleration method based on multi-node multi-GPU computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811574392.7A CN109857543A (en) | 2018-12-21 | 2018-12-21 | A streamline simulation acceleration method based on multi-node multi-GPU computation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109857543A true CN109857543A (en) | 2019-06-07 |
Family
ID=66891995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811574392.7A Pending CN109857543A (en) | 2018-12-21 | 2018-12-21 | A streamline simulation acceleration method based on multi-node multi-GPU computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109857543A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103425523A (en) * | 2013-06-20 | 2013-12-04 | 国家电网公司 | Parallel computing system and method of PMU (Phasor Measurement Unit) online application system |
CN104036031A (en) * | 2014-06-27 | 2014-09-10 | 北京航空航天大学 | Large-scale CFD parallel computing method based on distributed Mysql cluster storage |
CN104714850A (en) * | 2015-03-02 | 2015-06-17 | 心医国际数字医疗系统(大连)有限公司 | Heterogeneous joint account balance method based on OPENCL |
CN107515987A (en) * | 2017-08-25 | 2017-12-26 | 中国地质大学(北京) | The simulation accelerated method of Groundwater Flow based on more relaxation Lattice Boltzmann models |
CN108427605A (en) * | 2018-02-09 | 2018-08-21 | 中国地质大学(北京) | The accelerated method of streamline simulation is realized based on particles trace algorithm |
Non-Patent Citations (3)
Title |
---|
李丹丹 (Li Dandan): "Research on Parallel Computing of Spatial Data for Groundwater Flow", China Doctoral Dissertations Full-text Database, Basic Sciences * |
李安平 (Li Anping): "Research on CUDA-based Parallel Image Processing", China Master's Theses Full-text Database, Information Science and Technology * |
贾永红 (Jia Yonghong): "Practical Course on Digital Image Processing", 31 January 2007 *
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796701B (en) * | 2019-10-21 | 2022-06-07 | 深圳市瑞立视多媒体科技有限公司 | Identification method, device and equipment of mark points and storage medium |
CN110796701A (en) * | 2019-10-21 | 2020-02-14 | 深圳市瑞立视多媒体科技有限公司 | Identification method, device and equipment of mark points and storage medium |
CN111186139A (en) * | 2019-12-25 | 2020-05-22 | 西北工业大学 | Multi-level parallel slicing method for 3D printing model |
CN111186139B (en) * | 2019-12-25 | 2022-03-15 | 西北工业大学 | Multi-level parallel slicing method for 3D printing model |
CN111552478A (en) * | 2020-04-30 | 2020-08-18 | 上海商汤智能科技有限公司 | Apparatus, method and storage medium for generating CUDA program |
CN111552478B (en) * | 2020-04-30 | 2024-03-22 | 上海商汤智能科技有限公司 | Apparatus, method and storage medium for generating CUDA program |
CN112148437A (en) * | 2020-10-21 | 2020-12-29 | 深圳致星科技有限公司 | Calculation task acceleration processing method, device and equipment for federal learning |
CN112257313A (en) * | 2020-10-21 | 2021-01-22 | 西安理工大学 | Pollutant transport high-resolution numerical simulation method based on GPU acceleration |
CN112257313B (en) * | 2020-10-21 | 2024-05-14 | 西安理工大学 | GPU acceleration-based high-resolution numerical simulation method for pollutant transportation |
CN112148437B (en) * | 2020-10-21 | 2022-04-01 | 深圳致星科技有限公司 | Calculation task acceleration processing method, device and equipment for federal learning |
CN114490011A (en) * | 2020-11-12 | 2022-05-13 | 上海交通大学 | Parallel acceleration implementation method of N-body simulation in heterogeneous architecture |
CN112380793A (en) * | 2020-11-18 | 2021-02-19 | 上海交通大学 | Turbulence combustion numerical simulation parallel acceleration implementation method based on GPU |
CN112380793B (en) * | 2020-11-18 | 2024-02-13 | 上海交通大学 | GPU-based turbulence combustion numerical simulation parallel acceleration implementation method |
CN112947870B (en) * | 2021-01-21 | 2022-12-30 | 西北工业大学 | G-code parallel generation method of 3D printing model |
CN112947870A (en) * | 2021-01-21 | 2021-06-11 | 西北工业大学 | G-code parallel generation method of 3D printing model |
CN113660046B (en) * | 2021-08-17 | 2022-11-11 | 东南大学 | Method for accelerating generation of large-scale wireless channel coefficients |
CN113660046A (en) * | 2021-08-17 | 2021-11-16 | 东南大学 | Method for accelerating generation of large-scale wireless channel coefficients |
CN114970395A (en) * | 2022-06-10 | 2022-08-30 | 青岛大学 | Large-scale fluid simulation method and system based on the two-dimensional Saint-Venant equations |
CN117687779A (en) * | 2023-11-30 | 2024-03-12 | 山东诚泉信息科技有限责任公司 | Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform |
CN117687779B (en) * | 2023-11-30 | 2024-04-26 | 山东诚泉信息科技有限责任公司 | Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform |
CN118502964A (en) * | 2024-07-12 | 2024-08-16 | 安徽大学 | Tokamak new classical circumferential viscous torque CUDA simulation implementation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109857543A (en) | A streamline simulation acceleration method based on multi-node multi-GPU computation | |
CN103970960B (en) | GPU-accelerated parallel element-free Galerkin structural topology optimization method | |
Brodtkorb et al. | Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation | |
CN103765376B (en) | Graphics processing unit with non-blocking parallel architecture | |
CN101727653B (en) | Graphics processing unit based discrete simulation computation method of multicomponent system | |
CN103440163B (en) | Accelerator simulation method based on the PIC model using GPU parallel implementation | |
CN106021828A (en) | Fluid simulation method based on the lattice Boltzmann model | |
US11145099B2 (en) | Computerized rendering of objects having anisotropic elastoplasticity for codimensional frictional contact | |
CN106547627A (en) | Method and system for accelerating Spark MLlib data processing | |
CN103345580B (en) | Parallel CFD method based on the lattice Boltzmann method | |
CN109146067A (en) | A kind of Policy convolutional neural networks accelerator based on FPGA | |
CN104360896A (en) | Parallel fluid simulation acceleration method based on GPU (Graphics Processing Unit) cluster | |
KR100914869B1 (en) | System and Method for Real-Time Cloth Simulation | |
CN104392147A (en) | Region scale soil erosion modeling-oriented terrain factor parallel computing method | |
CN111445003A (en) | Neural network generator | |
US20190318533A1 (en) | Realism of scenes involving water surfaces during rendering | |
CN107016180A (en) | A particle flow simulation method | |
CN107025332A (en) | An SPH-based visualization method for the microscopic water diffusion process on fabric surfaces | |
JPH11502958A (en) | Collision calculation for physical process simulation | |
Wang et al. | FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications | |
CN112100939B (en) | Real-time fluid simulation method and system based on computer loader | |
CN108427605B (en) | Acceleration method for realizing streamline simulation based on particle tracking algorithm | |
CN106373192B (en) | A kind of non-topological coherence three-dimensional grid block tracing algorithm | |
CN109949398A (en) | Particle rendering method, apparatus and electronic device | |
Amador et al. | CUDA-based linear solvers for stable fluids |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-06-07