CN112099936A

CN112099936A - Heterogeneous parallel computing implementation method and device for three-dimensional acoustic wave NPML algorithm

Info

Publication number: CN112099936A
Application number: CN201910519994.0A
Authority: CN
Inventors: 黄兴贵; 皮红梅; 隆波; 高畅; 杨天福; 李平
Original assignee: China National Petroleum Corp; BGP Inc
Current assignee: China National Petroleum Corp; BGP Inc
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2020-12-18
Anticipated expiration: 2039-06-17
Also published as: CN112099936B

Abstract

The invention provides a heterogeneous parallel computing implementation method and device for a three-dimensional acoustic wave NPML algorithm. The method comprises the following steps: determining the available computing devices in a known heterogeneous computing platform and constructing a computing resource topology from them, wherein the available computing devices comprise CPUs, GPUs and accelerators; allocating a corresponding computing task to each available computing device according to the computation scale of a single shot in the three-dimensional acoustic wave NPML algorithm and the memory of each available computing device in the computing resource topology; and performing parallel computation of the three-dimensional acoustic wave NPML algorithm according to the computing resource topology and the computing task allocated to each available computing device, to obtain the seismic simulation record data of a single shot. The scheme shortens the total simulation time of a single shot, improves application timeliness and per-unit-time efficiency, and makes full use of the computing capacity of the heterogeneous computing platform.

Description

Heterogeneous parallel computing implementation method and device for three-dimensional acoustic wave NPML algorithm

Technical Field

The invention relates to the technical field of oil exploration, and in particular to a heterogeneous parallel computing implementation method and device for a three-dimensional acoustic wave NPML algorithm.

Background

The three-dimensional acoustic wave NPML algorithm is an important tool for analyzing observation systems when designing seismic exploration acquisition schemes. Because its computational workload and the volume of its component data are very large, in practical applications the computation time is too long, the intermediate data are too voluminous, and the demand on computing resources is high, all of which seriously hinder practical use.

Existing implementations of the three-dimensional acoustic wave NPML algorithm adopt homogeneous computing, that is, multi-machine CPU or multi-machine GPU (Graphics Processing Unit, the processor of a display card) computing. Although this exploits the computing capability of multiple machines, the computing timeliness of a single machine is low and the power efficiency is worse. The simulation of a single shot has the following problems: 1) with a GPU implementation, the single-shot simulation time is short, but larger production models cannot be simulated; 2) with a CPU implementation, the single-shot simulation time is long and cannot support actual production applications. The computing capabilities of the two cannot be combined, so computing power is scattered and production is poorly served. This creates a major bottleneck for the analysis work in the optimized design of observation systems and seriously reduces work efficiency.

Disclosure of Invention

The embodiment of the invention provides a method and a device for realizing heterogeneous parallel computing of a three-dimensional acoustic wave NPML algorithm, which solve the technical problems of low computing timeliness and poor power efficiency caused by homogeneous computing in the prior art.

The embodiment of the invention provides a heterogeneous parallel computing implementation method of a three-dimensional acoustic wave NPML algorithm, which comprises the following steps:

determining available computing devices in a known heterogeneous computing platform, and constructing a computing resource topology structure according to the available computing devices, wherein the available computing devices in the heterogeneous computing platform comprise CPUs, GPUs and accelerators;

distributing corresponding calculation tasks for each available calculation device according to the calculation scale of a single shot in the three-dimensional acoustic wave NPML algorithm and the memory of each available calculation device in the calculation resource topological structure;

and performing parallel computation of the three-dimensional acoustic wave NPML algorithm according to the computing resource topology and the computing task allocated to each available computing device, to obtain the seismic simulation record data of a single shot.

The embodiment of the invention also provides a device for realizing heterogeneous parallel computing of the three-dimensional acoustic wave NPML algorithm, which comprises:

an available computing device determination module, configured to determine available computing devices in a known heterogeneous computing platform, from which a computing resource topology is constructed, the available computing devices in the heterogeneous computing platform including CPUs, GPUs, and accelerators;

the calculation task allocation module is used for allocating a corresponding calculation task to each available calculation device according to the calculation scale of a single shot in the three-dimensional acoustic wave NPML algorithm and the memory of each available calculation device in the calculation resource topological structure;

and an algorithm parallel computing module, configured to perform parallel computation of the three-dimensional acoustic wave NPML algorithm according to the computing resource topology and the computing task allocated to each available computing device, to obtain the seismic simulation record data of a single shot.

The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method when executing the computer program.

The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the method.

In one embodiment, the available computing devices in a known heterogeneous computing platform are determined and a computing resource topology is constructed from them, the available computing devices comprising CPUs, GPUs and accelerators; a corresponding computing task is allocated to each available computing device according to the computation scale of a single shot in the three-dimensional acoustic wave NPML algorithm and the memory of each available computing device in the computing resource topology; and parallel computation of the three-dimensional acoustic wave NPML algorithm is performed according to the computing resource topology and the computing task allocated to each available computing device, obtaining the seismic simulation record data of a single shot. The invention makes full use of the computing capability and memory capacity of the CPUs and GPUs, accommodates the development of current and future heterogeneous computing platform resources, achieves highly parallel computation across multiple computing devices, maximizes the overall computing efficiency of the heterogeneous computing platform, and thereby improves production efficiency and shortens the total simulation time of a single shot.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a heterogeneous parallel computing implementation method of a three-dimensional acoustic NPML algorithm according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a customized resource topology for a heterogeneous computing platform of a super workstation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a commercial super workstation heterogeneous computing platform resource topology according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another commercial super workstation heterogeneous computing platform resource topology provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of a calculation team for a single shot calculation task provided by an embodiment of the invention;

FIG. 6 is a schematic diagram of a single shot calculation team consisting of 4 computing devices according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a single shot calculation team consisting of 3 computing devices according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a task segmentation scheme for 5 heterogeneous computing devices according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a task segmentation scheme for 3 heterogeneous computing devices according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a prior art computational data model of a three-dimensional acoustic wave NPML algorithm;

FIG. 11 is a schematic diagram of a first-order partial derivative data structure of a three-dimensional acoustic wave NPML algorithm provided by an embodiment of the present invention;

FIG. 12 is a diagram illustrating a data structure of a first-order partial derivative iy slice according to an embodiment of the present invention;

FIG. 13 is a structural block diagram of a heterogeneous parallel computing implementation device of a three-dimensional acoustic wave NPML algorithm according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the embodiment of the present invention, a method for implementing heterogeneous parallel computation of a three-dimensional acoustic NPML algorithm is provided, as shown in fig. 1, the method includes:

step 101: determining available computing devices in a known heterogeneous computing platform, and constructing a computing resource topology structure according to the available computing devices, wherein the available computing devices in the heterogeneous computing platform comprise CPUs, GPUs and accelerators;

step 102: distributing corresponding calculation tasks for each available calculation device according to the calculation scale of a single shot in the three-dimensional acoustic wave NPML algorithm and the memory of each available calculation device in the calculation resource topological structure;

step 103: performing parallel computation of the three-dimensional acoustic wave NPML algorithm according to the computing resource topology and the computing task allocated to each available computing device, to obtain the seismic simulation record data of a single shot.

In the embodiment of the present invention, step 101 is the material basis of the whole computation and a precondition for heterogeneous parallel computing. To exploit the full computing power of a heterogeneous computing platform, all computing resources of the specific platform must first be ascertained; this is the primary prerequisite, without which heterogeneous parallel computing is not possible.

Establishing the computing power and memory relationships of the computing devices determines how much work each device can bear and provides a basis for task allocation.

Example 1: the configuration parameters of a custom-made super workstation of a certain brand are:

CPUs: dual-socket Intel Xeon E5-2699 v4 (22 cores/44 threads)

Main memory: 256 GB

Display card: Quadro M6000/12 GB

Accelerator card 1: Tesla K80/24 GB

Accelerator card 2: Radeon Instinct MI25/16 GB

Dual network cards: 2 × GbE

Hard disks: 4 × 2 TB

From the configuration parameters described above, a computing resource topology as shown in FIG. 2 may be created.
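A computing resource topology of this kind can be held in a small data structure. The following is only an illustrative sketch: the class names `Device` and `Platform`, their fields, and the helper `accelerators()` are assumptions and do not come from the invention. The device list mirrors the Example 1 configuration, and the even split of main memory between the two CPU sockets is likewise assumed purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    name: str        # label used in the computing team
    kind: str        # "cpu", "gpu" (display card) or "accelerator"
    memory_gb: int   # main memory share for CPUs, on-board memory otherwise


@dataclass
class Platform:
    devices: List[Device] = field(default_factory=list)

    def accelerators(self) -> List[Device]:
        """Return every non-CPU device (display cards and accelerator cards)."""
        return [d for d in self.devices if d.kind != "cpu"]


# Example 1 workstation. Assumption: the 256 GB of main memory is split
# evenly between the two CPU sockets.
example1 = Platform([
    Device("CPU0 (Xeon E5-2699 v4)", "cpu", 128),
    Device("CPU1 (Xeon E5-2699 v4)", "cpu", 128),
    Device("Quadro M6000", "gpu", 12),
    Device("Tesla K80", "accelerator", 24),
    Device("Radeon Instinct MI25", "accelerator", 16),
])

for dev in example1.accelerators():
    print(dev.name, dev.memory_gb, "GB")
```

Keeping the device kind and memory size together in one record is convenient because both are needed later, when tasks are allocated by memory capacity.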

Example 2: the configuration parameters of the first super workstation used to verify the present invention are:

CPUs: dual-socket Intel Xeon E5-2697 v4 (18 cores/36 threads)

Main memory: 128 GB

Display card: Quadro M5000/8 GB

Accelerator card 1: Tesla K40c/12 GB

Dual network cards: 2 × GbE

Hard disks: 2 × 2 TB

After heterogeneous computing resource detection, the topology of the available computing resources is as shown in FIG. 3.

Example 3: the configuration parameters of the second super workstation used to verify the present invention are:

CPUs: dual-socket Intel Xeon E5-2697 v2 (12 cores/24 threads)

Main memory: 128 GB

Display card: Quadro K5000/4 GB

Accelerator card 1: Tesla K40c/12 GB

Dual network cards: 2 × GbE

Hard disks: 2 × 2 TB

After heterogeneous computing resource detection, the topology of the available computing resources is as shown in FIG. 4.

Specifically, the main function of this step is to create, from the topology of the heterogeneous computing platform, a computing team of available computing devices, providing the basis for subsequent computing task allocation. Because the invention is a heterogeneous parallel computing implementation method for the three-dimensional acoustic wave NPML algorithm, whether a detected computing device is available must be judged against the requirements of the algorithm.

In the invention, the three-dimensional acoustic wave NPML algorithm has many intermediate components. For efficiency, computing devices whose computing capacity is less than 5% of the computation scale of a single shot are excluded, to improve the working efficiency of the computing team.
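The 5% exclusion rule can be sketched as a simple filter. The text does not say how computing capacity is measured, so the sketch assumes it is the number of Y-direction slices a device can hold, compared against the ny slices of the shot. The function name `select_available` and the device `tiny_gpu` are hypothetical.

```python
def select_available(devices, ny, threshold=0.05):
    """Keep devices whose slice capacity reaches `threshold` of the shot's ny slices.

    `devices` is a list of (name, capacity_in_slices) pairs. Devices below the
    threshold are excluded from the computing team.
    """
    return [name for name, capacity in devices if capacity >= threshold * ny]


# Capacities in slices; "tiny_gpu" is a made-up device that falls below 5%.
candidates = [("GPU0", 273), ("GPU1", 476), ("tiny_gpu", 30)]
team = select_available(candidates, ny=861)
print(team)  # ['GPU0', 'GPU1']
```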

First, basic parameters of the computing task, such as the velocity model and the observation system, are loaded, and the maximum computation scale of a single shot is determined, providing a basis for creating the computing team, as shown in FIG. 5.

For one particular single-shot computation scale, 879 (rows) × 861 (columns) × 580 (depth), a different computing team is created on each of the two heterogeneous computing platforms used to verify the present invention.

For example, on the heterogeneous computing platform of Example 2, the available computing team consists of 4 computing devices, as shown in FIG. 6. On the heterogeneous computing platform of Example 3, the available computing team consists of 3 computing devices, as shown in FIG. 7.

In the embodiment of the invention, step 102 formulates the scheme for dividing the single-shot computation scale, so as to satisfy the algorithm's demands on both computation and memory during wavefield calculation and to balance the efficiency of heterogeneous parallel computing.

Step 102 is a key feature of heterogeneous parallel computing. The three-dimensional acoustic wave NPML algorithm solves a wave equation that is both time-consuming and memory-consuming, and the available computing devices of the heterogeneous computing platform must jointly complete the seismic simulation of each shot. Because the devices differ in capability, computing tasks must be allocated accurately so that the single-shot computing team works efficiently and the simulation of each shot is completed in the shortest time.

The task allocation rule of this step is: the memory space of a computing device is taken as its weight, and the larger the weight, the more computing tasks it undertakes. For example: 1) acceleration devices such as GPUs have high computing performance but less memory than the CPUs, so they are suited to tasks that are computation-intensive and need little storage; 2) CPU devices have lower computing performance than GPUs but a very large memory (i.e., main memory), so they are given tasks with a somewhat smaller computation load and larger storage needs. This allocation plays to the strengths of the different devices, improves the load balance of the three-dimensional acoustic wave NPML algorithm, and makes full use of the parallel performance of the heterogeneous computing platform.

For the computing resource topology of the heterogeneous computing platform in Example 1, a schematic diagram of the computing tasks into which the model is partitioned and allocated is shown in FIG. 8. For the computing resource topology of the verification platform in Example 2, a schematic diagram of the computing tasks into which the same model is partitioned and allocated is shown in FIG. 9. Here, map indicates device memory mapped to host memory, host indicates host memory, and inside indicates on-device memory.

Specifically, step 103 is the ultimate goal of heterogeneous parallel computing of the three-dimensional acoustic wave NPML algorithm: computing the seismic simulation record data of a single shot. The difference equation of the three-dimensional acoustic wave NPML algorithm is as follows:

where $c$ represents velocity, in m/s; $p(x, y, z)$ represents stress, in pascals (1 pascal = 1 newton/m²); $V$ represents displacement, in meters; $t$ represents time, in seconds; $d$ represents the grid cell size, in meters, with $d_x$, $d_y$, $d_z$ denoting the cell size along the x, y and z coordinate axes respectively;

$\partial$ denotes the derivative sign;

$\partial^2$ denotes the second-order derivative;

$\partial^2 p / \partial t^2$ denotes the second-order partial derivative of stress with respect to time;

$\partial^2 p / \partial x^2$ denotes the second-order partial derivative of stress in the x direction;

$\partial^2 p / \partial y^2$ denotes the second-order partial derivative of stress in the y direction;

$\partial^2 p / \partial z^2$ denotes the second-order partial derivative of stress in the z direction;

$\partial V / \partial x$ denotes the first-order partial derivative of displacement in the x direction;

$\partial V / \partial y$ denotes the first-order partial derivative of displacement in the y direction;

$\partial V / \partial z$ denotes the first-order partial derivative of displacement in the z direction.

The data model used by this equation is shown in FIG. 10. Its workflow is: input the velocity model; solve the first-order and second-order partial derivative data models; synthesize the wavefield data model; and then collect the seismic simulation records as the output data of the calculation.

In step 103, the present invention decomposes the above workflow: solving the first-order partial derivatives and solving the second-order partial derivatives are separated into two independent stages, and wavefield synthesis is fused into each. The calculation flow therefore becomes: 1) solve the second-order partial derivatives and synthesize the wavefield; 2) solve the first-order partial derivatives and synthesize the wavefield; 3) collect the seismic simulation records. This reduces the memory required during differentiation and improves memory-bandwidth efficiency, thereby improving the computational efficiency of the implementation.

The first-order partial derivatives of the three-dimensional acoustic wave NPML algorithm are zero throughout the core region, as in the data model shown in FIG. 11. When solving the first-order partial derivatives, the memory they would otherwise occupy in the core region can therefore be saved and used to store non-zero data such as second-order data and wavefields, improving device utilization.

This design of heterogeneous parallel computing is applied to the task allocation strategy of step 102. For example, if the region assigned to a device contains more of the core region, more memory is freed by the first-order partial derivatives, more second-order and wavefield data can be stored, and the device can therefore undertake more computing tasks.

The specific task allocation calculation method is as follows:

First, the task allocation rule is defined. Three dimensions are an extension of two dimensions, so compatibility with the two-dimensional case is taken into account: tasks are divided in units of slices, where a slice is formed by the X and Z axes and slices are cut along the Y axis.

The computation scale of each shot is counted in grid points, so the size of one slice is the number of grid points in the X direction multiplied by the number in the Z direction, and the number of grid points in the Y direction is the number of slices that can be cut.

For example, if the computation scale of each shot is 879 (rows) × 861 (columns) × 580 (depth), then nx = 879 (grids), ny = 861 (grids), nz = 580 (grids).

Slice size: nslice = nx × nz = 879 × 580 = 509820 (grids);

Number of slices that can be cut: ntasks = ny = 861 (slices).

According to the definition of the three-dimensional acoustic wave NPML algorithm, the slice size above refers to the velocity model. The wavefield slice is larger than the velocity-model slice and is calculated as follows:

Wavefield slice size: wfield_nslice = (nx + 2 × nOrder) × (nz + 2 × nOrder), where nOrder is the spatial difference order: 1, 2, 3, 4 or 5. If nOrder is 5, then

wfield_nslice = (879 + 10) × (580 + 10) = 524510 (grids).
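The slice arithmetic above can be checked with a few lines of Python. This is only a worked restatement of the formulas in the text; the variable names follow the text, with `n_order` standing in for nOrder.

```python
nx, ny, nz = 879, 861, 580   # single-shot computation scale from the text
n_order = 5                  # spatial difference order used in the example

nslice = nx * nz                                           # velocity-model slice (X-Z plane)
ntasks = ny                                                # number of slices cut along Y
wfield_nslice = (nx + 2 * n_order) * (nz + 2 * n_order)    # wavefield slice with padding

print(nslice, ntasks, wfield_nslice)  # 509820 861 524510
```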

According to the definition of the three-dimensional acoustic wave NPML algorithm, wavefield calculation requires 7 buffers: the synthesized wavefield, the second-order X, Y and Z partial derivatives, and the first-order X, Y and Z partial derivatives. Because the wavefield propagates in time, the algorithm operates on the wavefields at 2 time instants, so a total of 14 wavefield buffers is required.

As described for the wavefield parallel computing process of step 103, the buffers of the first-order partial derivatives can use a simplified or compressed layout that reduces their memory footprint by about 65%. The data structure of the first-order partial derivatives is shown in FIG. 11. From this structure, slice sizes differ by position: if iy lies in the interval [0, nEdges) or [ny - nEdges, ny), the slice size equals that of the second-order partial derivative; if iy lies in the interval [nEdges, ny - nEdges), the structure is as shown in FIG. 12 and the slice size is:

where nEdges represents the number of edge-extension grid points of the velocity model; the algorithm's default value is 40 (grids).

From the above slice-size calculation, a first-order partial derivative slice in the interval [nEdges, ny - nEdges) needs very little memory, only 27% of that of a second-order partial derivative slice. That is, the first-order partial derivatives in this region are special: they are all zero in the core part, so their memory at those positions can be saved. Based on this characteristic, the computing tasks for the parts where the first-order partial derivatives of the wavefield are all zero in the core are allocated to computing devices with smaller memories, such as GPUs, so that these devices carry a sufficiently saturated workload and reach optimal computing efficiency. The remaining parts, where the first-order partial derivatives are non-zero, are allocated as a whole to the CPUs. Here, "different parts" means that the wavefield data volume is divided into several blocks, each with a position relative to the whole volume that is described by a local orientation, for example front, back, left or right.
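The position-dependent classification of first-order partial derivative slices can be illustrated as follows. The sketch only distinguishes full-size slices from reduced ones using the two intervals given in the text; it does not compute the reduced slice size. The function name `first_order_slice_is_full` is an assumption.

```python
def first_order_slice_is_full(iy, ny, n_edges=40):
    """True if slice iy lies in an edge interval and needs a full-size buffer.

    Edge intervals are [0, n_edges) and [ny - n_edges, ny); every other slice
    lies in the core interval [n_edges, ny - n_edges) and can be stored compactly.
    """
    return iy < n_edges or iy >= ny - n_edges


ny = 861
full = sum(first_order_slice_is_full(iy, ny) for iy in range(ny))
reduced = ny - full
print(full, reduced)  # 80 781
```

With the default nEdges of 40, only 80 of the 861 slices need full-size first-order buffers, which is why the core slices are the natural candidates for memory-limited accelerators.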

According to this allocation strategy, when the queue of the computing team is created, all acceleration devices such as GPUs are placed in the middle of the queue, and the head and tail of the queue are occupied by CPU computing devices, for example in a 3-device queue or a 4-device queue.
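The queue ordering can be sketched as below. The function `build_queue` is hypothetical, and the rule for more than two CPU devices (half at the head, half at the tail) is an assumption, since the text only describes queues with a CPU at each end.

```python
def build_queue(cpus, accelerators):
    """Order a computing team: CPUs at head and tail, accelerators in the middle."""
    if len(cpus) < 2:
        raise ValueError("need at least two CPU devices to occupy head and tail")
    half = len(cpus) // 2  # assumption: extra CPUs are split between the two ends
    return cpus[:half] + accelerators + cpus[half:]


print(build_queue(["CPU0", "CPU1"], ["GPU0", "GPU1"]))  # 4-device queue
print(build_queue(["CPU0", "CPU1"], ["GPU1"]))          # 3-device queue
```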

Next, the workload of each task is calculated according to the amount of memory that the corresponding device can provide for computation. Tasks are allocated first to acceleration devices such as GPUs, and the remainder is allocated to the CPU devices.

For example, in the 4-device computing team, GPU0 is a display card. Part of its video memory must be reserved for display, otherwise the screen will freeze; to preserve the user experience, only 70-80% of a display card's video memory is used for computation. This display card has 8 GB of video memory, of which 80% is used for computation, so the computing buffer is approximately 6.4 GB.

According to the buffer calculation rule, the buffers required by each computed slice are: 1 model slice + 1 wavefield slice + 3 second-order partial derivative slices + 3 first-order partial derivative slices.

The number of buffer slices that GPU0 can store is:

storage_buffers_nslices = 6.4 GB / 24 MB ≈ 273 (slices);

The computing task it can undertake is:

task_ny_grids = storage_buffers_nslices - 2 × nOrder ≈ 260 (grids).

Similarly, GPU1 is a pure accelerator card with 12 GB of total memory, of which up to 11439 MB is dedicated memory, so the number of buffer slices it can store is:

storage_buffers_nslices = 11439 MB / 24 MB ≈ 476 (slices);

The computing task it can undertake is:

task_ny_grids = storage_buffers_nslices - 2 × nOrder ≈ 460 (grids).

Then the remaining tasks go to the CPUs: domain_task = 861 - 260 - 460 = 141 (grids), divided between the two CPUs as:

cpu0_task_ny_grids = 70 (grids);

cpu1_task_ny_grids = 71 (grids).
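The allocation arithmetic for the 4-device team can be reproduced with the sketch below. The 24 MB per slice set, the 80% usable share of the display card, and the 11439 MB of dedicated accelerator memory come from the text. Rounding each accelerator task down to a multiple of 10 is an assumption made only to match the approximate values 260 and 460 quoted in the text; the function name `accelerator_task` is hypothetical.

```python
SLICE_SET_MB = 24   # buffer space per computed slice, as quoted in the text
N_ORDER = 5         # spatial difference order
NY = 861            # number of Y-direction slices in the shot


def accelerator_task(usable_mb, round_to=10):
    """Return (storable slices, task size in Y grids) for one accelerator.

    Assumption: the task is rounded down to a multiple of `round_to` to
    reproduce the approximate figures in the text.
    """
    storage_slices = usable_mb // SLICE_SET_MB
    task = storage_slices - 2 * N_ORDER        # leave room for halo slices
    return storage_slices, task - task % round_to


gpu0_usable_mb = 8 * 1024 * 80 // 100          # 80% of the 8 GB display card
gpu0_slices, gpu0_task = accelerator_task(gpu0_usable_mb)
gpu1_slices, gpu1_task = accelerator_task(11439)

remaining = NY - gpu0_task - gpu1_task         # slices left for the CPUs
cpu0_task = remaining // 2
cpu1_task = remaining - cpu0_task

print(gpu0_slices, gpu0_task)   # 273 260
print(gpu1_slices, gpu1_task)   # 476 460
print(cpu0_task, cpu1_task)     # 70 71
```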

Finally, the feasibility of the task allocation is checked, namely whether the computing tasks borne by the acceleration devices and the CPUs conflict with the computing strategy and the algorithm. According to the definition of the algorithm, the core region of the forward modeling is the inner region at the center of each shot's computation scale, that is, the region lying at least nEdges grid points inside the boundary along each axis:

inner_nx = [nEdges, nx - nEdges) = [40, 839) or [40, 838];

inner_ny = [nEdges, ny - nEdges) = [40, 821) or [40, 820];

inner_nz = [nEdges, nz - nEdges) = [40, 540) or [40, 539];
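The three inner intervals follow from one formula, which can be expressed with Python's half-open `range`. The helper name `inner_range` is an assumption.

```python
def inner_range(n, n_edges=40):
    """Half-open interval [n_edges, n - n_edges) of core grid indices on one axis."""
    return range(n_edges, n - n_edges)


inner_nx, inner_ny, inner_nz = (inner_range(n) for n in (879, 861, 580))

# Half-open form [start, stop) and the equivalent closed upper bound.
for name, r in (("nx", inner_nx), ("ny", inner_ny), ("nz", inner_nz)):
    print(name, r.start, r.stop, r[-1])
```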

Under this definition, the accelerator devices must compute only the inner region, as the task allocation scheme above requires. The feasibility of the allocation scheme is therefore determined from the tasks assigned to the CPU devices.

If the computing task of CPU0 or CPU1 is smaller than the number of edge-extension grid points (nEdges) of the velocity model, the allocation of computing tasks to the available computing devices is judged infeasible and the allocation scheme must be adjusted; otherwise it is feasible. Pseudocode:

If cpu0_task_ny_grids < nEdges or cpu1_task_ny_grids < nEdges Then

cpu0_task_ny_grids = nEdges;

cpu1_task_ny_grids = nEdges;

// number of Y-direction slices to be reallocated: ny - 2 × nEdges

End
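A runnable version of the feasibility check might look like the following. It is a sketch under the assumption that, after adjustment, every slice not held by the two CPUs is redistributed among the acceleration devices; the function name `adjust_cpu_tasks` is hypothetical.

```python
def adjust_cpu_tasks(cpu0_task, cpu1_task, ny, n_edges=40):
    """Apply the edge rule and return (cpu0, cpu1, slices left for accelerators).

    If either CPU task is smaller than n_edges, both are raised to n_edges so
    that the accelerators only ever compute the inner region.
    """
    if cpu0_task < n_edges or cpu1_task < n_edges:
        cpu0_task = cpu1_task = n_edges
    return cpu0_task, cpu1_task, ny - cpu0_task - cpu1_task


print(adjust_cpu_tasks(70, 71, ny=861))  # feasible: (70, 71, 720)
print(adjust_cpu_tasks(30, 31, ny=861))  # adjusted: (40, 40, 781)
```

The first call corresponds to the allocation in the text (70 and 71 grids), which already satisfies the rule; the second uses made-up values to show the adjustment path.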

The allocation strategy of this step plays to the strengths of the different devices, improves the load balance of the three-dimensional acoustic wave NPML algorithm, and makes full use of the parallel performance of the heterogeneous computing platform.

In the embodiment of the invention, parallelization means organizing different devices (such as CPUs and GPUs) to participate simultaneously in the simulation of a single shot. Because CPUs and GPUs have different architectures and instruction sets, their joint participation in the computation is called heterogeneous parallel computing.

Step 103 of the invention is the ultimate goal of heterogeneous parallel computing of the three-dimensional acoustic wave NPML algorithm. It relies on the preceding steps; the steps are interdependent and together bring out the computing efficiency of the heterogeneous computing platform for the three-dimensional acoustic wave NPML algorithm.

The heterogeneous parallel computing structure is a two-level parallel structure: the first level is fine-grained parallel computing inside each device; the second level is block-parallel computing between devices.

First, according to the tasks allocated in step 102, each computing device is assigned a sub-block, defined by its starting position and task amount, and a fine-grained parallel computing scheme is constructed for it.

Second, the computing devices are ordered by starting position, and the adjacency relationships of the second-level parallel sub-blocks are established. After each iteration, adjacent sub-blocks must exchange part of their data to keep the data of the whole model synchronized.

Third, the time iteration loop is started. Within the loop, the following is performed for each computing device:

issue the wavefield calculation command for the current time step;

issue the snapshot command;

issue the record sampling command;

issue the sub-block data exchange command;

issue the wavefield buffer rotation command.

Finally, the local record data acquired by each computing device are collected and synthesized into a complete seismic record.

This completes the heterogeneous parallel computing task of a single shot.
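The structure of this loop (compute, sample, exchange, rotate) can be illustrated with a deliberately small model. The sketch below is not the NPML algorithm: it assumes a one-dimensional second-order wave update with a one-cell halo as a stand-in, runs the sub-blocks sequentially instead of on separate devices, and omits the snapshot step. All function names are hypothetical.

```python
def make_blocks(field, sizes):
    """Split a 1-D field into sub-blocks, each with one halo cell per side."""
    blocks, start = [], 0
    for size in sizes:
        blocks.append([0.0] + field[start:start + size] + [0.0])
        start += size
    return blocks


def exchange_halos(blocks):
    """Copy boundary cells between adjacent sub-blocks (outer halos stay 0)."""
    for left, right in zip(blocks, blocks[1:]):
        left[-1] = right[1]   # first owned cell of the right neighbour
        right[0] = left[-2]   # last owned cell of the left neighbour


def advance(prev, curr, r2):
    """Leapfrog update of the owned cells of one sub-block."""
    nxt = [0.0] * len(curr)
    for i in range(1, len(curr) - 1):
        lap = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]
        nxt[i] = 2.0 * curr[i] - prev[i] + r2 * lap
    return nxt


def simulate(initial, sizes, nsteps, r2=0.25):
    """Run the time loop; return the final field and one local trace per block."""
    prev = make_blocks(initial, sizes)
    curr = make_blocks(initial, sizes)
    exchange_halos(curr)
    traces = [[] for _ in sizes]
    for _ in range(nsteps):
        nxt = [advance(p, c, r2) for p, c in zip(prev, curr)]  # wavefield calculation
        for trace, block in zip(traces, nxt):
            trace.append(block[1])     # record sampling at the first owned cell
        exchange_halos(nxt)            # sub-block data exchange
        prev, curr = curr, nxt         # wavefield buffer rotation
    field = [v for block in curr for v in block[1:-1]]
    return field, traces


initial = [0.0] * 12
initial[6] = 1.0
whole, _ = simulate(initial, [12], nsteps=20)
split, traces = simulate(initial, [3, 5, 4], nsteps=20)
print(whole == split)  # the decomposed run reproduces the single-block run
```

Because the halo cells are refreshed after every step, each sub-block sees the same neighbour values it would see in an undivided domain, which is the property the sub-block data exchange command is meant to guarantee.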

Based on the same inventive concept, the embodiment of the present invention further provides a heterogeneous parallel computation implementation apparatus for a three-dimensional acoustic wave NPML algorithm, as described in the following embodiments. The problem solving principle of the heterogeneous parallel computing implementation device of the three-dimensional acoustic wave NPML algorithm is similar to that of the heterogeneous parallel computing implementation method of the three-dimensional acoustic wave NPML algorithm, so that the implementation of the heterogeneous parallel computing implementation device of the three-dimensional acoustic wave NPML algorithm can be referred to that of the heterogeneous parallel computing implementation method of the three-dimensional acoustic wave NPML algorithm, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.

Fig. 13 is a block diagram of a heterogeneous parallel computing implementation apparatus of a three-dimensional acoustic NPML algorithm according to an embodiment of the present invention, as shown in fig. 13, including:

an available computing device determining module 1301, configured to determine available computing devices in a known heterogeneous computing platform, from which a computing resource topology is constructed, the available computing devices in the heterogeneous computing platform including CPUs, GPUs, and accelerators;

the calculation task allocation module 1302 is configured to allocate a corresponding calculation task to each available calculation device according to the calculation scale of a single shot in the three-dimensional acoustic NPML algorithm and the memory of each available calculation device in the calculation resource topology;

and the algorithm parallel computing module 1303 is configured to implement parallel computing of the three-dimensional acoustic wave NPML algorithm according to the computing resource topology and the computing task allocated to each available computing device, and to obtain the seismic simulation record data of a single shot.

In this embodiment of the present invention, the available computing device determining module 1301 is specifically configured to:

determining computing capabilities of computing devices in known heterogeneous computing platforms;

comparing the computing capability of each computing device with 5% of the computation scale of a single shot, and determining the computing devices whose computing capability is greater than 5% of the computation scale of a single shot to be available computing devices.
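The 5% screening rule can be illustrated with a short sketch. The source does not say in what unit computing capability is measured, so the sketch assumes it is expressed as a number of grid points a device can handle; the function name `select_available` and the device list are hypothetical.

```python
# Hypothetical sketch of the 5% screening rule; capability is assumed to be
# expressed in grid points, which the source does not state.

def select_available(devices, single_shot_scale, fraction=0.05):
    """Keep devices whose capability is strictly greater than fraction * scale."""
    threshold = fraction * single_shot_scale
    return [name for name, capability in devices if capability > threshold]


# Single-shot computation scale taken from Example 1: 879 x 861 x 580 grid points.
scale = 879 * 861 * 580
# Illustrative capabilities only; these numbers are not from the source.
devices = [("cpu0", 300_000_000), ("gpu0", 20_000_000), ("acc0", 150_000_000)]
available = select_available(devices, scale)
```

With these illustrative numbers, the threshold is 5% of 438,955,020 grid points, so `gpu0` falls below it and is excluded.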

In this embodiment of the present invention, the calculation task allocation module 1302 is specifically configured to:

allocating the computation of the core region, where the first-order partial derivatives used in the wave field calculation are all zero, to the available GPUs or accelerators;

allocating the remaining region, where the first-order partial derivatives are non-zero, to the available CPUs.
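A minimal sketch of this split follows. It assumes that the non-zero region is an absorbing layer of uniform width `pad` on every face of the padded grid, which the source does not state; the function name `split_core_and_boundary` and the dictionary keys are hypothetical. A real allocation would also have to respect the memory of each device.

```python
# Hypothetical sketch: count grid points in the zero-derivative core and in the
# surrounding layer, assuming a uniform layer of width `pad` on all six faces.

def split_core_and_boundary(nx, ny, nz, pad):
    """nx, ny, nz are padded grid sizes; returns grid-point counts per device class."""
    total = nx * ny * nz
    core = (nx - 2 * pad) * (ny - 2 * pad) * (nz - 2 * pad)
    return {"gpu_or_accelerator": core, "cpu": total - core}


# Small illustrative grid: 10 x 10 x 10 points with a 2-point layer.
parts = split_core_and_boundary(10, 10, 10, pad=2)
```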

In this embodiment of the present invention, the calculation task allocation module 1302 is further configured to:

determining the feasibility of the computing task allocated to each available computing device.

Determining the feasibility of the computing task allocated to each available computing device comprises:

if the computing task allocated to the available CPUs is smaller than the number of padded boundary grid points of the velocity model, determining that the allocation of computing tasks to the available computing devices is infeasible; and if the computing task allocated to the available CPUs is larger than the number of padded boundary grid points of the velocity model, determining that the allocation of computing tasks to the available computing devices is feasible.
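The feasibility rule reduces to a single comparison, sketched below. The sketch assumes that both quantities are counted in grid points; the function name `cpu_allocation_feasible` is hypothetical. The text covers only the "smaller than" and "larger than" cases, so the sketch refuses to decide the equality case rather than guess.

```python
# Hypothetical sketch of the feasibility rule; both arguments are assumed to be
# grid-point counts.

def cpu_allocation_feasible(cpu_task_points, padded_boundary_points):
    """Return True if feasible, False if infeasible, per the rule in the text."""
    if cpu_task_points > padded_boundary_points:
        return True
    if cpu_task_points < padded_boundary_points:
        return False
    # The source does not specify what happens when the two counts are equal.
    raise ValueError("equality case is not specified in the source")
```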

In this embodiment of the present invention, the algorithm parallel computing module 1303 is specifically configured to:

splitting the first-order partial derivative solution and the second-order partial derivative solution of the three-dimensional acoustic wave NPML algorithm into two independent stages, with wave field synthesis fused into each stage;

according to the computing resource topological structure, distributing corresponding computing tasks to each available computing device, and sequentially executing the following processes:

solving the second-order partial derivative and synthesizing a wave field;

solving the first-order partial derivative and synthesizing a wave field;

seismic simulation record data of a single shot are obtained.

In conclusion, the implementation method of the invention improves computational efficiency per unit time and the efficiency of a single machine, shortens the turnaround of analysis and design work, and saves time.

Compared with current implementations, for models that both sides can simulate, single-shot simulation time efficiency is improved by 10-50% relative to the GPU implementation and by more than 200% relative to the CPU implementation. For actual production-scale models, the current GPU implementation cannot run the simulation, so no time efficiency comparison is possible; relative to the current CPU implementation, time efficiency is improved by more than 300%.

The following are examples showing the effects of the invention.

Example 1

1) Computing platform: HP-Z820 workstation

CPUs: two sockets, each an Intel Xeon E5-2697v2 (12 cores, 24 threads) @ 2.70GHz

Graphics card: NVIDIA, Quadro-K5000/4GB

Accelerator card: NVIDIA, Tesla-K40c/12GB

Main memory: 128GB

Network card: 2 × Intel 82574L Gigabit Network

Hard disk: 2TB/10000rpm

Operating system: Win7, 64-bit

2) Three-dimensional acoustic wave simulation parameters

Computational model:

Geological model: nx = 2400, ny = 2000, nz = 500

Grid cell: dx = 5.0m, dy = 5.0m, dz = 5.0m

Single-shot computation scale: 879 (rows) × 861 (columns) × 580 (depth)

Maximum model velocity: 4000m/s

Simulation parameters:

Wavelet dominant frequency: 25Hz, Ricker wavelet

Spatial difference: 10th order

Recording length: 2 seconds

Sampling interval: 0.5 milliseconds

Receiver spread: 40 lines × 400 traces, 16000 traces in total

3) Calculation results

Memory required to compute a single shot: about 31GB

Single-shot record: about 265MB

Single-shot time with the invention: about 2 hours and 46 minutes.

Single-shot time with the existing CPU version: about 8 hours and 37 minutes.

Single-shot time with the existing GPU version: the calculation cannot be carried out, because a model of this scale does not fit in the memory of the Tesla card.

The efficiency is improved by a factor of about 3: 8.61/2.77 ≈ 3.1.
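The size figures in this example can be put in context with a short calculation. The sketch assumes 4-byte single-precision values, which the source does not state, and the function names are hypothetical. It does not try to reproduce the reported totals exactly, because the number of arrays held in memory and the record file layout are not given.

```python
# Hypothetical sizing sketch; 4-byte single-precision storage is an assumption.

def grid_points(nx, ny, nz):
    """Number of grid points in the single-shot computation volume."""
    return nx * ny * nz


def array_bytes(points, bytes_per_value=4):
    """Bytes needed for one full-volume array."""
    return points * bytes_per_value


def record_bytes(traces, record_s, dt_s, bytes_per_sample=4):
    """Raw sample bytes of one shot record, without any headers."""
    samples = round(record_s / dt_s)
    return traces * samples * bytes_per_sample


points = grid_points(879, 861, 580)          # 438,955,020 grid points
one_array = array_bytes(points)              # about 1.76 GB per full-volume array
raw_record = record_bytes(16000, 2.0, 0.0005)  # 256,000,000 bytes of raw samples
```

Under these assumptions a single full-volume array already takes about 1.76 GB, so a working set of about 31GB exceeds the 12GB of the accelerator card, in line with the statement that the GPU-only version cannot run. The raw sample count gives about 256 MB, the same order as the reported record size; the source does not describe the record format, so the remaining difference is left unexplained here.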

Example 2

1) Hardware configuration for heterogeneous computing platforms

See the configuration parameters of example 3 in step 1 (resource detection) of the summary of the invention.

2) Three-dimensional acoustic wave simulation parameters

Computational model of the single-shot spread: nxGrids = 880, nyGrids = 860, nzGrids = 580;

grid cell size: dx = 5m, dy = 5m, dz = 5m;

velocity model parameters: the minimum velocity is 2000m/s, and the maximum velocity is 4000m/s;

forward modeling parameters:

the wavelet type is a Ricker wavelet, and the wavelet dominant frequency is 25Hz;

the recording length is 2s, and the sampling interval is 0.5ms;

observation system: 40 lines of 400 receiving points each, for a total of 16000 traces.

3) Computation time efficiency

Average time over 3 shots with the invention: 2 hours and 46 minutes;

average time over 3 shots with the existing CPU version: 8 hours and 37 minutes;

average time over 3 shots with the existing GPU version: infinity (because GPU memory cannot store such a large model);

time efficiency ratio: 8.61/2.76 ≈ 3.1 (times); time saved: 5.85/8.61 ≈ 68%.
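The two ratios above follow directly from the reported times, as the short calculation below shows. The function names are hypothetical; the inputs are the times reported in this example.

```python
# Arithmetic check of the reported ratios; function names are illustrative.

def hours(h, m):
    """Convert hours and minutes to decimal hours."""
    return h + m / 60.0


def speedup(baseline_h, new_h):
    """How many times faster the new run is than the baseline."""
    return baseline_h / new_h


def time_saved_fraction(baseline_h, new_h):
    """Fraction of the baseline time that is saved."""
    return (baseline_h - new_h) / baseline_h


cpu_h = hours(8, 37)        # existing CPU version
new_h = hours(2, 46)        # the invention
ratio = speedup(cpu_h, new_h)
saved = time_saved_fraction(cpu_h, new_h)
```

Rounded, `ratio` is 3.1 and `saved` is 68%, matching the figures in the text.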

In summary, the heterogeneous parallel computation implementation method and device of the three-dimensional acoustic wave NPML algorithm provided by the invention have the following advantages:

1) In view of the characteristics of the three-dimensional acoustic wave NPML algorithm, the simulation is decomposed into two calculation steps, one for the first-order partial derivatives and one for the second-order partial derivatives. This reduces the memory required during differentiation, improves memory access bandwidth efficiency, and thereby improves the computational efficiency of the implementation.

2) Each type of computing device receives processing tailored to its characteristics, so that the computing power and memory capacity of the CPUs and GPUs are fully used. The presence of multiple GPUs is taken into account, and each device plays to its strengths. The approach suits the development of current and future heterogeneous computing platform resources, achieves highly parallel computation across multiple computing devices, maximizes the overall computing efficiency of the heterogeneous computing platform, and thus improves production efficiency.

3) The total simulation time of a single shot is shortened, turnaround is improved, efficiency per unit time is increased, and the computing capacity of the heterogeneous computing platform is used to the fullest.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A heterogeneous parallel computing implementation method of a three-dimensional acoustic wave NPML algorithm is characterized by comprising the following steps:

2. The heterogeneous parallel computing implementation method of the three-dimensional acoustic wave NPML algorithm according to claim 1, wherein available computing devices in a known heterogeneous computing platform are determined as follows:

3. The method for realizing heterogeneous parallel computation of the three-dimensional acoustic NPML algorithm according to claim 1, wherein the step of allocating a corresponding computation task to each available computing device according to the computation scale of a single shot in the three-dimensional acoustic NPML algorithm and the memory of each available computing device in the computation resource topology comprises the following steps:

4. The method for realizing heterogeneous parallel computation of the three-dimensional acoustic wave NPML algorithm as recited in claim 1, further comprising:

5. The method for implementing heterogeneous parallel computation of the three-dimensional acoustic wave NPML algorithm according to claim 4, wherein determining the feasibility of each available computing device for distributing the corresponding computing task comprises:

6. The method for realizing heterogeneous parallel computation of the three-dimensional acoustic wave NPML algorithm according to claim 1, wherein the parallel computation of the three-dimensional acoustic wave NPML algorithm is realized by allocating corresponding computation tasks to each available computing device according to the computation resource topology structure, and seismic simulation record data of a single shot is obtained, and the method comprises the following steps:

solving the second-order partial derivative and synthesizing a wave field;

solving the first-order partial derivative and synthesizing a wave field;

seismic simulation record data of a single shot are obtained.

7. A heterogeneous parallel computing implementation device of a three-dimensional acoustic wave NPML algorithm is characterized by comprising the following steps:

8. The apparatus for implementing heterogeneous parallel computation of a three-dimensional acoustic NPML algorithm of claim 7, wherein the available computing device determining module is specifically configured to:

9. The heterogeneous parallel computation implementation device of the three-dimensional acoustic wave NPML algorithm of claim 7, wherein the computation task allocation module is specifically configured to:

10. The heterogeneous parallel computation implementation device of the three-dimensional acoustic wave NPML algorithm of claim 7, wherein the computation task allocation module is further configured to:

11. The heterogeneous parallel computation implementation device of the three-dimensional acoustic wave NPML algorithm of claim 7, wherein the computation task allocation module is specifically configured to:

12. The heterogeneous parallel computation implementation device of the three-dimensional acoustic wave NPML algorithm of claim 7, wherein the algorithm parallel computation module is specifically configured to:

solving the second-order partial derivative and synthesizing a wave field;

solving the first-order partial derivative and synthesizing a wave field;

seismic simulation record data of a single shot are obtained.

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.