CN113408109B

CN113408109B - Hybrid parallel method for multipoint geostatistical random simulation process

Info

Publication number: CN113408109B
Application number: CN202110602686.1A
Authority: CN
Inventors: 陈麒玉; 崔哲思; 刘刚; 何珍文; 张军强; 张志庭
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2022-05-27
Anticipated expiration: 2041-05-31
Also published as: CN113408109A

Abstract

The invention provides a hybrid parallel method for the multipoint geostatistical stochastic simulation process. Under a coarse-grained parallel strategy, the computing nodes are divided into a master node and slave processors. The master node divides a simulation task into a plurality of sub-simulation tasks and obtains the simulation information corresponding to each sub-simulation task. The master node then finds the slave processor with the minimum computing load and sends it the simulation information and the corresponding sub-simulation task. When a slave processor finishes a sub-simulation task, it returns the sub-simulation result to the master node; when all sub-simulation tasks are finished, the sub-simulation results are combined into the final simulation result. By embedding a hybrid parallel strategy that combines a coarse-grained parallel strategy with a fine-grained parallel strategy, the method turns the original serial computation into parallel computation and thereby improves the efficiency of modeling and simulating large-scale, high-resolution geological models.

Description

Hybrid parallel method for multipoint geostatistical random simulation process

Technical Field

The invention relates to a hybrid parallel method for a multipoint geostatistical stochastic simulation process, and belongs to the field of geological modeling.

Background

Multipoint geostatistics can represent the correlation among multiple points in space. It captures the complex patterns in a reference model (the training image) through expectation maximization and pattern learning, so that an anisotropic, structurally complex geological model of the subsurface can be constructed automatically and the heterogeneity of the geological model attributes is better reproduced. Multipoint geostatistics combines the advantages of object-based and pixel-based stochastic simulation methods. It has become an important branch of the automatic construction and simulation of complex three-dimensional geological models and has been applied successfully in many geological fields, such as reservoir modeling, seismic inversion and mineral prediction.

Multipoint geostatistics aims to extract anisotropic spatial patterns directly from a reference model (training image) in order to describe spatially heterogeneous geometric features. Current multipoint geostatistical methods simulate a grid with a serial stochastic simulation procedure: first, all nodes to be simulated in the grid are collected and a random simulation path is determined; then, one node at a time is taken from the random path and the condition data points around it are obtained; these condition points are matched against the data events present in the training image, and the value of the random variable is simulated by pattern learning or expectation maximization; the simulation ends once every node on the random path has been visited. Because the serial method must visit each node to be simulated in sequence along the simulation path, it incurs a large computational cost and runs inefficiently, and so cannot meet the practical demand for building large-scale, high-precision models. In addition, multipoint geostatistical stochastic simulation is a Monte Carlo procedure in which the simulated values at all spatial positions inside a simulation neighborhood influence the result at the position currently being simulated. If a position that is still being simulated lies inside the simulation neighborhood of the current simulation task, its result affects the current task and a simulation conflict arises. Existing serial multipoint geostatistical stochastic simulation methods do not handle this problem well.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a hybrid parallel method for the multipoint geostatistical stochastic simulation process. The method embeds a hybrid parallel strategy that uses a coarse-grained parallel strategy and a fine-grained parallel strategy to turn the original serial computation into parallel computation. Multiple processes run on multiple nodes, and the nodes cooperate to achieve both process-level and thread-level parallelism, which improves computational efficiency and reduces computational cost. The method also resolves simulation conflicts, which improves the accuracy of the simulation result.

The technical scheme adopted by the invention to solve the technical problem is as follows. In a hybrid parallel method for the multipoint geostatistical stochastic simulation process, the computing nodes are divided into a master node and slave processors according to a coarse-grained parallel strategy. The master node divides the required simulation task into a plurality of sub-simulation tasks according to a simulation path and obtains the simulation information corresponding to each sub-simulation task. After the simulation information is obtained, the master node finds the slave processor with the minimum computing load in a node load list and sends the simulation information and the corresponding sub-simulation task to that slave processor. The slave processor simulates the sub-simulation task and, on completion, returns the sub-simulation result to the master node. The master node updates the node load list after receiving the sub-simulation result and, once all sub-simulation tasks are completed, combines the sub-simulation results to obtain the simulation result.
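The least-loaded dispatch described above can be sketched as follows. This is a minimal, sequential stand-in and not the patented implementation: the names `dispatch` and `run_task`, and the use of an accumulated cost as the "load", are assumptions made for illustration only. In a real cluster the call to `run_task` would be replaced by message passing between processes.

```python
def dispatch(tasks, num_slaves, run_task):
    """Assign each sub-task to the slave with the smallest current load.

    Hypothetical sketch: `run_task(slave_id, task)` returns (result, cost);
    the cost is added to that slave's entry in the node load list.
    """
    load = [0.0] * num_slaves            # node load list kept by the master
    results, assignment = {}, {}
    for task_id, task in enumerate(tasks):
        # Master picks the slave with the minimum computing load.
        slave = min(range(num_slaves), key=lambda s: load[s])
        result, cost = run_task(slave, task)
        load[slave] += cost              # load list updated when the result returns
        results[task_id] = result
        assignment[task_id] = slave
    return results, assignment, load


if __name__ == "__main__":
    # Toy sub-tasks whose "cost" equals their value.
    res, who, load = dispatch([3, 1, 1, 1], 2, lambda s, t: (t * 2, t))
    print(who, load)
```

Using more sub-tasks than slaves, as the text requires, is what gives this kind of greedy assignment room to even out the load.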

At the same time, the simulation task is divided into a plurality of sub-simulation tasks according to a fine-grained parallel strategy, and the simulation information and the corresponding sub-simulation tasks are sent to the slave processors.

The number of sub-simulation tasks is larger than the number of slave processors.

The simulation information comprises the mapping relation among simulation neighborhoods, simulation window sizes and simulation space positions of the sub-simulation tasks in the simulation space.

While simulating a sub-simulation task, the slave processor applies a conflict handling strategy: it judges the simulation position A1 in the simulation space and proceeds according to the result, as follows:

1) if the simulation position A1 is not in the simulation neighborhood of a sub-simulation task in the simulation space, simulation is performed at the simulation position A1;

2) if the simulation position A1 is in the simulation neighborhood of a sub-simulation task in the simulation space, the next position to be simulated, A2, following A1 along the simulation path is found, and A2 replaces A1 for re-judgment;

2.1) if the number of re-judgments is within the threshold and the position to be simulated An is not in the simulation neighborhood of a sub-simulation task in the simulation space, simulation is performed at the position An;

2.2) if the number of re-judgments exceeds the threshold, the simulation position As (s > n) is recorded and its simulation is abandoned for the time being; after all sub-simulation tasks are completed, the position As is simulated separately.
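The re-judgment rule in items 1) to 2.2) can be sketched as a short search along the simulation path. The function name `choose_position`, the `conflicts` predicate and the `max_rejudge` parameter are hypothetical names introduced here; how the simulation neighborhood is tested is left to the caller.

```python
def choose_position(path, start, conflicts, max_rejudge):
    """Walk along `path` from index `start` until a conflict-free position is found.

    Returns (position, None) when a position can be simulated now, or
    (None, deferred_position) when the re-judgment threshold is exceeded and
    the last judged position must be simulated separately at the end.
    `conflicts(pos)` is a hypothetical predicate: True if `pos` lies inside the
    simulation neighborhood of a sub-task that is still running.
    """
    deferred = None
    for rejudged, pos in enumerate(path[start:]):
        if not conflicts(pos):
            return pos, None             # cases 1) and 2.1)
        deferred = pos
        if rejudged >= max_rejudge:      # case 2.2): give up for now
            break
    return None, deferred


if __name__ == "__main__":
    path = [(0, 0), (0, 1), (5, 5), (9, 9)]
    busy = lambda p: p[0] < 3            # toy neighborhood test
    print(choose_position(path, 0, busy, max_rejudge=3))
    print(choose_position(path, 0, busy, max_rejudge=1))
```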

A unified data conversion interface is arranged between the master node and the slave processor, the data conversion interface converts the mapping relation between the simulation space positions into a corresponding numerical value relation and vectorizes the numerical value relation, and corresponding information needs to be stripped layer by layer when a multi-mapping is vectorized.
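One way to picture such a data conversion interface is a pair of functions that flatten a node position and its data event into a single numeric vector and rebuild them on the other side. The layout below (dimension, number of conditioning points, coordinates, then offset/value records) is an assumption chosen for illustration and is not specified by the source; `encode` and `decode` are hypothetical names.

```python
def encode(x, event):
    """Flatten node coordinates `x` and a data event {offset: value} into one list."""
    dim = len(x)
    stream = [float(dim), float(len(event))]     # header: dimension, point count
    stream.extend(float(c) for c in x)           # coordinates of the node x
    for offset, value in sorted(event.items()):  # one record per conditioning point
        stream.extend(float(o) for o in offset)
        stream.append(float(value))
    return stream


def decode(stream):
    """Rebuild (x, event) from a vector produced by `encode`, layer by layer."""
    dim, count = int(stream[0]), int(stream[1])
    x = tuple(int(c) for c in stream[2:2 + dim])
    event, pos = {}, 2 + dim
    for _ in range(count):
        offset = tuple(int(o) for o in stream[pos:pos + dim])
        event[offset] = stream[pos + dim]
        pos += dim + 1
    return x, event


if __name__ == "__main__":
    d = encode((3, 4), {(-1, 0): 1.0, (0, 2): 0.0})
    print(d)
    print(decode(d))
```

Sending one flat vector per node means a single message carries all the simulation information, which matches the stated goal of avoiding repeated communication set-up.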

The method comprises the following specific steps:

(1) according to a coarse-grained parallel strategy, the computing nodes are divided into a master node and slave processors; the master node reads the training image, the sample data and the algorithm parameters, loads the grid data to be simulated, and assigns the known sample data to the simulation grid according to their spatial positions;

(2) the master node sends the training images and the parameter information to each slave processor;

(3) the master node determines a random path sim_path containing all the grid nodes to be simulated;

(4) if grid nodes remain to be simulated, i.e. sim_path.size() > 0, go to step (5); otherwise, go to step (15);

(5) the master node acquires a currently idle slave processor and, based on the conflict handling strategy, selects a conflict-free node x to be simulated from the simulation path;

(6) the master node takes the node x to be simulated as the center and collects the known grid nodes within the given search neighborhood R to form the data event N_x;

(7) the master node uses the data conversion interface to convert the simulation information, namely the spatial coordinates of the node x to be simulated and the data event N_x, into a data stream D;

(8) the master node sends the data stream D to an idle slave processor;

(9) the slave processor analyzes the data stream D into simulation information of the node x to be simulated by using the data conversion interface;

(10) the slave processor randomly scans the training image TI using the parallel strategy based on spatial partitioning and, for each node y, calculates the difference d(N_x, N_y) between the data events N_x and N_y;

(11) when d(N_x, N_y) is smaller than the preset difference threshold t, step (13) is executed; otherwise, go to step (12);

(12) when the scanned proportion of the training image TI is larger than the preset proportion threshold f, step (13) is executed; otherwise, return to step (10);

(13) after the slave processor obtains the optimal matching node y, the value of the node y is used as the simulation result of the node x and is sent to the master node;

(14) the master node fills the simulation result into the corresponding position in the simulation grid, removes the current simulation node x from the simulation path sim_path, and goes to step (4);

(15) saving the result and ending the simulation;

(16) the master node sends a termination simulation signal to the slave processor and ends the process;

(17) the slave processor ends its process after receiving the termination simulation signal.
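Steps (10) to (13) can be sketched for a two-dimensional categorical training image as follows. This is a single-threaded illustration under stated assumptions: the difference d(N_x, N_y) is taken to be the fraction of mismatching conditioning points, which the source does not specify, and the names `mismatch` and `scan_training_image` are hypothetical.

```python
import random


def mismatch(event_x, ti, y):
    """Assumed difference measure: fraction of conditioning points that differ at y."""
    rows, cols = len(ti), len(ti[0])
    bad = 0
    for (di, dj), value in event_x.items():
        i, j = y[0] + di, y[1] + dj
        if not (0 <= i < rows and 0 <= j < cols) or ti[i][j] != value:
            bad += 1
    return bad / len(event_x) if event_x else 0.0


def scan_training_image(ti, event_x, t, f, rng):
    """Randomly scan TI; stop when d < t or when the scanned fraction exceeds f."""
    nodes = [(i, j) for i in range(len(ti)) for j in range(len(ti[0]))]
    rng.shuffle(nodes)                               # step (10): random scan order
    best_y, best_d = None, float("inf")
    for scanned, y in enumerate(nodes, start=1):
        d = mismatch(event_x, ti, y)
        if d < best_d:
            best_y, best_d = y, d
        if d < t or scanned / len(nodes) > f:        # steps (11) and (12)
            break
    return ti[best_y[0]][best_y[1]], best_d          # step (13): value of best node y


if __name__ == "__main__":
    ti = [[0, 0, 0], [0, 1, 2], [0, 0, 0]]
    # Data event: the cell to the left of x is known to hold the value 1.
    print(scan_training_image(ti, {(0, -1): 1}, t=0.01, f=1.0, rng=random.Random(0)))
```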

The conflict handling strategy in step (5) comprises the following steps:

(5.1) if nodes to be simulated remain, i.e. sim_path.size() > 0, and the number of conflicts is below the set upper limit, i.e. conflictsCnt < Max, go to step (5.2); otherwise, go to step (5.5);

(5.2) determine the conflict range Re from the condition points around the current node x to be simulated;

(5.3) if the intersection S ∩ Re of the set S of nodes being simulated with the conflict range Re is not empty, a conflict has occurred: increment the number of conflicts and go to step (5.4); otherwise, go to step (5.5);

(5.4) put the node x to be simulated back into the simulation path, take out another node to be simulated, and go to step (5.2);

and (5.5) the conflict processing is finished.
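Steps (5.1) to (5.5) amount to a bounded retry loop on the master node. In the sketch below the conflict range Re is modeled as a square window of half-width `radius` around x, and a rejected node is put back at the end of the path; both choices, along with the name `select_conflict_free`, are assumptions made for illustration.

```python
def select_conflict_free(sim_path, in_flight, radius, max_conflicts):
    """Pick a node from `sim_path` whose conflict range holds no in-flight node.

    `sim_path` is a list of coordinate tuples and is modified in place;
    `in_flight` is the set S of nodes currently being simulated.
    Returns the selected node, or None if the conflict limit is reached.
    """
    conflicts_cnt = 0
    while sim_path and conflicts_cnt < max_conflicts:        # (5.1)
        x = sim_path.pop(0)
        # (5.2)/(5.3): does any in-flight node fall inside the window around x?
        clash = any(
            max(abs(a - b) for a, b in zip(x, s)) <= radius
            for s in in_flight
        )
        if not clash:
            return x                                         # (5.5) conflict-free
        conflicts_cnt += 1
        sim_path.append(x)                                   # (5.4) put x back
    return None


if __name__ == "__main__":
    path = [(0, 0), (1, 1), (10, 10)]
    print(select_conflict_free(path, {(0, 1)}, radius=2, max_conflicts=5), path)
```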

The spatial partition-based parallel strategy in the step (10) is to partition the training images TI to be scanned according to the number of threads in the thread pool; different scanning areas are distributed aiming at different threads; a plurality of threads simultaneously scan and compare data events; and when a certain thread obtains the best data event matching result, finishing the whole scanning comparison process through the inter-thread communication.
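The thread-level partitioning can be illustrated with a thread pool in which each thread scans its own slice of the candidate nodes and a shared flag stops the others once a good enough match is found. The names `parallel_scan`, `candidates` and `distance` are hypothetical. In standard CPython, threads running pure-Python code do not execute in parallel, so this sketch shows the control flow rather than the speed-up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_scan(candidates, distance, t, num_threads):
    """Scan `candidates` in `num_threads` partitions; stop all once d < t is found.

    Returns (best_distance, best_candidate) over everything that was scanned.
    """
    if not candidates:
        return float("inf"), None
    stop = threading.Event()                     # inter-thread "match found" signal
    chunk = max(1, -(-len(candidates) // num_threads))   # ceiling division
    parts = [candidates[i:i + chunk] for i in range(0, len(candidates), chunk)]

    def scan(part):
        best = (float("inf"), None)
        for y in part:
            if stop.is_set():                    # another thread already succeeded
                break
            d = distance(y)
            if d < best[0]:
                best = (d, y)
            if d < t:
                stop.set()
                break
        return best

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(scan, parts))
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    print(parallel_scan(list(range(100)), lambda y: abs(y - 73) / 100, 0.005, 4))
```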

Based on this technical scheme, the invention has the following beneficial effects: (1) the method provided by the invention adopts a coarse-grained parallel strategy: using a master-slave structure, the computing nodes are divided into a master node and slave nodes, sub-simulation tasks are processed separately on the slave nodes, and one simulation task is completed through the cooperation of multiple nodes. This improves simulation efficiency and ensures that a large number of nodes take part in the simulation, so that the available computing power is fully used.

(2) The method provided by the invention adopts a fine-grained parallel strategy, so that multiple programs, and even individual instructions and operations, can run at the same time, which avoids wasting computing power.

(3) The method provided by the invention adopts a fine-grained parallel strategy: in the pattern matching stage of the multipoint geostatistical stochastic simulation, the search area of the pattern library is partitioned so that the thread resources within each computing node are fully used.

(4) In the method provided by the invention, when data is transmitted between the master node and the slave nodes, the data conversion interface converts simulation information held in various data structures into a data stream by means of multiple mapping. This keeps the transmission of simulation information efficient and avoids the loss of parallel efficiency caused by establishing communication many times while the simulation information is transmitted.

(5) The method provided by the invention adopts a conflict processing strategy to ensure that the space position being simulated is positioned outside the simulation neighborhood of the current simulation task, so as to avoid the influence of the simulation result on the current simulation task.

Drawings

FIG. 1 is a general flow diagram of the present invention.

FIG. 2 is a flow chart of the conflict handling strategy of the present invention when a parallel conflict occurs.

Fig. 3 is a training image used in the two-dimensional simulation experimental case of the present invention.

Fig. 4 shows the 3000 condition data points used with the training image of Fig. 3.

Fig. 5 is the two-dimensional river channel realization simulated from the data of Fig. 4.

Fig. 6 shows the variogram curves plotted from the two-dimensional simulation results of Fig. 5.

Fig. 7 shows the spatial connectivity curves in the X direction for the two-dimensional simulation results.

Fig. 8 is a training image used in the three-dimensional simulation experimental case of the present invention.

Fig. 9 shows the 100 boreholes used with the training image of Fig. 8.

Fig. 10 is the simulation result of the three-dimensional experimental case using the data of Fig. 9.

Fig. 11 shows the variogram curves plotted from the three-dimensional simulation results of Fig. 10.

Fig. 12 shows the spatial connectivity curves in the X direction for the three-dimensional simulation results.

Fig. 13 shows the spatial connectivity curves in the Y direction for the three-dimensional simulation results.

Fig. 14 compares the variogram curves of the serial algorithm and of the parallel algorithm of the present method in the two-dimensional simulation experiment.

Fig. 15 compares the variogram curves of the serial algorithm and of the parallel algorithm of the present method in the three-dimensional simulation experiment.

Fig. 16 compares the spatial connectivity curves in the X direction of the serial algorithm and of the parallel algorithm of the present method in the two-dimensional simulation experiment.

Fig. 17 compares the spatial connectivity curves in the X direction of the serial algorithm and of the parallel algorithm of the present method in the three-dimensional simulation experiment.

Fig. 18 shows the parallel simulation experiment designed to demonstrate the parallel efficiency of the present invention.

Detailed Description

The present invention will be described in detail with reference to the accompanying drawings and examples, and the present invention is not limited to the examples.

Referring to Fig. 1, in the hybrid parallel method for the multipoint geostatistical stochastic simulation process, the computing nodes are divided into a master node and slave processors according to a coarse-grained parallel strategy; typically, node 0 of the computing cluster serves as the master node and the remaining nodes serve as slave processors. The master node divides the simulation task into a plurality of sub-simulation tasks according to the simulation path, the number of sub-simulation tasks being larger than the number of slave processors, and obtains the simulation information corresponding to each sub-simulation task. The simulation information comprises the mapping relation among the simulation neighborhoods, simulation window sizes and simulation space positions of the sub-simulation tasks in the simulation space. After obtaining the simulation information, the master node finds the slave processor with the minimum computing load in the node load list and sends it the simulation information and the corresponding sub-simulation task; at the same time, the simulation task is divided into a plurality of sub-simulation tasks according to a fine-grained parallel strategy, and the simulation information and the corresponding sub-simulation tasks are sent to the slave processors. Each slave processor simulates its sub-simulation task and, on completion, returns the sub-simulation result to the master node. The master node updates the node load list after receiving a sub-simulation result and, once all sub-simulation tasks are completed, combines the sub-simulation results to obtain the simulation result.

Referring to Fig. 2, while simulating a sub-simulation task, the slave processor applies a conflict handling strategy: it judges the simulation position A1 in the simulation space and proceeds according to the result, as follows:

The method comprises the following specific steps:

(6) the master node takes the node x to be simulated as the center and collects the known grid nodes within the given search neighborhood R to form the data event N_x;

(8) the master node sends the data stream D to an idle slave processor;

(15) saving the result and ending the simulation;

The conflict processing strategy in the step (5) comprises the following steps

(5.3) if the intersection S ∩ Re of the set S of nodes being simulated with the conflict range Re is not empty, a conflict has occurred: increment the number of conflicts and go to step (5.4); otherwise, go to step (5.5);

and (5.5) the conflict processing is finished.

In order to illustrate the effectiveness of the parallel scheme provided by the invention, a two-dimensional simulation experiment, a three-dimensional simulation experiment and a parallel scheme effectiveness experiment are respectively implemented according to the steps.

Referring to Figs. 3 to 7, Fig. 3 is the training image used in the two-dimensional simulation experiment, a 250 × 250 channel distribution grid; Fig. 4 shows the 3000 randomly generated condition data points used in the simulation; Fig. 5 is a 2000 × 2000 realization generated with reference to the spatial patterns of the two-dimensional training image and the condition data points. The two-dimensional realization shows that the simulated river channel texture is clear and that its distribution resembles that of the training image, so the algorithm provided by the invention simulates the river channel distribution well. Fig. 6 shows the variogram curves of 10 different realizations from the two-dimensional simulation experiment plotted together with that of the training image, and Fig. 7 shows the corresponding connectivity curves in the X direction for the 10 realizations and the training image. In both the variogram plot and the connectivity plot, the curves of the realizations (gray) cluster around the corresponding curve of the training image (black). In terms of these statistics, all 10 realizations therefore closely reproduce the variability and connectivity characteristics of the two-dimensional training image.
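For readers unfamiliar with the statistic plotted in the variogram figures, the experimental semivariogram along one grid direction can be computed as half the mean squared difference between cell pairs separated by a lag h. The sketch below uses this standard definition and treats the column index as the X direction; that orientation and the function name `variogram_x` are assumptions, and the source does not state how its curves were computed.

```python
def variogram_x(grid, max_lag):
    """Experimental semivariogram along the column (assumed X) direction.

    gamma(h) = sum((z[i][j] - z[i][j + h]) ** 2) / (2 * N(h)),
    where N(h) is the number of cell pairs separated by lag h.
    """
    rows, cols = len(grid), len(grid[0])
    gamma = []
    for h in range(1, max_lag + 1):
        total, pairs = 0.0, 0
        for i in range(rows):
            for j in range(cols - h):
                total += (grid[i][j] - grid[i][j + h]) ** 2
                pairs += 1
        gamma.append(total / (2 * pairs) if pairs else float("nan"))
    return gamma


if __name__ == "__main__":
    # Tiny binary facies grid: 1 = channel, 0 = background.
    grid = [[0, 1, 1, 0], [0, 0, 1, 1]]
    print(variogram_x(grid, max_lag=2))
```

Computing this curve for each realization and for the training image, then overlaying them, gives the kind of comparison the figures describe.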

Referring to Figs. 8 to 13, Fig. 8 is the 180 × 150 × 120 training image used in the three-dimensional simulation experiment; Fig. 9 shows 100 boreholes extracted from the three-dimensional training image and used as conditioning data; Fig. 10 shows the simulated three-dimensional realization. The three-dimensional realization shows that the result of the algorithm provided by the invention is very close to the distribution pattern of the rock strata in the training image. Figs. 11 to 13 show the variogram curves and connectivity curves of 10 different realizations plotted together with those of the training image. The variogram and connectivity curves show that all 10 realizations conform to the variability and connectivity distributions of the training image.

Referring to Figs. 14 to 17, to examine in more depth the difference between the simulation results of the present invention and those of the serial algorithm, Figs. 14 to 17 plot the variogram and connectivity functions of realizations from the serial and parallel algorithms in the two-dimensional and three-dimensional experimental cases. The variogram and connectivity curves produced by the present scheme are highly consistent with those of the serial algorithm, which further shows that the scheme does not affect the quality of the simulation output.

Fig. 18 shows the parallel experimental case designed for the present invention. In this case, the number of computing cores is gradually increased from 36 to 96. Over this range, neither the parallel efficiency (the line starting at 100.0) nor the speed-up ratio (the line starting at 1.00) of the parallel scheme decreases as the number of computing cores increases, which shows that the parallel scheme designed by the invention has high parallelism and scalability.
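Curves that start at 1.00 and 100.0 are consistent with a speed-up and efficiency measured relative to the smallest core count rather than to a single core; that reading is an assumption, as the source does not give the formulas. Under it, the relative speed-up is S(p) = T(p0) / T(p) and the relative efficiency is E(p) = 100 · S(p) · p0 / p. The sketch below applies these definitions; the function name `relative_scaling` is hypothetical and the timings in the usage example are made-up placeholders, not measurements from the patent.

```python
def relative_scaling(times, base_cores):
    """Return {cores: (speedup, efficiency_percent)} relative to `base_cores`.

    `times` maps a core count to a wall-clock time in any consistent unit.
    """
    t0 = times[base_cores]
    out = {}
    for cores, elapsed in sorted(times.items()):
        speedup = t0 / elapsed
        efficiency = 100.0 * speedup * base_cores / cores
        out[cores] = (speedup, efficiency)
    return out


if __name__ == "__main__":
    # Placeholder timings for illustration only (not data from the source).
    hypothetical_times = {36: 100.0, 72: 50.0}
    print(relative_scaling(hypothetical_times, base_cores=36))
```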

The experimental cases show that the parallel scheme provided by the invention can efficiently reproduce attribute proportions, characterize spatial variability and reconstruct the connectivity of spatial structures, and that the hybrid parallel scheme for the multipoint geostatistical stochastic simulation process provided by the invention can simulate geological structures efficiently and with high quality.

The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes or modifications without departing from the spirit and scope of the present invention, and therefore all equivalent technical solutions are within the scope of the present invention.

Claims

1. A hybrid parallel method for a multipoint geostatistical stochastic simulation process, characterized by comprising the following steps: dividing the computing nodes into a master node and slave processors according to a coarse-grained parallel strategy; dividing, by the master node, a simulation task into a plurality of sub-simulation tasks according to a simulation path and acquiring the simulation information corresponding to each sub-simulation task; after the simulation information is acquired, finding, by the master node, the slave processor with the minimum computing load in a node load list and sending the simulation information and the corresponding sub-simulation task to that slave processor; simulating, by the slave processor, the sub-simulation task and, when the simulation of the sub-simulation task is finished, returning the sub-simulation result to the master node; updating, by the master node, the node load list after receiving the sub-simulation result; and, when all sub-simulation tasks are finished, combining the sub-simulation results to obtain the simulation result;

the method comprises the following specific steps:

(8) the master node sends the data stream D to an idle slave processor;

(10) the slave processor randomly scans the training image TI using the parallel strategy based on spatial partitioning and, for each node y, calculates the difference d(N_x, N_y) between the data events N_x and N_y;

(11) when d(N_x, N_y) is smaller than the preset difference threshold t, step (13) is executed; otherwise, go to step (12);

(15) saving the result and ending the simulation;

2. The hybrid parallel method for multi-point geostatistical stochastic simulation processes according to claim 1, wherein: at the same time, the simulation task is divided into a plurality of sub-simulation tasks according to a fine-grained parallel strategy, and the simulation information and the corresponding sub-simulation tasks are sent to the slave processors.

3. The hybrid parallel method for multi-point geostatistical stochastic simulation processes according to claim 1, wherein: the number of the sub-simulation tasks is larger than the number of the subordinate processors.

4. The hybrid parallelization method for a multipoint-oriented geostatistical stochastic simulation process of claim 1, characterized in that: the simulation information comprises the mapping relation among simulation neighborhoods, simulation window sizes and simulation space positions of the sub-simulation tasks in the simulation space.

5. The hybrid parallel method for multi-point geostatistical stochastic simulation processes of claim 4, wherein: while simulating the sub-simulation task, the slave processor applies a conflict handling strategy, namely, the simulation position A1 in the simulation space is judged and handled according to the judgment result, specifically as follows:

2) if the simulation position is in the simulation neighborhood of the sub-simulation task in the simulation space, finding the next position to be simulated A2 of the simulation position A1 along the simulation path, and replacing the simulation position A1 with the position to be simulated A2 for judging again;

6. The hybrid parallel method for multi-point geostatistical stochastic simulation processes of claim 4, wherein: a unified data conversion interface is arranged between the master node and the slave processor, the data conversion interface converts the mapping relation between the simulation space positions into a corresponding numerical value relation and vectorizes the numerical value relation, and corresponding information needs to be stripped layer by layer when a multi-mapping is vectorized.

7. The hybrid parallel method for the multipoint-oriented geostatistical stochastic simulation process according to claim 1, wherein the collision handling strategy in the step (5) comprises the following steps:

(5.1) if nodes to be simulated remain, i.e. sim_path.size() > 0, and the number of conflicts is below the set upper limit, i.e. conflictsCnt < Max, go to step (5.2); otherwise, go to step (5.5);

(5.3) if the intersection S ∩ Re of the set S of nodes being simulated with the conflict range Re is not empty, a conflict has occurred: increment the number of conflicts and go to step (5.4); otherwise, go to step (5.5);

(5.4) putting the node x to be simulated back into the simulation path, taking out another node to be simulated, and turning to the step (5.2);

and (5.5) the conflict processing is finished.

8. The hybrid parallel method for multi-point geostatistical stochastic simulation processes according to claim 1, wherein: the parallel strategy based on the spatial partition in the step (10) is to partition the training image TI needing to be scanned according to the number of threads in the thread pool; different scanning areas are distributed aiming at different threads; a plurality of threads simultaneously scan and compare data events; and when a certain thread obtains the best data event matching result, finishing the whole scanning comparison process through the inter-thread communication.