CN115455794B - LBM parallel optimization method, device and storage medium based on communication pore division calculation region - Google Patents

Info

Publication number
CN115455794B
Authority
CN
China
Prior art keywords
pore
units
unit
calculation
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210953478.0A
Other languages
Chinese (zh)
Other versions
CN115455794A (en)
Inventor
胡五龙
许铭扬
吴卫国
李凡
肖一鹤
蒋张泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN202210953478.0A
Publication of CN115455794A
Application granted
Publication of CN115455794B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/25 Design optimisation, verification or simulation using particle-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/10 Numerical modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an LBM parallel optimization method and device that divide the calculation region based on connected pores. The domain is divided evenly into a number of subdomains that matches the number of computing nodes, so that every computing node works under a balanced load and processing efficiency is improved. The method comprises the following steps: step 1, determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system; step 2, decomposing the calculation domain: dividing the flow domain along the x-axis into n_x regions having the same or a similar number of units, then dividing each region along the y-axis into n_y sub-regions having the same or a similar number of units, and finally dividing each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining N subdomains having the same or a similar number of units, each subdomain being a three-dimensional region composed of a plurality of pore units; after decomposition, the difference in the number of units between the subdomain with the most units and the subdomain with the fewest units should not exceed one thousandth of the total number of units; and step 3, distributing calculation tasks.

Description

LBM parallel optimization method, device and storage medium based on communication pore division calculation region
Technical Field
The invention belongs to the technical field of pore structure simulation calculation, and particularly relates to an LBM parallel optimization method, device and storage medium that divide the calculation region based on connected pores.
Background
The lattice Boltzmann method (LBM) is an effective means of simulating fluid flow in the pore structure of porous media at the mesoscale. For the simulation results to be representative, the simulated sample must be large enough; combined with the complexity of fluid flow in the pore structure, this means that pore-scale LBM simulations tend to demand large amounts of computing resources and storage space. Large-scale LBM simulation therefore requires parallel optimization, and its execution time and memory requirements depend on the data size and on the parallel optimization method.
The basic idea of domain decomposition in current parallel computing is to decompose the calculation domain into several subdomains, then restore the interfaces and balance the load. For complex porous materials, however, this step often takes many time-consuming iterations to reach an acceptable load balance. The recursive bisection method is the most commonly used domain decomposition scheme: it divides the calculation domain evenly into two subdomains, then bisects each subdomain in turn, and after R levels of recursion yields 2^R subdomains. This scheme balances the workload well for homogeneous media with regular or irregular structure, but has three problems: 1) good workload balance is still difficult to achieve for non-uniform, highly heterogeneous porous structures; 2) the number of subdomains must be 2^R, which often does not equal the number of processors (computing nodes), so some processors cannot be used effectively; 3) when many subdomains are needed, the recursion depth R is large and efficiency is low.
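Problem 2) can be made concrete with a short calculation. The following Python sketch is illustrative only; the function name and the example processor counts are assumptions, not part of the patent. It computes the deepest recursion level R whose 2^R subdomains still fit a given number of processors, and how many processors are left idle.

```python
def usable_subdomains_recursive_bisection(num_processors: int) -> tuple[int, int]:
    """Return (R, 2**R) for the deepest bisection whose subdomain count fits the processors."""
    if num_processors < 1:
        raise ValueError("need at least one processor")
    r = num_processors.bit_length() - 1  # floor(log2(num_processors))
    return r, 2 ** r


# Hypothetical processor counts, chosen only for illustration.
for n in (8, 12, 24):
    r, used = usable_subdomains_recursive_bisection(n)
    print(f"{n} processors -> R={r}, {used} subdomains, {n - used} idle")
```

With 12 processors, for instance, recursive bisection can use only 8 of them, whereas a 3×2×2 split would use all 12.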
Disclosure of Invention
The invention is made to solve the above problems, and aims to provide an LBM parallel optimization method, device and storage medium that divide the calculation region based on connected pores, in which the domain is divided evenly into a number of subdomains that matches the number of computing nodes, so that every computing node processes a balanced load efficiently and the calculation speed is improved.
In order to achieve the above object, the present invention adopts the following scheme:
< method >
The invention provides an LBM parallel optimization method that divides the calculation region based on connected pores, characterized by comprising the following steps:
step 1, determining the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated; determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;
step 2, decomposing the calculation domain:
dividing the flow domain along the x-axis into n_x regions having the same or a similar number of units, then dividing each region along the y-axis into n_y sub-regions having the same or a similar number of units, and finally dividing each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains having the same or a similar number of units, each subdomain being a three-dimensional region composed of a plurality of pore units; after decomposition, the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min should not exceed one thousandth of the total number of units; the allowable difference in the number of units between subdomains can be reduced further according to the required precision of the domain division;
step 3, distributing calculation tasks:
during parallel computing, the N subdomains are allocated one-to-one to the N computing nodes for processing; when processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the number of the subdomain; each connected pore unit is associated with one coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal of each pore unit and its corresponding coordinate, so that each pore unit is tracked through its unique ordinal and coordinate;
for each pore unit: each function in the main data structure of the pore unit is stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density data of the pore unit are likewise stored, indexed by the ordinal of the pore unit, as one-dimensional arrays, giving a series of second derived arrays.
Specifically, the LBM parallel optimization method provided by the invention can also have the following characteristics: in step 2, the maximum number of units M_max is
[formula not reproduced in this text]
and the minimum number of units M_min is
[formula not reproduced in this text]
where NX, NY and NZ are the total numbers of connected pore units along the x, y and z axes of the simulation region, respectively.
Preferably, the LBM parallel optimization method provided by the invention can also have the following characteristics: in step 3, the main data structure of each pore unit consists of its fluid particle distribution function and its equilibrium fluid particle distribution function; the momentum and local fluid density data of the pore unit before and after calculation are each stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving four second derived arrays.
Preferably, the LBM parallel optimization method provided by the invention further comprises the following step: step 4, setting the communication mode between subdomains: when adjacent or diagonal pore units belonging to different subdomains are processed, the interface layer on the interface of the current subdomain is extended by one additional layer of units to cover the adjacent and diagonal pore units, so that the extended pore units communicate in the same way as the interface units inside the current subdomain.
< device >
Further, the present invention also provides a device for automatically implementing the above < method >, characterized by comprising:
a determining part, which determines the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated, and determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;
a calculation domain decomposition part, which divides the flow domain along the x-axis into n_x regions having the same or a similar number of units, then divides each region along the y-axis into n_y sub-regions having the same or a similar number of units, and finally divides each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains having the same or a similar number of units; the decomposed subdomains should satisfy the condition that the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min does not exceed one thousandth of the total number of units;
a calculation task allocation part, which allocates the N subdomains one-to-one to the N computing nodes for processing and performs the parallel computation; when processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the number of the subdomain; each connected pore unit is associated with one coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal of each pore unit and its corresponding coordinate; for each pore unit: each function in the main data structure of the pore unit is stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density data of the pore unit are likewise stored, indexed by the ordinal of the pore unit, as one-dimensional arrays, giving a series of second derived arrays;
and a control part, which is in communication connection with the determining part, the calculation domain decomposition part and the calculation task allocation part and controls their operation.
Preferably, the LBM parallel optimization device provided by the invention can also have the following characteristics: in the calculation domain decomposition part, the maximum number of units M_max is
[formula not reproduced in this text]
and the minimum number of units M_min is
[formula not reproduced in this text]
where NX, NY and NZ are the total numbers of connected pore units along the x, y and z axes of the simulation region, respectively.
Preferably, the LBM parallel optimization device provided by the invention further comprises: a communication mode setting part, which is in communication connection with the control part; when adjacent or diagonal pore units belonging to different subdomains are processed, it extends the interface layer on the interface of the current subdomain by one additional layer of units to cover the adjacent and diagonal pore units, so that the extended pore units communicate in the same way as the interface units inside the current subdomain.
Preferably, the LBM parallel optimization device provided by the invention further comprises: an input display part, which is in communication connection with the determining part, the calculation domain decomposition part, the calculation task allocation part, the communication mode setting part and the control part, and which lets a user input operation instructions and displays the corresponding results.
Preferably, the LBM parallel optimization device provided by the invention can also have the following characteristics: according to the operation instruction, the input display part can display the sample pore unit information and connectivity determined by the determining part together with the number of computing nodes or the total number N of subdomains, display the decomposition produced by the calculation domain decomposition part, and display the allocation produced by the calculation task allocation part.
< storage Medium >
In addition, the present invention also provides a computer-readable storage medium storing a program for realizing the above < method >.
Effects of the invention
The LBM parallel optimization method, device and storage medium provided by the invention can divide the domain into any required number of balanced subdomains according to the number of computing nodes, without being limited to 2^R. They effectively reduce the communication complexity caused by the tortuosity of the dividing interfaces, balance the workload of all subdomains during decomposition of the flow domain without secondary optimization such as iteration, reduce the memory consumption of the system, simplify the distribution of calculation tasks, and make communication between subdomains efficient and easy to implement, which reduces communication time. The calculation efficiency is therefore significantly improved. The approach is particularly suited to pore-scale simulation of porous materials, which involves heavy computation and large memory consumption; it opens up a new way of decomposing and processing the calculation region and has considerable value for wider application.
Drawings
FIG. 1 is a schematic diagram of a computational domain decomposition process in an embodiment of the present invention, wherein (a) is decomposed along the x-axis, (b) is decomposed along the y-axis, and (c) is decomposed along the z-axis;
FIG. 2 is a schematic diagram of a process of marking fluid cells in a porous medium with a one-dimensional array after decomposing to obtain subfields in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the interface between subdomain 0# and subdomain 1# in an embodiment of the present invention (white: pore units; grey: solid units);
FIG. 4 is a schematic diagram of the D3Q19 discrete velocity model used in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the process of exchanging information between diagonal units of different subdomains in an embodiment of the present invention.
Detailed Description
The LBM parallel optimization method, device and storage medium of the invention, which divide the calculation region based on connected pores, are explained in detail below with reference to the drawings.
< example >
The LBM parallel optimization method based on the connected pore dividing calculation region provided by the embodiment comprises the following steps:
Step 1, determining the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated; determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system. For a porous material with a uniform and simple structure, the pore distribution data can be defined directly in program code; for a non-uniform, highly heterogeneous porous structure, the pore distribution data can be obtained by X-ray computed tomography (X-CT).
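One way to determine which pore units are connected is a breadth-first flood fill over the pore voxels. The sketch below is a minimal illustration under stated assumptions: the patent does not specify the connectivity rule, so face-adjacency (6 neighbours) and seeding from the inlet plane x = 0 are assumptions, as are all function and variable names.

```python
from collections import deque

# Assumption: two pore units are connected when they share a face.
FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_pore_cells(pore_cells, inlet_x=0):
    """Return the pore cells reachable from the inlet plane through face-adjacent pores."""
    pore_cells = set(pore_cells)
    queue = deque(c for c in pore_cells if c[0] == inlet_x)  # seeds on the inlet plane
    connected = set(queue)
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in FACE_NEIGHBOURS:
            nb = (x + dx, y + dy, z + dz)
            if nb in pore_cells and nb not in connected:
                connected.add(nb)
                queue.append(nb)
    return connected


# Hypothetical sample: four linked pore cells plus one isolated pore at (4, 4, 0).
pores = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (4, 4, 0)}
print(sorted(connected_pore_cells(pores)))
```

The isolated pore is dropped, so it is neither stored nor computed in later steps.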
Step 2, decomposing a calculation domain:
dividing the flow domain of the porous medium sample along the x-axis into n_x regions having the same or a similar number of units, then dividing each region along the y-axis into n_y sub-regions having the same or a similar number of units, and finally dividing each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains having the same or a similar number of units; after decomposition, the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min should not exceed one thousandth of the total number of units.
The maximum number of units M_max is
[formula not reproduced in this text]
and the minimum number of units M_min is
[formula not reproduced in this text]
where NX, NY and NZ are the total numbers of connected pore units along the x, y and z axes of the simulation region, respectively.
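The one-thousandth balance criterion stated above can be checked directly from the subdomain sizes. The following is a minimal sketch; the function name, the `denominator` parameter and the sample sizes are assumptions introduced for illustration.

```python
def is_balanced(subdomain_sizes, denominator=1000):
    """True if (largest - smallest) subdomain size is at most total / denominator."""
    total = sum(subdomain_sizes)
    spread = max(subdomain_sizes) - min(subdomain_sizes)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return spread * denominator <= total


# Hypothetical subdomain sizes (numbers of connected pore units).
print(is_balanced([250_000, 250_001, 249_999, 250_000]))
print(is_balanced([10, 11, 10]))
```

A stricter tolerance can be tested by passing a larger `denominator`.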
Specifically, as shown in FIG. 1, every unit in the calculation region is first traversed by successive scans in the z, y and x directions. Note that the scan order z→y→x means that each cross-section x_i is traversed first along z and then along y. FIG. 1(b) shows that, starting from FIG. 1(a), each subdomain is divided into 2 smaller regions along the y-axis: the pore units of the subdomain are numbered in the order x→z→y and stored in a one-dimensional array, and the one-dimensional array associated with each subdomain is then divided into two parts, so that the flow domain is now divided into 3×2×1 sub-regions. In FIG. 1(c), in the same way, the pore units of each subdomain are traversed in the scan order x→y→z and the division process is repeated, so that the simulation region is finally divided into 3×2×n_z subdomains. Next, as shown in FIG. 2, the connected pore units are screened out and stored in a one-dimensional array; the flow domain contains NX pore units in the x-axis direction. As shown in FIG. 3, all connected pore units are divided into several groups of equal size, each group being associated with one subdomain, and the total number of groups equals the number of subdomains after decomposition. The one-dimensional array is then divided into three subgroups, each containing NX/3 or NX/3+1 pore units, the last unit numbers of the three subgroups being N_0, N_1 and NX, respectively. Finally, the interface between two adjacent subdomains is restored, as shown in FIG. 3 and FIG. 1(a); the interface includes the last pore unit of each partition.
This decomposition scheme divides the soil sample directly into n_x × n_y × n_z subdomains; the workload difference between subdomains is only the difference in the number of pore units at the adjacent interfaces between them. The subsequent simulation operates on the divided one-dimensional arrays; the simulation range runs from 1 to nx_i, where nx_i is the size, in the x direction (the fluid flow direction), of the subdomain associated with processor i.
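The three-pass division described above can be sketched as sorting the connected pore units in the stated scan order and cutting the resulting one-dimensional array into nearly equal consecutive pieces, first along x, then y, then z. This is a simplified illustration, not the patented implementation: the function names and the test geometry are assumptions, and the interface-restoration step is not reproduced, so a cut may fall inside a coordinate plane.

```python
def split_evenly(items, parts):
    """Cut a list into `parts` consecutive chunks whose lengths differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def decompose(cells, nx, ny, nz):
    """Split pore cells into nx*ny*nz groups of nearly equal size, axis by axis."""
    subdomains = []
    by_x = sorted(cells, key=lambda c: (c[0], c[1], c[2]))         # scan z -> y -> x
    for slab in split_evenly(by_x, nx):
        by_y = sorted(slab, key=lambda c: (c[1], c[2], c[0]))      # scan x -> z -> y
        for column in split_evenly(by_y, ny):
            by_z = sorted(column, key=lambda c: (c[2], c[1], c[0]))  # scan x -> y -> z
            subdomains.extend(split_evenly(by_z, nz))
    return subdomains


# Hypothetical 7 x 5 x 4 sample in which roughly a third of the cells are solid.
cells = [(x, y, z) for x in range(7) for y in range(5) for z in range(4)
         if (x + y + z) % 3 != 0]
subdomains = decompose(cells, 3, 2, 2)
sizes = [len(s) for s in subdomains]
print(len(subdomains), min(sizes), max(sizes))
```

Because each pass splits a sorted array by count rather than by geometry, the subdomain sizes differ by at most one unit regardless of how the pores are distributed, and no iterative rebalancing is required.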
Step 3, distributing calculation tasks:
During parallel computing, the N subdomains are allocated one-to-one to the N computing nodes for processing. When processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the number of the subdomain. Each connected pore unit is associated with one coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal of each pore unit and its corresponding coordinate, so that each pore unit is tracked through its unique ordinal and coordinate.
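The renumbering can be pictured as a pair of lookups: an array from ordinal to coordinate and a map from coordinate back to ordinal. The sketch below assumes 0-based ordinals and lexicographic ordering of the coordinates; both choices, and the function name, are illustrative assumptions rather than details given in the patent.

```python
def build_pore_array(subdomain_cells):
    """Renumber the connected pore cells of one subdomain.

    Returns (coords, ordinal_of): coords[k] is the coordinate of pore unit k,
    and ordinal_of[coordinate] is its ordinal in the one-dimensional array.
    """
    coords = sorted(subdomain_cells)
    ordinal_of = {c: k for k, c in enumerate(coords)}
    return coords, ordinal_of


# Hypothetical subdomain with three connected pore cells.
coords, ordinal_of = build_pore_array([(2, 0, 0), (0, 1, 0), (0, 0, 3)])
print(coords)
print(ordinal_of[(2, 0, 0)])
```

Solid cells never receive an ordinal, so every array indexed by ordinal has exactly one entry per connected pore unit.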
For each pore unit: each function in the main data structure of the pore unit is stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density data of the pore unit are likewise stored, indexed by the ordinal of the pore unit, as one-dimensional arrays, giving a series of second derived arrays.
The main data structure of each pore unit consists of its fluid particle distribution function and its equilibrium fluid particle distribution function; the momentum and local fluid density data of the pore unit before and after calculation are each stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving four second derived arrays.
As shown in FIG. 4, under the D3Q19 model of the LBM each connected pore unit x has up to 18 adjacent pore units, which are stored in 18 one-dimensional arrays, one array per velocity component. The main data structure of each pore unit is its 19 fluid particle distribution functions and 18 equilibrium fluid particle distribution functions, each described by a one-dimensional array; four further one-dimensional arrays store the momentum and local fluid density in the pore unit.
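The 18 neighbour arrays can be built once from the coordinate array. In the sketch below, the ordering of the 18 D3Q19 directions, the use of -1 to mark a missing (solid or out-of-subdomain) neighbour, and all names are assumptions; the patent's own direction numbering is defined in FIG. 4 and is not reproduced here.

```python
from itertools import product

# The 18 non-rest D3Q19 directions: 6 axis-aligned plus 12 face diagonals.
D3Q19_DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
                    if 1 <= sum(abs(v) for v in d) <= 2]


def build_neighbour_arrays(coords):
    """Return 18 lists; entry [q][k] is the ordinal of pore k's neighbour in direction q, or -1."""
    ordinal_of = {c: k for k, c in enumerate(coords)}
    neighbours = []
    for dx, dy, dz in D3Q19_DIRECTIONS:
        neighbours.append([ordinal_of.get((x + dx, y + dy, z + dz), -1)
                           for x, y, z in coords])
    return neighbours


# Hypothetical three-cell subdomain.
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
neighbours = build_neighbour_arrays(coords)
q = D3Q19_DIRECTIONS.index((1, 0, 0))
print(len(neighbours), neighbours[q])
```

During streaming, a distribution value at pore k moving in direction q is simply written to index `neighbours[q][k]`, with no search over the full three-dimensional grid.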
Step 4, setting the communication mode between subdomains (the extraction and retrieval of calculation data): as shown in FIG. 5, when adjacent or diagonal pore units belonging to different subdomains are processed (d and d' in the figure), the interface layer on the interface of the current subdomain is extended by one additional layer of units to cover the adjacent and diagonal pore units, so that the extended pore units communicate in the same way as the interface units inside the current subdomain.
In each subdomain of this embodiment, the overlap region is stored in 18 one-dimensional arrays, and no redundant or repeated calculation is performed. The particle distribution functions and the equilibrium distribution functions of the fluid units from 1 to nx_i in each subdomain are stored in two separate sets of one-dimensional arrays (18 one-dimensional arrays per set). A pointer locates the first pore unit of the interface, and only the pore unit data in the interface layer are exchanged with the adjacent partition. Because each subdomain stores only its pore units (including those of the interface layer), and the neighbors of every pore unit are stored in 18 one-dimensional arrays, the neighbors of any pore unit are easy to determine during simulation. In each iteration, every processor computes the fluid particle distribution functions of its own subdomain, and the processors then exchange the data on the subdomain interfaces. In this exchange, only 5 of the 18 distribution function components of the fluid units on the interface layer need to be transferred to the adjacent processor. For example, in the x direction, at the interface nx_i only the fluid particle distribution functions f_3[i], f_8[i], f_9[i], f_12[i] and f_13[i] need to be transferred from the right subdomain to the left subdomain, and likewise only f_1[i], f_7[i], f_10[i], f_11[i] and f_14[i] need to be transferred from the left subdomain to the right subdomain.
The method can divide the domain into any required number of balanced subdomains according to the number of computing nodes, so that all computing nodes are fully used. It effectively reduces the communication complexity caused by the tortuosity of the dividing interfaces, balances the workload of all subdomains during decomposition of the flow domain without secondary optimization such as iteration, reduces the memory consumption of the system, simplifies the distribution of calculation tasks, and effectively reduces the communication time between subdomains. The calculation efficiency is therefore significantly improved, and the advantage grows with the data volume. The method is particularly suited to pore-scale simulation of porous materials, which involves heavy computation and large memory consumption.
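The count of five components per direction follows from the D3Q19 velocity set: exactly five velocities have a +x component and five have a -x component. The sketch below selects them and packs only those components for the interface cells. The direction numbering here is an assumption and will not match the f_1, f_7, ... indices of the patent's FIG. 4; `pack_interface` and the sample data are likewise hypothetical, and the actual inter-processor transfer is omitted.

```python
from itertools import product

# Rest velocity plus the 18 moving D3Q19 velocities (numbering is an assumption).
D3Q19 = [(0, 0, 0)] + [c for c in product((-1, 0, 1), repeat=3)
                       if 1 <= sum(abs(v) for v in c) <= 2]

TO_RIGHT = [q for q, c in enumerate(D3Q19) if c[0] == 1]   # leave through the +x face
TO_LEFT = [q for q, c in enumerate(D3Q19) if c[0] == -1]   # leave through the -x face


def pack_interface(f, interface_cells, directions):
    """Collect, per selected direction, the distribution values of the interface cells only."""
    return {q: [f[q][k] for k in interface_cells] for q in directions}


# Hypothetical data: 19 distribution arrays over 4 pore units; units 2 and 3 lie on the interface.
f = [[float(q * 10 + k) for k in range(4)] for q in range(19)]
message = pack_interface(f, interface_cells=[2, 3], directions=TO_RIGHT)
print(len(D3Q19), len(TO_RIGHT), len(TO_LEFT), len(message))
```

Sending 5 components instead of all 18 moving components cuts the per-interface message size by roughly 72 percent.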
Further, the embodiment also provides a device capable of automatically implementing the method, which comprises a determining part, a calculation domain decomposing part, a calculation task distributing part, a communication mode setting part, an input display part and a control part.
The determining part determines the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated, and determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system.
The calculation domain decomposition part divides the flow domain of the porous medium sample along the x-axis into n_x regions having the same or a similar number of units, then divides each region along the y-axis into n_y sub-regions having the same or a similar number of units, and finally divides each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains having the same or a similar number of units. The decomposed subdomains should satisfy the condition that the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min does not exceed one thousandth of the total number of units.
The maximum number of units M_max is
[formula not reproduced in this text]
and the minimum number of units M_min is
[formula not reproduced in this text]
where NX, NY and NZ are the total numbers of connected pore units along the x, y and z axes of the simulation region, respectively.
The calculation task allocation part allocates the N subdomains one-to-one to the N computing nodes for processing and performs the parallel computation. When processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the number of the subdomain. Each connected pore unit is associated with one coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal of each pore unit and its corresponding coordinate. For each pore unit: each function in the main data structure of the pore unit is stored, indexed by the ordinal of the pore unit, as a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density data of the pore unit are likewise stored, indexed by the ordinal of the pore unit, as one-dimensional arrays, giving a series of second derived arrays.
The communication mode setting part is in communication connection with the control part. When adjacent or diagonal pore units belonging to different subdomains are processed, it extends the interface layer on the interface of the current subdomain by one additional layer of units to cover the adjacent and diagonal pore units, so that the extended pore units communicate in the same way as the interface units inside the current subdomain.
The input display part lets a user input operation instructions and displays the corresponding results. For example, according to the operation instruction, the input display part can display the sample pore unit information and connectivity determined by the determining part together with the number of computing nodes or the total number N of subdomains, display the decomposition produced by the calculation domain decomposition part, and display the allocation produced by the calculation task allocation part.
The control part is in communication connection with the determining part, the calculation domain decomposition part, the calculation task allocation part and the communication mode setting part, and controls their operation.
The above embodiments merely illustrate the technical solutions of the present invention. The LBM parallel optimization method, device and storage medium of the invention are not limited to what is described in the above embodiments; their scope is defined by the claims. Any modification, addition or equivalent substitution made by those skilled in the art on the basis of these embodiments falls within the scope claimed by the invention.

Claims (8)

1. An LBM parallel optimization method based on the connected pore dividing calculation region, characterized by comprising the following steps:
step 1, determining the pore unit information and connectivity of a sample according to the pore distribution data of the porous medium sample to be simulated; determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;
step 2, decomposing a calculation domain:
dividing the flow domain of the porous medium sample along the x-axis into n_x regions with the same or similar number of units, then dividing each region along the y-axis into n_y sub-regions with the same or similar number of units, and finally dividing each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains with the same or similar number of units, each subdomain being a three-dimensional region composed of a plurality of connected pore units; after decomposition, the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min should not exceed one thousandth of the total number of units;
step 3, distributing calculation tasks:
during parallel computing, the N subdomains are allocated to the N computing nodes one by one for processing; when processing a subdomain, a computing node considers only the connected pore units, renumbers the connected pore units, and then stores them in a one-dimensional array p_i, where i is the subdomain number; each connected pore unit is associated with one coordinate and is stored in the order of the one-dimensional array to obtain a pore unit array storing the ordinal number of each pore unit and its corresponding coordinate, and each pore unit is tracked through its unique ordinal number and unique coordinate;
for each pore unit: storing each function in the main data structure of the pore unit, in correspondence with the ordinal number of the pore unit, as a one-dimensional array to obtain a series of first derived arrays; and storing the momentum and local fluid density data of the pore unit, in correspondence with the ordinal number of the pore unit, as one-dimensional arrays to obtain a series of second derived arrays.
2. The LBM parallel optimization method based on the connected pore dividing calculation region according to claim 1, wherein:
wherein in step 3, the main data structure of each pore unit consists of its fluid particle distribution function and its equilibrium fluid particle distribution function; and the momentum and local fluid density data of the pore unit before and after calculation are stored, in correspondence with the ordinal number of the pore unit, as one-dimensional arrays to obtain four second derived arrays.
3. The LBM parallel optimization method based on the connected pore dividing calculation region according to claim 1, further comprising:
step 4, setting a communication mode among subdomains: when processing adjacent or diagonal pore units between different subdomains, an interface layer on the interface of the current subdomain is additionally expanded by one layer of units to cover the adjacent and diagonal pore units, so that the communication mode of the expanded pore units is consistent with that of the interface units in the current subdomain.
4. An LBM parallel optimization device based on the connected pore dividing calculation region, characterized by comprising:
a determining part for determining the pore unit information and connectivity of a sample according to the pore distribution data of the porous medium sample to be simulated, and determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;
a calculation domain decomposition part for dividing the flow domain of the porous medium sample along the x-axis into n_x regions with the same or similar number of units, then dividing each region along the y-axis into n_y sub-regions with the same or similar number of units, and finally dividing each of the n_x × n_y sub-regions along the z-axis into n_z parts, obtaining n_x × n_y × n_z = N subdomains with the same or similar number of units, each subdomain being a three-dimensional region composed of a plurality of connected pore units; the decomposed subdomains should satisfy the condition that the difference in the number of units between the subdomain having the maximum number of units M_max and the subdomain having the minimum number of units M_min does not exceed one thousandth of the total number of units;
a computing task allocation part which allocates the N subdomains to the N computing nodes one by one for processing and performs parallel computing; when processing a subdomain, a computing node considers only the connected pore units, renumbers the connected pore units, and then stores them in a one-dimensional array p_i, where i is the subdomain number; each connected pore unit is associated with one coordinate and is stored in the order of the one-dimensional array to obtain a pore unit array storing the ordinal number of each pore unit and its corresponding coordinate; for each pore unit: each function in the main data structure of the pore unit is stored, in correspondence with the ordinal number of the pore unit, as a one-dimensional array to obtain a series of first derived arrays; the momentum and local fluid density data of the pore unit are stored, in correspondence with the ordinal number of the pore unit, as one-dimensional arrays to obtain a series of second derived arrays;
and a control part which is communicatively connected to the determining part, the calculation domain decomposition part and the computing task allocation part, and controls the operation of the determining part, the calculation domain decomposition part and the computing task allocation part.
5. The LBM parallel optimization device based on the connected pore dividing calculation region according to claim 4, further comprising:
a communication mode setting part which is communicatively connected to the control part and which, when adjacent or diagonal pore units belonging to different subdomains are processed, expands the interface layer on the interface of the current subdomain by one additional layer of units to cover the adjacent and diagonal pore units, so that the communication mode of the expanded pore units is consistent with that of the interface units in the current subdomain.
6. The LBM parallel optimization device based on connected pore dividing calculation region according to claim 5, further comprising:
an input display part which is communicatively connected to the determining part, the calculation domain decomposition part, the computing task allocation part, the communication mode setting part, and the control part, and is configured to allow a user to input an operation instruction and to display the corresponding results.
7. The LBM parallel optimization device based on connected pore dividing calculation region according to claim 6, wherein:
the input display part can display, according to the operation instruction, the sample pore unit information and connectivity determined by the determining part together with the number of computing nodes or the total number N of subdomains, the decomposition produced by the calculation domain decomposition part, and the allocation produced by the computing task allocation part.
8. A storage medium, characterized in that:
a program for implementing the LBM parallel optimization method based on the connected pore dividing calculation region as claimed in any one of claims 1 to 3 is stored.
CN202210953478.0A 2022-08-10 2022-08-10 LBM parallel optimization method, device and storage medium based on communication pore division calculation region Active CN115455794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210953478.0A CN115455794B (en) 2022-08-10 2022-08-10 LBM parallel optimization method, device and storage medium based on communication pore division calculation region

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210953478.0A CN115455794B (en) 2022-08-10 2022-08-10 LBM parallel optimization method, device and storage medium based on communication pore division calculation region

Publications (2)

Publication Number Publication Date
CN115455794A CN115455794A (en) 2022-12-09
CN115455794B true CN115455794B (en) 2024-03-29

Family

ID=84297534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210953478.0A Active CN115455794B (en) 2022-08-10 2022-08-10 LBM parallel optimization method, device and storage medium based on communication pore division calculation region

Country Status (1)

Country Link
CN (1) CN115455794B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376481A (en) * 2018-08-16 2019-02-22 清能艾科(深圳)能源技术有限公司 Calculation method, device and the computer equipment of digital cores phase percolation curve based on more GPU
CN112949112A (en) * 2021-01-29 2021-06-11 中国石油大学(华东) Rotor-sliding bearing system lubrication basin dynamic grid parallel computing method
CN112992294A (en) * 2021-04-19 2021-06-18 中国空气动力研究与发展中心计算空气动力研究所 Porous medium LBM calculation grid generation method
CN114565658A (en) * 2022-01-14 2022-05-31 武汉理工大学 Pore size calculation method and device based on CT technology

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8583411B2 (en) * 2011-01-10 2013-11-12 Saudi Arabian Oil Company Scalable simulation of multiphase flow in a fractured subterranean reservoir as multiple interacting continua

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
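High-performance simulation of pore-scale fluid flow and solute transport in porous media (孔隙尺度多孔介质流体流动与溶质运移高性能模拟); Zhou Hongxiang (周鸿翔) et al.; Advances in Water Science (水科学进展); Vol. 31 (No. 3); pp. 422-430 *
Study of the multi-GPU parallel performance of the lattice Boltzmann method (格子Boltzmann方法多GPU并行性能的研究); Zhang Gang (张纲); Wang Limin (王利民); Ge Wei (葛蔚); Computers and Applied Chemistry (计算机与应用化学) (No. 10); pp. 4-13 *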

Also Published As

Publication number Publication date
CN115455794A (en) 2022-12-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant