CN115455794A

CN115455794A - LBM Parallel Optimization Method, Device and Storage Medium for Dividing the Computational Domain Based on Connected Pores

Info

Publication number: CN115455794A
Application number: CN202210953478.0A
Authority: CN
Inventors: 胡五龙; 许铭扬; 吴卫国; 李凡; 肖一鹤; 蒋张泽
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2022-12-09
Anticipated expiration: 2042-08-10
Also published as: CN115455794B

Abstract

The invention provides an LBM parallel optimization method and device that divide the computational domain based on connected pores. The number of subdomains is matched to the number of computing nodes and the subdomains are balanced, so that every computing node carries a balanced load and processes it efficiently, improving computational efficiency. The method comprises the following steps: Step 1, determining the total number N of subdomains to be decomposed according to the number N of computing nodes in the system; Step 2, decomposing the computational domain: dividing the flow domain along the x-axis into n_x regions having the same or similar number of units, dividing each region along the y-axis into n_y subregions having the same or similar number of units, and finally dividing the n_x×n_y subregions along the z-axis n_z times, to obtain n_x×n_y×n_z = N subdomains having the same or similar number of units, wherein each subdomain is a three-dimensional region consisting of a plurality of pore units, and the difference in unit count between the subdomain with the most units and the subdomain with the fewest units after decomposition should not exceed one thousandth of the total number of units; and Step 3, assigning the computing tasks.

Description

LBM parallel optimization method, device and storage medium for dividing the computational domain based on connected pores

Technical Field

The invention belongs to the technical field of pore-structure simulation and computation, and in particular relates to an LBM parallel optimization method, device and storage medium that divide the computational domain based on connected pores.

Background

The lattice Boltzmann method (LBM) is an effective means of simulating fluid flow in the pore structure of porous media at the mesoscopic scale. For the simulation results to be representative, the simulated sample must be sufficiently large; together with the inherent complexity of fluid flow in pore structures, this means that pore-scale LBM simulations often demand large amounts of computing resources and storage. Large-scale LBM simulations therefore require parallel optimization, and their execution time and memory requirements depend on the data volume and on the parallel optimization method.

The basic idea of domain decomposition in current parallel computing is to decompose the computational domain into several subdomains, then recover the interfaces and perform load balancing. For complex porous materials, however, this step often takes many time-consuming iterations to reach an acceptable load balance. Recursive bisection is one of the most commonly used domain decomposition schemes: it divides the computational domain evenly into two subdomains and then applies the same division to each subdomain, so that R levels of recursion yield 2^R subdomains in total. This scheme balances the workload well for regular or irregular structures in homogeneous media, but it has three problems: 1) it still struggles to achieve good workload balance for non-uniform, highly heterogeneous porous structures; 2) the number of subdomains must be 2^R, whereas the number of processors (computing nodes) generally does not equal that value exactly, so some processors cannot be used effectively; 3) when many subdomains are needed, the number of recursive divisions R is large and efficiency is low.
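Problem 2) can be illustrated with a short sketch. The function name `bisection_usage` is hypothetical and is not taken from the patent; it only counts how many of a given number of computing nodes a 2^R decomposition can occupy.

```python
def bisection_usage(num_nodes: int):
    """Return (R, subdomains, idle_nodes) when 2**R subdomains must fit on num_nodes nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    # Largest R with 2**R <= num_nodes; bit_length() equals floor(log2(n)) + 1.
    levels = num_nodes.bit_length() - 1
    subdomains = 2 ** levels
    return levels, subdomains, num_nodes - subdomains


print(bisection_usage(12))  # (3, 8, 4): four of the twelve nodes stay idle
```

With 12 nodes, recursive bisection can use at most 8 of them, which is the kind of mismatch the invention aims to avoid.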

Summary of the Invention

The present invention was made to solve the above problems. Its purpose is to provide an LBM parallel optimization method, device and storage medium that divide the computational domain based on connected pores, in which the number of subdomains is matched to the number of computing nodes and the subdomains are balanced, so that every computing node processes a balanced load efficiently and the computation is accelerated.

To achieve the above purpose, the present invention adopts the following scheme:

<Method>

The invention provides an LBM parallel optimization method that divides the computational domain based on connected pores, characterized in that it comprises the following steps:

Step 1. Determine the pore unit information and connectivity of the sample from the pore distribution data of the porous medium sample to be simulated; determine the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;

Step 2. Decompose the computational domain:

Divide the flow domain along the x-axis into n_x regions with the same or similar number of units, then divide each region along the y-axis into n_y subregions with the same or similar number of units, and finally divide these n_x×n_y subregions along the z-axis n_z times, obtaining n_x×n_y×n_z = N subdomains with the same or similar number of units, each subdomain being a three-dimensional region composed of multiple pore units. After decomposition, the difference in unit count between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min should not exceed one thousandth of the total number of units; this allowable difference can be tightened further according to the accuracy required of the domain division;
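The balance criterion of Step 2 can be written as a one-line check. The sketch below is illustrative only: the function name `is_balanced` is hypothetical, and which quantity counts as the "total number of units" is passed in by the caller as `total_units` rather than assumed.

```python
def is_balanced(unit_counts, total_units: int, tolerance: float = 1e-3) -> bool:
    """True if the spread M_max - M_min is at most tolerance * total_units.

    unit_counts: number of units in each subdomain after decomposition.
    tolerance:   one thousandth by default; a smaller value tightens the criterion.
    """
    if not unit_counts:
        raise ValueError("need at least one subdomain")
    spread = max(unit_counts) - min(unit_counts)
    return spread <= tolerance * total_units


print(is_balanced([1000, 1001, 999], total_units=3000))  # True: spread 2 <= 3
print(is_balanced([1000, 1010, 990], total_units=3000))  # False: spread 20 > 3
```

Passing a smaller `tolerance` corresponds to narrowing the allowable difference when a more accurate division is required.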

Step 3. Assign the computing tasks:

In parallel computation, the N subdomains are assigned one-to-one to the N computing nodes. When processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the subdomain number. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal and the corresponding coordinate of each pore unit; every pore unit is tracked by its unique ordinal and coordinate;

For each pore unit: each function in the pore unit's main data structure is stored, indexed by the pore unit's ordinal, in a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density of the pore unit are likewise stored, indexed by the ordinal, in one-dimensional arrays, giving a series of second derived arrays.
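The storage layout of Step 3 can be sketched as follows. This is one possible reading, not the patent's implementation: the function name `allocate_subdomain`, the dictionary keys, and the use of lexicographic sorting to assign ordinals are all assumptions, and the number of arrays per function family is left as the parameter `components`. The four second derived arrays are modelled here as three momentum components plus density, which is likewise an assumption.

```python
def allocate_subdomain(connected_pores, components: int):
    """Build ordinal-indexed one-dimensional arrays for one subdomain (illustrative layout)."""
    coords = sorted(connected_pores)                # p_i: ordinal -> (x, y, z)
    ordinal = {c: k for k, c in enumerate(coords)}  # reverse lookup: (x, y, z) -> ordinal
    n = len(coords)
    return {
        "coords": coords,
        "ordinal": ordinal,
        # First derived arrays: one array per distribution-function component.
        "f": [[0.0] * n for _ in range(components)],
        "f_eq": [[0.0] * n for _ in range(components)],
        # Second derived arrays (assumed split): three momentum components and density.
        "momentum": [[0.0] * n for _ in range(3)],
        "density": [0.0] * n,
    }


sub = allocate_subdomain([(2, 0, 0), (0, 0, 0), (1, 0, 0)], components=19)
print(sub["ordinal"][(1, 0, 0)])  # 1
```

Because every array shares the same ordinal index, solid units take up no memory and a pore unit's data is found by a single index lookup.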

Specifically, the LBM parallel optimization method provided by the present invention may also have the following feature: in Step 2, the maximum number of units M_max:

Minimum number of units M_min:

where NX, NY and NZ are the total numbers of units (pore and solid) along the x, y and z axes of the simulation region, respectively.

Preferably, the LBM parallel optimization method provided by the present invention may also have the following feature: in Step 3, the main data structure of each pore unit is its fluid particle distribution functions and equilibrium fluid particle distribution functions; the momentum and local fluid density of the pore unit before and after the calculation are stored, indexed by the pore unit's ordinal, in one-dimensional arrays, giving four second derived arrays.

Preferably, the LBM parallel optimization method provided by the present invention may further comprise: Step 4. Set the communication mode between subdomains: when handling adjacent or diagonal pore units that belong to different subdomains, the interface layer on the current subdomain's boundary is extended by one additional layer of units so as to cover the adjacent and diagonal pore units, and these extended pore units communicate in the same way as the interface units within the current subdomain.

<Device>

Further, the present invention also provides a device that automatically implements the above <Method>, characterized in that it comprises:

a determination unit, which determines the pore unit information and connectivity of the sample from the pore distribution data of the porous medium sample to be simulated, and determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;

a computational domain decomposition unit, which divides the flow domain along the x-axis into n_x regions with the same or similar number of units, then divides each region along the y-axis into n_y subregions with the same or similar number of units, and finally divides these n_x×n_y subregions along the z-axis n_z times, obtaining n_x×n_y×n_z = N subdomains with the same or similar number of units; the decomposed subdomains should satisfy the condition that the difference in unit count between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min does not exceed one thousandth of the total number of units;

a computing task assignment unit, which assigns the N subdomains one-to-one to the N computing nodes for parallel computation; when processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the subdomain number; each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal and the corresponding coordinate of each pore unit; for each pore unit, each function in the pore unit's main data structure is stored, indexed by the pore unit's ordinal, in a one-dimensional array, giving a series of first derived arrays, and the momentum and local fluid density of the pore unit are stored, indexed by the ordinal, in one-dimensional arrays, giving a series of second derived arrays;

a control unit, which is communicatively connected to the determination unit, the computational domain decomposition unit and the computing task assignment unit, and controls their operation.

Preferably, the LBM parallel optimization device provided by the present invention may also have the following feature: in the computational domain decomposition unit, the maximum number of units M_max:

Minimum number of units M_min:

Preferably, the LBM parallel optimization device provided by the present invention may further comprise: a communication mode setting unit, communicatively connected to the control unit, which, when handling adjacent or diagonal pore units that belong to different subdomains, extends the interface layer on the current subdomain's boundary by one additional layer of units so as to cover the adjacent and diagonal pore units, so that these extended pore units communicate in the same way as the interface units within the current subdomain.

Preferably, the LBM parallel optimization device provided by the present invention may further comprise: an input display unit, communicatively connected to the determination unit, the computational domain decomposition unit, the computing task assignment unit, the communication mode setting unit and the control unit, which lets the user enter operation instructions and displays the corresponding output.

Preferably, the LBM parallel optimization device provided by the present invention may also have the following feature: according to the operation instructions, the input display unit can display the sample's pore unit information and connectivity as determined by the determination unit together with the number of computing nodes or the total number of subdomains N, display the decomposition produced by the computational domain decomposition unit, and display the assignment produced by the computing task assignment unit.

<Storage medium>

In addition, the present invention also provides a computer-readable storage medium storing a program for implementing the above <Method>.

Function and Effect of the Invention

The LBM parallel optimization method, device and storage medium provided by the present invention can freely set the total number of subdomains according to the number of computing nodes while keeping the subdomains balanced; the total number of subdomains is not limited to 2^R. The invention effectively reduces the communication complexity caused by tortuous partition interfaces and balances the workload of all subdomains during the decomposition of the flow domain itself, without secondary optimization such as iteration, which reduces system memory consumption and simplifies the assignment of computing tasks. Communication between subdomains is efficient and easy to implement, which shortens communication time. The invention can therefore significantly improve computational efficiency, is particularly suited to pore-scale simulations of porous materials that involve heavy computation and large memory consumption, opens up a new approach to decomposing and processing the computational domain, and has considerable value for wider application.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the computational domain decomposition process in an embodiment of the present invention, where (a) shows decomposition along the x-axis, (b) decomposition along the y-axis, and (c) decomposition along the z-axis;

Fig. 2 is a schematic diagram of the process of labelling the fluid units in the porous medium with a one-dimensional array after the subdomains have been obtained, in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the interface between subdomain 0# and subdomain 1# in an embodiment of the present invention (white: pore units; gray: solid units);

Fig. 4 is a diagram of the D3Q19 discrete velocity model used in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the information exchange between diagonal units of different subdomains in an embodiment of the present invention.

Detailed Description

The LBM parallel optimization method, device and storage medium of the present invention, which divide the computational domain based on connected pores, are described in detail below with reference to the accompanying drawings.

<Embodiment>

The LBM parallel optimization method provided in this embodiment, which divides the computational domain based on connected pores, comprises the following steps:

Step 1. Determine the pore unit information and connectivity of the sample from the pore distribution data of the porous medium sample to be simulated; determine the total number N of subdomains to be decomposed according to the number N of computing nodes in the system. For uniform porous materials with a simple structural distribution, the pore distribution data can be defined directly in the programming language; for non-uniform, highly heterogeneous porous structures, the pore distribution data can be obtained by X-ray CT (X-CT).
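The text does not say how connectivity is determined. As one possible approach, shown purely for illustration, a breadth-first flood fill can collect the pore units reachable from a set of seed pores. The names `connected_pores` and `FACE_NEIGHBOURS`, the nested-list geometry `solid[x][y][z]` (1 for solid, 0 for pore), the choice of seeds on the x = 0 face, and the use of face adjacency only are all assumptions of this sketch.

```python
from collections import deque

# Assumed adjacency: the six face neighbours of a unit.
FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_pores(solid, seeds):
    """Return the set of pore coordinates reachable from the seed pore units."""
    nx, ny, nz = len(solid), len(solid[0]), len(solid[0][0])
    seen = set()
    queue = deque()
    for x, y, z in seeds:
        if not solid[x][y][z] and (x, y, z) not in seen:
            seen.add((x, y, z))
            queue.append((x, y, z))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in FACE_NEIGHBOURS:
            px, py, pz = x + dx, y + dy, z + dz
            inside = 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz
            if inside and (px, py, pz) not in seen and not solid[px][py][pz]:
                seen.add((px, py, pz))
                queue.append((px, py, pz))
    return seen


# 3 x 1 x 3 sample: a channel along x at z = 0 and one isolated pore at (2, 0, 2).
solid = [[[0, 1, 1]], [[0, 1, 1]], [[0, 1, 0]]]
inlet = [(0, 0, z) for z in range(3)]
print(sorted(connected_pores(solid, inlet)))  # [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
```

The isolated pore at (2, 0, 2) is excluded, so it would not be numbered or stored in the later steps.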

Step 2. Decompose the computational domain:

Divide the flow domain of the porous medium sample along the x-axis into n_x regions with the same or similar number of units, then divide each region along the y-axis into n_y subregions with the same or similar number of units, and finally divide these n_x×n_y subregions along the z-axis n_z times, obtaining n_x×n_y×n_z = N subdomains with the same or similar number of units. After decomposition, the difference in unit count between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min should not exceed one thousandth of the total number of units.

Maximum number of units M_max:

Minimum number of units M_min:

Specifically, as shown in Fig. 1, the units of the computational region are first traversed by scanning successively along the z, y and x directions. Note that the scan order z→y→x means that each cross-section x_i is traversed along z first and then along y. Fig. 1(b) builds on Fig. 1(a): each subdomain is divided along the y-axis into two smaller regions. To do this, the pore units of each subdomain are traversed and numbered in the order x→z→y and stored in a one-dimensional array, and the one-dimensional array associated with each subdomain is then split into two parts, so that the flow domain is now divided into 3×2×1 subregions. In Fig. 1(c), building on the previous division, the pore units of each subdomain are traversed in the scan order x→y→z by the same method and the division is repeated, so that the simulation region is finally divided into 3×2×2 subdomains. Next, as shown in Fig. 2, the connected pore units are selected, numbered and stored in a one-dimensional array; the flow domain contains N pore units in total. As shown in Fig. 3, all connected pore units are divided into several groups of equal size, each group being associated with one subdomain, and the total number of groups equals the number of subdomains after decomposition. The one-dimensional array is then divided into three groups, each containing N/3 or N/3+1 pore units, with the last unit of the three groups numbered N₀, N₁ and N respectively. Finally, the interface between two adjacent subdomains is recovered; as shown in Fig. 3 and Fig. 1(a), the interface contains the last pore unit of each partition.
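The traverse-and-split procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's code: the names `split_evenly` and `decompose` are hypothetical, pore units are given as (x, y, z) tuples, and each scan order is emulated by a sort key in which the last-scanned axis varies slowest.

```python
def split_evenly(items, parts: int):
    """Split a list into consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def decompose(pores, n_x: int, n_y: int, n_z: int):
    """Return n_x * n_y * n_z lists of pore coordinates with near-equal sizes."""
    subdomains = []
    # Scan z -> y -> x: x varies slowest, so splitting the array cuts along x.
    by_x = sorted(pores, key=lambda p: (p[0], p[1], p[2]))
    for region in split_evenly(by_x, n_x):
        # Scan x -> z -> y: y varies slowest, so splitting cuts along y.
        by_y = sorted(region, key=lambda p: (p[1], p[2], p[0]))
        for subregion in split_evenly(by_y, n_y):
            # Scan x -> y -> z: z varies slowest, so splitting cuts along z.
            by_z = sorted(subregion, key=lambda p: (p[2], p[1], p[0]))
            subdomains.extend(split_evenly(by_z, n_z))
    return subdomains


# Synthetic 6 x 4 x 4 sample in which roughly one third of the units are solid.
pores = [(x, y, z) for x in range(6) for y in range(4) for z in range(4)
         if (x + y + z) % 3 != 0]
parts = decompose(pores, n_x=3, n_y=2, n_z=2)
print(len(parts))  # 12
```

Because each split is made on the one-dimensional array of pore units rather than on the geometric extent, the groups stay equal to within one unit even when porosity varies across the sample; the price is that a cut may fall inside a cross-section, which is why the interface has to be recovered afterwards.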

The domain decomposition scheme divides the soil sample directly into n_x×n_y×n_z subdomains, and the difference in workload between subdomains is simply the difference in the number of pore units on the adjacent interfaces between subdomains. All subsequent simulation is based on the divided one-dimensional arrays; the simulation region runs from 1 to nx_i, where nx_i is the size, in the x direction (the fluid flow direction), of the subdomain associated with processor i.

Step 3. Assign the computing tasks:

In parallel computation, the N subdomains are assigned one-to-one to the N computing nodes. When processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the subdomain number. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal and the corresponding coordinate of each pore unit; every pore unit is tracked by its unique ordinal and coordinate.

The main data structure of each pore unit is its fluid particle distribution functions and equilibrium fluid particle distribution functions; the momentum and local fluid density of the pore unit before and after the calculation are stored, indexed by the pore unit's ordinal, in one-dimensional arrays, giving four second derived arrays.

As shown in Fig. 4, according to the D3Q19 model in LBM, each connected pore unit x has at most 18 neighbouring pore units, which are stored in 18 one-dimensional arrays, one array per direction component. The main data structure of each pore unit is its 19 fluid particle distribution functions and 18 equilibrium fluid particle distribution functions, each described by a one-dimensional array; a further four one-dimensional arrays store the momentum and local fluid density of the pore units.
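The 18 neighbour arrays can be built from the pore coordinate list in a few lines. The sketch below is illustrative: the names `DIRECTIONS` and `build_neighbour_arrays` are hypothetical, the direction ordering comes from `itertools.product` and does not follow the numbering of Fig. 4, and the value -1 marking a missing (solid or out-of-range) neighbour is an assumption.

```python
from itertools import product

# The 18 non-rest D3Q19 directions: 6 face neighbours and 12 edge (diagonal) neighbours.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(abs(c) for c in d) <= 2]


def build_neighbour_arrays(coords):
    """For each direction, list the ordinal of every unit's neighbour, or -1 if absent."""
    ordinal = {c: k for k, c in enumerate(coords)}
    return [
        [ordinal.get((x + dx, y + dy, z + dz), -1) for x, y, z in coords]
        for dx, dy, dz in DIRECTIONS
    ]


coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
neighbours = build_neighbour_arrays(coords)
print(len(neighbours))                           # 18
print(neighbours[DIRECTIONS.index((1, 0, 0))])   # [1, -1, -1]
```

With these arrays, streaming along a direction becomes an indexed copy between one-dimensional arrays, with no search through the three-dimensional geometry.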

Step 4. Set the mode of communication between subdomains (the extraction and retrieval of computed data): as shown in Fig. 5, when handling adjacent or diagonal pore units that belong to different subdomains (d and d′ in the figure), the interface layer on the current subdomain's boundary is extended by one additional layer of units so as to cover the adjacent and diagonal pore units, and these extended pore units communicate in the same way as the interface units within the current subdomain.
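One way to picture the extended layer is as the set of foreign pore units that touch the current subdomain along any face or diagonal direction. The sketch below is an assumption-laden illustration: the function name `halo_units` and the `owner` mapping from pore coordinate to subdomain number are hypothetical, and the 18 D3Q19 directions are taken as the definition of "adjacent and diagonal".

```python
from itertools import product

# The 18 non-rest D3Q19 directions (face and edge neighbours).
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(abs(c) for c in d) <= 2]


def halo_units(owner, subdomain: int):
    """Pore units owned by other subdomains that neighbour a unit of `subdomain`."""
    halo = set()
    for (x, y, z), sid in owner.items():
        if sid != subdomain:
            continue
        for dx, dy, dz in DIRECTIONS:
            q = (x + dx, y + dy, z + dz)
            # Missing keys are solid or outside the sample and are never part of the halo.
            if owner.get(q, subdomain) != subdomain:
                halo.add(q)
    return halo


owner = {(0, 0, 0): 0, (1, 0, 0): 0, (2, 0, 0): 1, (2, 1, 0): 1, (3, 0, 0): 1}
print(sorted(halo_units(owner, 0)))  # [(2, 0, 0), (2, 1, 0)]
```

In the example, (2, 1, 0) is reached only diagonally from (1, 0, 0), which is the case the extra layer is meant to cover.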

In each subdomain of this embodiment, the overlap region is stored in 18 one-dimensional arrays, and no redundant or repeated computation is performed. The particle distribution functions and equilibrium distribution functions of the fluid units from 1 to nx_i of each subdomain are stored in two separate sets of one-dimensional arrays (18 one-dimensional arrays per set). A pointer locates the first pore unit of the interface, and only the pore unit data in the interface layer is exchanged with the adjacent partition. Because only the pore units of each subdomain, including those in the interface layer, are stored, and the neighbours of each pore unit are stored in 18 one-dimensional arrays, the neighbours of every pore unit are easily determined during the simulation. In each iteration, every processor computes the fluid particle distribution functions of its own subdomain, after which the processors exchange the data on the subdomain interfaces. In this exchange, only 5 of the 18 distribution function components of the fluid units on the interface layer need to be passed to the adjacent processor. For example, in the x direction, only the fluid particle distribution functions f₃[i], f₈[i], f₉[i], f₁₂[i] and f₁₃[i] on the interface nx_i need to be passed from the right subdomain to the left subdomain, and likewise only f₁[i], f₇[i], f₁₀[i], f₁₁[i] and f₁₄[i] need to be passed from the left subdomain to the right subdomain.
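The count of five components per direction follows directly from the D3Q19 velocity set: only velocities with a component along the interface normal can cross the interface. The sketch below checks this. The names `VELOCITIES` and `components_crossing` are hypothetical, and the index order produced here comes from `itertools.product`, so the indices it returns do not correspond to the numbering f₁, f₃, f₇ and so on used in the text.

```python
from itertools import product

REST = (0, 0, 0)
MOVING = [c for c in product((-1, 0, 1), repeat=3)
          if 1 <= sum(abs(v) for v in c) <= 2]
VELOCITIES = [REST] + MOVING  # 19 discrete velocities of D3Q19


def components_crossing(axis: int, sign: int):
    """Indices of velocities whose component along `axis` equals `sign` (+1 or -1)."""
    return [k for k, c in enumerate(VELOCITIES) if c[axis] == sign]


to_right = components_crossing(0, +1)  # sent from the left subdomain to the right one
to_left = components_crossing(0, -1)   # sent from the right subdomain to the left one
print(len(VELOCITIES), len(to_right), len(to_left))  # 19 5 5
```

Each set contains one axis-aligned velocity and four diagonal ones, so the data exchanged per interface unit is 5 values per direction rather than the full set.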

With the above method, the present invention can freely set the total number of subdomains according to the number of computing nodes while keeping the subdomains balanced, so that all computing nodes are fully used. It effectively reduces the communication complexity caused by tortuous partition interfaces and balances the workload of all subdomains during the decomposition of the flow domain itself, without secondary optimization such as iteration, which reduces system memory consumption, simplifies the assignment of computing tasks, and shortens the communication time between subdomains. The invention can therefore significantly improve computational efficiency, and the larger the data volume, the more pronounced its advantage; it is thus particularly suited to pore-scale simulations of porous materials that involve heavy computation and large memory consumption.

Further, this embodiment also provides a device capable of automatically implementing the above method. The device comprises a determination unit, a computational domain decomposition unit, a computing task assignment unit, a communication mode setting unit, an input display unit and a control unit.

The determination unit determines the pore unit information and connectivity of the sample from the pore distribution data of the porous medium sample to be simulated, and determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system.

The computational domain decomposition unit divides the flow domain of the porous medium sample along the x-axis into n_x regions with the same or similar number of units, then divides each region along the y-axis into n_y subregions with the same or similar number of units, and finally divides these n_x×n_y subregions along the z-axis n_z times, obtaining n_x×n_y×n_z = N subdomains with the same or similar number of units. The decomposed subdomains should satisfy the condition that the difference in unit count between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min does not exceed one thousandth of the total number of units.

Maximum number of units M_max:

Minimum number of units M_min:

The computing task assignment unit assigns the N subdomains one-to-one to the N computing nodes for parallel computation. When processing its subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array p_i, where i is the subdomain number. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal and the corresponding coordinate of each pore unit. For each pore unit, each function in the pore unit's main data structure is stored, indexed by the pore unit's ordinal, in a one-dimensional array, giving a series of first derived arrays; the momentum and local fluid density of the pore unit are stored, indexed by the ordinal, in one-dimensional arrays, giving a series of second derived arrays.

The communication mode setting unit is communicatively connected to the control unit. When handling adjacent or diagonal pore units that lie in different subdomains, it extends the interface layer at the boundary of the current subdomain by one additional layer of units so as to cover those adjacent and diagonal pore units, so that these added pore units communicate in the same way as the interface units inside the current subdomain.
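The one-layer extension of the interface described above resembles what is commonly called a halo or ghost layer. The Python sketch below shows one way to compute it for a box-shaped subdomain; the names `extend_box` and `halo_mask` and the representation of a subdomain as a tuple of slices are assumptions for illustration, not taken from the source. Extending by one unit along all three axes at once also covers the edge and corner (diagonal) neighbours.

```python
import numpy as np


def extend_box(box, shape, width=1):
    """Grow a box of slices by `width` units per side, clipped to the domain."""
    return tuple(
        slice(max(s.start - width, 0), min(s.stop + width, n))
        for s, n in zip(box, shape)
    )


def halo_mask(box, shape, width=1):
    """Return the extended box and a mask that is True on the added layer."""
    ext = extend_box(box, shape, width)
    mask = np.ones([s.stop - s.start for s in ext], dtype=bool)
    # Position of the original box inside the extended box.
    inner = tuple(
        slice(b.start - e.start, b.stop - e.start) for b, e in zip(box, ext)
    )
    mask[inner] = False
    return ext, mask
```

Pore units selected by the mask belong to neighbouring subdomains; their values would be received from the owning computing node before each streaming step, using the same exchange as for the interface units.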

The input display unit allows the user to enter operation instructions and displays the corresponding results. For example, according to the operation instructions, the input display unit can display the pore unit information and connectivity of the sample determined by the determination unit, together with the number of computing nodes or the total number N of subdomains; it can also display the decomposition produced by the computational domain decomposition unit and the allocation produced by the computing task allocation unit.

The control unit is communicatively connected to the determination unit, the computational domain decomposition unit, the computing task allocation unit and the communication mode setting unit, and controls their operation.

The above embodiments merely illustrate the technical solution of the present invention. The LBM parallel optimization method, device and storage medium that divide the computational region based on connected pores according to the present invention are not limited to what is described in the above embodiments, but are subject to the scope defined by the claims. Any modification, supplement or equivalent replacement made by those skilled in the art on the basis of these embodiments falls within the scope of protection of the claims of the present invention.

Claims

1. An LBM parallel optimization method that divides the computational region based on connected pores, characterized in that it comprises the following steps:

Step 1. Determine the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated; determine the total number N of sub-domains that should be decomposed according to the number N of computing nodes in the system;

Step 2. Decompose the computational domain:

Divide the flow domain of the porous medium sample along the x-axis into n_x regions with the same or a similar number of units, then divide each region along the y-axis into n_y sub-regions with the same or a similar number of units, and finally divide each of these n_x × n_y sub-regions into n_z parts along the z-axis to obtain n_x × n_y × n_z = N subdomains with the same or a similar number of units; and, after decomposition, the difference in the number of units between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min shall not exceed one thousandth of the total number of units;

Step 3. Assign computing tasks:

During parallel computing, assign the N subdomains to the N computing nodes one to one; when processing a subdomain, a computing node considers only the connected pore units, renumbers these connected pore units, and stores them in a one-dimensional array p_i, where i is the subdomain number; each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, yielding a pore unit array that stores the ordinal number and corresponding coordinates of each pore unit, so that each pore unit is tracked by its unique ordinal number and coordinates;

For each pore unit: each function in the main data structure of the pore unit is stored as a one-dimensional array indexed by the ordinal number of the pore unit, yielding a series of first derived arrays; the momentum and local fluid density data of the pore unit are stored as one-dimensional arrays indexed by the ordinal number of the pore unit, yielding a series of second derived arrays.

2. The LBM parallel optimization method that divides the computational region based on connected pores according to claim 1, characterized in that:

wherein, in step 2, the maximum number of units M_max:

Minimum number of units M_min:

In the formulas, NX, NY and NZ are the total numbers of units, including both pore and solid units, along the x, y and z axes of the simulation region, respectively.

3. The LBM parallel optimization method that divides the computational region based on connected pores according to claim 1, characterized in that:

wherein, in step 3, the main data structure of each pore unit consists of its fluid particle distribution function and its equilibrium fluid particle distribution function; the momentum and local fluid density data of the pore unit before and after the calculation are stored as one-dimensional arrays indexed by the ordinal number of the pore unit, yielding four second derived arrays.

4. The LBM parallel optimization method that divides the computational region based on connected pores according to claim 1, characterized in that it further comprises:

Step 4. Set the communication mode between subdomains: when handling adjacent or diagonal pore units that lie in different subdomains, extend the interface layer at the boundary of the current subdomain by one additional layer of units so as to cover those adjacent and diagonal pore units, so that these added pore units communicate in the same way as the interface units inside the current subdomain.

5. An LBM parallel optimization device that divides the computational region based on connected pores, characterized in that it comprises:

a determination unit, which determines the pore unit information and connectivity of the sample from the pore distribution data of the porous medium sample to be simulated, and determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system;

a computational domain decomposition unit, which divides the flow domain of the porous medium sample along the x-axis into n_x regions with the same or a similar number of units, then divides each region along the y-axis into n_y sub-regions with the same or a similar number of units, and finally divides each of these n_x × n_y sub-regions into n_z parts along the z-axis to obtain n_x × n_y × n_z = N subdomains with the same or a similar number of units; the decomposed subdomains must satisfy the condition that the difference in the number of units between the subdomain with the maximum number of units M_max and the subdomain with the minimum number of units M_min shall not exceed one thousandth of the total number of units;

a computing task allocation unit, which assigns the N subdomains to the N computing nodes one to one for parallel computing; when processing a subdomain, a computing node considers only the connected pore units, renumbers these connected pore units, and stores them in a one-dimensional array p_i, where i is the subdomain number; each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, yielding a pore unit array that stores the ordinal number and corresponding coordinates of each pore unit; for each pore unit: each function in the main data structure of the pore unit is stored as a one-dimensional array indexed by the ordinal number of the pore unit, yielding a series of first derived arrays; the momentum and local fluid density data of the pore unit are stored as one-dimensional arrays indexed by the ordinal number of the pore unit, yielding a series of second derived arrays;

a control unit, which is communicatively connected to the determination unit, the computational domain decomposition unit and the computing task allocation unit, and controls their operation.

6. The LBM parallel optimization device that divides the computational region based on connected pores according to claim 5, characterized in that:

wherein, in the computational domain decomposition unit, the maximum number of units M_max:

Minimum number of units M_min:

7. The LBM parallel optimization device that divides the computational region based on connected pores according to claim 5, characterized in that it further comprises:

a communication mode setting unit, which is communicatively connected to the control unit and which, when handling adjacent or diagonal pore units that lie in different subdomains, extends the interface layer at the boundary of the current subdomain by one additional layer of units so as to cover those adjacent and diagonal pore units, so that these added pore units communicate in the same way as the interface units inside the current subdomain.

8. The LBM parallel optimization device based on connected pore division calculation area according to claim 5, further comprising:

an input display unit, which is communicatively connected to the determination unit, the computational domain decomposition unit, the computing task allocation unit, the communication mode setting unit and the control unit, and which allows the user to enter operation instructions and displays the corresponding results.

9. The LBM parallel optimization device based on connected pore division calculation area according to claim 5, characterized in that:

wherein, according to the operation instructions, the input display unit can display the pore unit information and connectivity of the sample determined by the determination unit, together with the number of computing nodes or the total number N of subdomains, and can display the decomposition produced by the computational domain decomposition unit and the allocation produced by the computing task allocation unit.

10. A storage medium, characterized in that:

the storage medium stores a program for implementing the LBM parallel optimization method that divides the computational region based on connected pores according to any one of claims 1 to 4.