CN115455794A - LBM Parallel Optimization Method, Device and Storage Medium Based on Connected Pores to Divide Calculation Area - Google Patents
- Publication number
- CN115455794A (application CN202210953478.0A)
- Authority
- CN
- China
- Prior art keywords
- units
- pore
- unit
- sub
- calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/25—Design optimisation, verification or simulation using particle-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/10—Numerical modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/08—Fluids
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical field
The invention belongs to the technical field of pore-structure simulation, and in particular relates to an LBM parallel optimization method, device and storage medium that divide the calculation area based on connected pores.
Background
The lattice Boltzmann method (LBM) is an effective means of simulating fluid flow in the pore structure of porous media at the mesoscopic scale. For the results to be representative, the simulated sample must be sufficiently large, and because fluid flow in a pore structure is inherently complex, pore-scale LBM simulations often demand large amounts of computing resources and storage. Large-scale LBM simulations therefore need parallel optimization, and their execution time and memory requirements depend on the data volume and on the parallel optimization method.
The basic idea of domain decomposition in current parallel computing is to decompose the computational domain into several subdomains, then recover the interfaces and perform load balancing. For complex porous materials, however, this step often takes many time-consuming iterations to reach an acceptable load balance. Recursive bisection is one of the most commonly used decomposition schemes: it divides the computational domain evenly into two subdomains and then applies the same division to each subdomain, yielding $2^R$ subdomains after R recursions. This scheme balances the workload well for regular or irregular structures in homogeneous media, but it has three problems: 1) for non-uniform, highly heterogeneous porous structures it is still difficult to achieve good workload balance; 2) the number of subdomains must be $2^R$, whereas the number of processors (computing nodes) is generally not exactly equal to it, so some processors cannot be used effectively; 3) when many subdomains are required, the number of recursive divisions R becomes large and efficiency is low.
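Problem 2) above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the patent: the helper name `recursive_bisection_usage` is hypothetical, and it assumes the largest power of two that does not exceed the processor count is used.

```python
def recursive_bisection_usage(num_processors):
    """Return (R, subdomains, idle) when 2**R subdomains must fit on the processors.

    Assumption for illustration: the largest 2**R <= num_processors is chosen,
    and every processor beyond that count stays idle.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    r = num_processors.bit_length() - 1      # largest R with 2**R <= num_processors
    subdomains = 1 << r
    return r, subdomains, num_processors - subdomains


for p in (8, 12, 24, 100):
    r, used, idle = recursive_bisection_usage(p)
    print(f"{p} processors: R={r}, {used} subdomains, {idle} idle")
```

With 12 processors, for instance, only 8 subdomains can be formed, which leaves 4 processors without work.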
Summary of the invention
The present invention was made to solve the above problems. Its purpose is to provide an LBM parallel optimization method, device and storage medium that divide the calculation area based on connected pores, and that can match the number of subdomains to the number of computing nodes in a balanced way, so that every computing node carries a balanced load and processes it efficiently, increasing computation speed.
To achieve this purpose, the invention adopts the following scheme:
<Method>
The invention provides an LBM parallel optimization method that divides the calculation area based on connected pores, characterized by comprising the following steps:
Step 1. From the pore distribution data of the porous-medium sample to be simulated, determine the sample's pore unit information and connectivity; from the number of computing nodes N in the system, determine the total number of subdomains N to be decomposed;
Step 2. Decompose the computational domain:
Divide the flow domain along the x-axis into $n_x$ regions with the same or nearly the same number of units, then divide each region along the y-axis into $n_y$ sub-regions with the same or nearly the same number of units, and finally divide each of these $n_x \times n_y$ sub-regions along the z-axis into $n_z$ parts, giving $n_x \times n_y \times n_z = N$ subdomains with the same or nearly the same number of units, each subdomain being a three-dimensional region made up of multiple pore units. After decomposition, the difference in unit count between the subdomain with the largest number of units $M_{\max}$ and the subdomain with the smallest number of units $M_{\min}$ must not exceed one thousandth of the total number of units; the allowed difference can be reduced further according to the required accuracy of the domain division;
Step 3. Assign computing tasks:
In parallel computation, the N subdomains are assigned one-to-one to the N computing nodes. When processing a subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array $p_i$, where i is the subdomain index. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds each pore unit's ordinal and corresponding coordinates; every pore unit is tracked by its unique ordinal and coordinates;
For each pore unit: each function in the pore unit's main data structure is stored in a one-dimensional array indexed by the pore unit's ordinal, giving a series of first derived arrays; the pore unit's momentum and local fluid density are likewise stored in one-dimensional arrays indexed by its ordinal, giving a series of second derived arrays.
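Steps 1 and 3 rely on identifying the connected pore units and renumbering them into a one-dimensional array. The sketch below illustrates one way this could look; it is not the patent's implementation. It assumes that "connected" means reachable from the inlet face x = 0 through face-adjacent pore units, and the names `connected_pores`, `coords` and `ordinal` are hypothetical.

```python
from collections import deque

FACE_NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def connected_pores(grid):
    """grid[x][y][z] is 1 for a pore unit and 0 for a solid unit.

    Returns (coords, ordinal): coords[k] is the coordinate of the pore unit with
    ordinal k, and ordinal maps a coordinate back to k. Only pore units reachable
    from the x = 0 face are kept (an assumption made for this illustration).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    queue = deque((0, y, z) for y in range(ny) for z in range(nz) if grid[0][y][z])
    seen = set(queue)
    while queue:                                  # breadth-first flood fill
        x, y, z = queue.popleft()
        for dx, dy, dz in FACE_NEIGHBOURS:
            n = (x + dx, y + dy, z + dz)
            inside = 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
            if inside and grid[n[0]][n[1]][n[2]] and n not in seen:
                seen.add(n)
                queue.append(n)
    coords = sorted(seen)                         # renumber: ordinal = list index
    ordinal = {c: k for k, c in enumerate(coords)}
    return coords, ordinal


# 3 x 3 x 1 sample: a channel along y = 0 plus one isolated pore at (1, 2, 0).
grid = [[[1], [0], [0]],
        [[1], [0], [1]],
        [[1], [0], [0]]]
coords, ordinal = connected_pores(grid)
density = [1.0] * len(coords)                     # a derived array indexed by ordinal
print(coords, len(density))
```

Solid units and the isolated pore never enter `coords`, so any derived array such as `density` has one entry per connected pore unit only.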
Specifically, the LBM parallel optimization method provided by the invention may also have the following feature: in step 2, the maximum number of units $M_{\max}$ is:
(equation not reproduced)
and the minimum number of units $M_{\min}$ is:
(equation not reproduced)
where NX, NY and NZ are the total numbers of units, pores and solids included, along the x, y and z axes of the simulation area.
Preferably, the method may also have the following feature: in step 3, the main data structure of each pore unit consists of its fluid particle distribution functions and equilibrium fluid particle distribution functions; the momentum and local fluid density of the pore unit before and after the calculation are stored in one-dimensional arrays indexed by the pore unit's ordinal, giving four second derived arrays.
Preferably, the method may further comprise: Step 4. Set the inter-subdomain communication scheme: when handling adjacent or diagonal pore units that belong to different subdomains, the interface layer on the boundary of the current subdomain is extended by one extra layer of units to cover the adjacent and diagonal pore units, and these added pore units communicate in the same way as the interface units inside the current subdomain.
<Device>
Further, the invention provides a device that automatically implements the above <Method>, characterized by comprising:
a determination unit, which determines the sample's pore unit information and connectivity from the pore distribution data of the porous-medium sample to be simulated, and determines the total number of subdomains N to be decomposed from the number of computing nodes N in the system;
a computational domain decomposition unit, which divides the flow domain along the x-axis into $n_x$ regions with the same or nearly the same number of units, then divides each region along the y-axis into $n_y$ sub-regions with the same or nearly the same number of units, and finally divides each of these $n_x \times n_y$ sub-regions along the z-axis into $n_z$ parts, giving $n_x \times n_y \times n_z = N$ subdomains with the same or nearly the same number of units; the decomposed subdomains must satisfy the condition that the difference in unit count between the subdomain with the largest number of units $M_{\max}$ and the subdomain with the smallest number of units $M_{\min}$ does not exceed one thousandth of the total number of units;
a computing task allocation unit, which assigns the N subdomains one-to-one to the N computing nodes for parallel computation; when processing a subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array $p_i$, where i is the subdomain index; each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds each pore unit's ordinal and corresponding coordinates; for each pore unit, each function in its main data structure is stored in a one-dimensional array indexed by its ordinal, giving a series of first derived arrays, and its momentum and local fluid density are stored in one-dimensional arrays indexed by its ordinal, giving a series of second derived arrays;
a control unit, which is communicatively connected to the determination unit, the computational domain decomposition unit and the computing task allocation unit, and controls their operation.
Preferably, the LBM parallel optimization device provided by the invention may also have the following feature: in the computational domain decomposition unit, the maximum number of units $M_{\max}$ is:
(equation not reproduced)
and the minimum number of units $M_{\min}$ is:
(equation not reproduced)
where NX, NY and NZ are the total numbers of units, pores and solids included, along the x, y and z axes of the simulation area.
Preferably, the device may further comprise a communication scheme setting unit, communicatively connected to the control unit, which, when handling adjacent or diagonal pore units that belong to different subdomains, extends the interface layer on the boundary of the current subdomain by one extra layer of units to cover the adjacent and diagonal pore units, so that these added pore units communicate in the same way as the interface units inside the current subdomain.
Preferably, the device may further comprise an input and display unit, communicatively connected to the determination unit, the computational domain decomposition unit, the computing task allocation unit, the communication scheme setting unit and the control unit, which lets the user enter operation instructions and displays the corresponding output.
Preferably, the device may also have the following feature: according to the operation instructions, the input and display unit can display the sample's pore unit information and connectivity and the number of computing nodes or total number of subdomains N determined by the determination unit, the decomposition produced by the computational domain decomposition unit, and the allocation produced by the computing task allocation unit.
<Storage medium>
In addition, the invention provides a computer-readable storage medium storing a program that implements the above <Method>.
Function and effect of the invention
The LBM parallel optimization method, device and storage medium provided by the invention can divide the domain evenly into any number of subdomains to match the number of computing nodes; the total is not restricted to $2^R$. The invention also reduces the communication complexity caused by tortuous partition interfaces and balances the workload of all subdomains during the decomposition itself, so no secondary optimization such as iteration is needed. This reduces system memory consumption and simplifies the assignment of computing tasks. Communication between subdomains is efficient and easy to implement, which reduces communication time. The invention can therefore improve computational efficiency significantly and is particularly suited to pore-scale simulation of porous materials, where the computational load and memory consumption are large. It opens up a new approach to decomposing and processing the calculation area and has considerable value for wider adoption.
Description of drawings
Fig. 1 is a schematic diagram of the computational domain decomposition process in an embodiment of the invention, where (a) is the decomposition along the x-axis, (b) the decomposition along the y-axis, and (c) the decomposition along the z-axis;
Fig. 2 is a schematic diagram of labeling the fluid units in the porous medium with a one-dimensional array after the subdomains have been obtained, in an embodiment of the invention;
Fig. 3 is a schematic diagram of the interface between subdomain 0# and subdomain 1# in an embodiment of the invention (white: pore units; gray: solid units);
Fig. 4 is a diagram of the D3Q19 discrete velocity model used in an embodiment of the invention;
Fig. 5 is a schematic diagram of the information exchange between diagonal units of different subdomains in an embodiment of the invention.
Detailed description
The LBM parallel optimization method, device and storage medium that divide the calculation area based on connected pores are described in detail below with reference to the accompanying drawings.
<Embodiment>
The LBM parallel optimization method provided by this embodiment comprises the following steps:
Step 1. From the pore distribution data of the porous-medium sample to be simulated, determine the sample's pore unit information and connectivity; from the number of computing nodes N in the system, determine the total number of subdomains N to be decomposed. For uniform porous materials with a simple structural distribution, the pore distribution data can be defined directly in a programming language; for non-uniform, highly heterogeneous porous structures, X-ray CT (X-CT) can be used to obtain the pore distribution data.
Step 2. Decompose the computational domain:
Divide the flow domain of the porous-medium sample along the x-axis into $n_x$ regions with the same or nearly the same number of units, then divide each region along the y-axis into $n_y$ sub-regions with the same or nearly the same number of units, and finally divide each of these $n_x \times n_y$ sub-regions along the z-axis into $n_z$ parts, giving $n_x \times n_y \times n_z = N$ subdomains with the same or nearly the same number of units. After decomposition, the difference in unit count between the subdomain with the largest number of units $M_{\max}$ and the subdomain with the smallest number of units $M_{\min}$ must not exceed one thousandth of the total number of units.
Maximum number of units $M_{\max}$: (equation not reproduced)
Minimum number of units $M_{\min}$: (equation not reproduced)
where NX, NY and NZ are the total numbers of units, pores and solids included, along the x, y and z axes of the simulation area.
Specifically, as shown in Fig. 1, every unit in the calculation area is first visited by scanning successively along the z, y and x directions. Note that the z→y→x scan order means that each cross-section $x_i$ is traversed along z first and then along y. In Fig. 1(b), starting from Fig. 1(a), each subdomain is divided into 2 smaller regions along the y-axis: the pore units of the subdomain are traversed and numbered in x→z→y order and stored in a one-dimensional array, and the one-dimensional array associated with each subdomain is then split into two parts, so that the flow domain is divided into 3×2×1 sub-regions. In Fig. 1(c), building on the previous division, the pore units of each subdomain are traversed in x→y→z order in the same way and the division is repeated, so that the simulation area is finally divided into 3×2×2 subdomains. Next, as shown in Fig. 2, the connected pore units are selected, numbered and stored in a one-dimensional array; the flow domain contains N pore units in total. As shown in Fig. 3, all connected pore units are divided into groups of equal size, each group is associated with one subdomain, and the number of groups equals the number of subdomains after decomposition. The one-dimensional array is then divided into three groups, each containing N/3 or N/3+1 pore units, whose last unit numbers are $N_0$, $N_1$ and N respectively. Finally, the interface between two adjacent subdomains is recovered, as shown in Fig. 3 and Fig. 1(a); the interface contains the last pore unit of each partition.
The decomposition scheme divides the soil sample directly into $n_x \times n_y \times n_z$ subdomains, and the workload difference between subdomains is simply the difference in the number of pore units on the adjacent interfaces between them. All subsequent simulation operates on the divided one-dimensional arrays; the simulation region of each subdomain runs from 1 to $nx_i$, where $nx_i$ is the size, in the x direction (the fluid flow direction), of the subdomain associated with processor i.
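The three-pass division described above (order the pore units, cut the one-dimensional array into near-equal parts, repeat per axis) can be sketched as follows. This is an illustrative reading of the text rather than the patent's code: the function names are hypothetical, the synthetic pore pattern stands in for real pore data, and the balance check assumes that "total number of units" means the total number of connected pore units.

```python
def split_even(items, parts):
    """Cut a list into `parts` consecutive chunks whose lengths differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def decompose(pores, nx, ny, nz):
    """Split pore coordinates into nx * ny * nz groups of near-equal size."""
    subdomains = []
    # Pass (a): scan z -> y -> x, i.e. z varies fastest and x slowest.
    by_x = sorted(pores, key=lambda c: (c[0], c[1], c[2]))
    for slab in split_even(by_x, nx):
        # Pass (b): scan x -> z -> y, i.e. y varies slowest.
        by_y = sorted(slab, key=lambda c: (c[1], c[2], c[0]))
        for column in split_even(by_y, ny):
            # Pass (c): scan x -> y -> z, i.e. z varies slowest.
            by_z = sorted(column, key=lambda c: (c[2], c[1], c[0]))
            subdomains.extend(split_even(by_z, nz))
    return subdomains


def is_balanced(subdomains, tolerance=1e-3):
    """Check M_max - M_min <= tolerance * total (total taken as all pore units)."""
    sizes = [len(s) for s in subdomains]
    return max(sizes) - min(sizes) <= tolerance * sum(sizes)


# Synthetic 20 x 20 x 20 sample; the modulo pattern is only a placeholder geometry.
pores = [(x, y, z) for x in range(20) for y in range(20) for z in range(20)
         if (7 * x + 3 * y + 5 * z) % 4 != 0]
subdomains = decompose(pores, nx=3, ny=2, nz=2)
sizes = [len(s) for s in subdomains]
print(len(subdomains), max(sizes) - min(sizes), is_balanced(subdomains))
```

Because every cut splits a list into chunks that differ by at most one element, the final group sizes also differ by at most one, whatever the pore geometry. The price is that a partition boundary may fall in the middle of a lattice plane, which is why the interface is described as containing the last pore unit of each partition.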
Step 3. Assign computing tasks:
In parallel computation, the N subdomains are assigned one-to-one to the N computing nodes. When processing a subdomain, a computing node considers only the connected pore units, renumbers them, and stores them in a one-dimensional array $p_i$, where i is the subdomain index. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds each pore unit's ordinal and corresponding coordinates; every pore unit is tracked by its unique ordinal and coordinates.
For each pore unit: each function in the pore unit's main data structure is stored in a one-dimensional array indexed by the pore unit's ordinal, giving a series of first derived arrays; the pore unit's momentum and local fluid density are likewise stored in one-dimensional arrays indexed by its ordinal, giving a series of second derived arrays.
The main data structure of each pore unit consists of its fluid particle distribution functions and equilibrium fluid particle distribution functions; the momentum and local fluid density of the pore unit before and after the calculation are stored in one-dimensional arrays indexed by the pore unit's ordinal, giving four second derived arrays.
As shown in Fig. 4, under the D3Q19 model of the LBM, each connected pore unit x has at most 18 neighboring pore units, which are stored in 18 one-dimensional arrays, one array per velocity component. The main data structure of each pore unit is its 19 fluid particle distribution functions and 18 equilibrium fluid particle distribution functions, each described by a one-dimensional array; four further one-dimensional arrays store the momentum and local fluid density of the pore units.
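The array layout described above can be illustrated with a short sketch. It is a hypothetical layout under stated assumptions, not the patent's data structure: the direction order produced by `itertools.product` is arbitrary and does not follow the patent's component numbering, and `-1` is used here as a marker for "no pore neighbor in this direction".

```python
from itertools import product

# The 18 non-rest D3Q19 directions: 6 along the axes and 12 planar diagonals.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(abs(v) for v in d) <= 2]


def build_neighbour_arrays(coords):
    """Return 18 one-dimensional arrays; neighbours[q][k] is the ordinal of the
    pore unit reached from unit k along direction q, or -1 if that unit is
    solid, outside the sample, or not a connected pore."""
    ordinal = {c: k for k, c in enumerate(coords)}
    return [[ordinal.get((x + dx, y + dy, z + dz), -1) for x, y, z in coords]
            for dx, dy, dz in DIRECTIONS]


coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]          # three connected pore units
neighbours = build_neighbour_arrays(coords)

# Distribution functions and macroscopic fields, all indexed by pore ordinal.
f = [[0.0] * len(coords) for _ in range(19)]        # 19 distribution components
rho = [1.0] * len(coords)                            # local fluid density
print(len(DIRECTIONS), neighbours[DIRECTIONS.index((1, 1, 0))][0])
```

With this layout, streaming never has to consult the full three-dimensional grid: the neighbor of a unit in a given direction is a single array lookup.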
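Step 4. Set the inter-subdomain communication scheme (how computed data are extracted and accessed): as shown in Fig. 5, when handling adjacent or diagonal pore units that belong to different subdomains (d and d′ in the figure), the interface layer on the boundary of the current subdomain is extended by one extra layer of units to cover the adjacent and diagonal pore units, so that these added pore units communicate in the same way as the interface units inside the current subdomain.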
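The extra layer of units in step 4 can be pictured as the set of pore units owned by other subdomains that a subdomain's own units touch along any D3Q19 direction, diagonals included. The sketch below is an assumption-laden illustration: the `owner` mapping and the function `ghost_layer` are hypothetical names, not taken from the patent.

```python
from itertools import product

# The 18 non-rest D3Q19 directions (axis-aligned and planar diagonal).
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(abs(v) for v in d) <= 2]


def ghost_layer(owner, rank):
    """owner maps each connected pore coordinate to its subdomain index.

    Returns the sorted coordinates of pore units that belong to other
    subdomains but neighbour a unit of subdomain `rank`.
    """
    ghosts = set()
    for (x, y, z), r in owner.items():
        if r != rank:
            continue
        for dx, dy, dz in DIRECTIONS:
            n = (x + dx, y + dy, z + dz)
            # Missing keys are solid or unconnected units and are skipped.
            if owner.get(n, rank) != rank:
                ghosts.add(n)
    return sorted(ghosts)


owner = {(0, 0, 0): 0, (1, 0, 0): 1, (1, 1, 0): 1, (3, 0, 0): 1}
print(ghost_layer(owner, 0), ghost_layer(owner, 1))
```

In this toy example subdomain 0 receives both the face neighbor (1, 0, 0) and the diagonal neighbor (1, 1, 0) in its extra layer, while the distant unit (3, 0, 0) is not included.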
In each subdomain of this embodiment, the overlap region is stored in 18 one-dimensional arrays, and no redundant or repeated calculation is performed. The particle distribution functions and equilibrium distribution functions of the fluid units from 1 to $nx_i$ in each subdomain are stored in two separate sets of one-dimensional arrays (18 arrays per set). A pointer locates the first pore unit of the interface, and only the data of pore units in the interface layer are exchanged with the adjacent partition. Because only the pore units of each subdomain, interface layer included, are stored, and the neighbors of each pore unit are held in 18 one-dimensional arrays, the neighbors of every pore unit are easy to determine during the simulation. In each iteration, every processor computes the fluid particle distribution functions of its own subdomain, and the processors then exchange the data on the subdomain interfaces. In this exchange, only 5 of the 18 distribution function components of the fluid units on the interface layer need to be passed to the adjacent processor. In the x direction, for example, only $f_3[i]$, $f_8[i]$, $f_9[i]$, $f_{12}[i]$ and $f_{13}[i]$ on the interface $nx_i$ need to be passed from the right subdomain to the left subdomain, and likewise only $f_1[i]$, $f_7[i]$, $f_{10}[i]$, $f_{11}[i]$ and $f_{14}[i]$ need to be passed from the left subdomain to the right subdomain.
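The figure of 5 components per direction follows from the D3Q19 velocity set: only velocities with a non-zero component normal to an interface carry particles across it. The sketch below checks this and packs a send buffer; the index order comes from `itertools.product` and is an assumption that does not reproduce the patent's component numbering, and `pack_interface` is a hypothetical helper.

```python
from itertools import product

# D3Q19 velocity set: rest vector, 6 axis directions, 12 planar diagonals.
# NOTE: this index order is arbitrary and differs from the patent's numbering.
D3Q19 = [c for c in product((-1, 0, 1), repeat=3)
         if sum(abs(v) for v in c) <= 2]


def crossing_components(axis, sign):
    """Indices of velocities that move particles across an interface normal to
    `axis` in the `sign` direction (axis 0 = x, 1 = y, 2 = z)."""
    return [q for q, c in enumerate(D3Q19) if c[axis] == sign]


def pack_interface(f, interface_ordinals, components):
    """Collect only the required components of the interface units into a
    flat send buffer; f[q][k] is component q of the pore unit with ordinal k."""
    return [f[q][k] for q in components for k in interface_ordinals]


to_right = crossing_components(axis=0, sign=+1)   # left subdomain -> right
to_left = crossing_components(axis=0, sign=-1)    # right subdomain -> left

f = [[10.0 * q + k for k in range(3)] for q in range(19)]   # dummy data
send_buffer = pack_interface(f, interface_ordinals=[2], components=to_right)
print(len(D3Q19), len(to_right), len(to_left), len(send_buffer))
```

Sending 5 components instead of all of them cuts the per-unit message size in each direction to roughly a quarter, which is the communication saving the paragraph above refers to.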
With the above method, the invention can divide the domain evenly into any number of subdomains to match the number of computing nodes and makes full use of all computing nodes. It reduces the communication complexity caused by tortuous partition interfaces and balances the workload of all subdomains during the decomposition itself, so no secondary optimization such as iteration is needed. This reduces system memory consumption, simplifies the assignment of computing tasks, and shortens the communication time between subdomains. The invention can therefore improve computational efficiency significantly, and the larger the data volume, the more pronounced its advantage, which makes it particularly suited to pore-scale simulation of porous materials, where the computational load and memory consumption are large.
进一步,本实施例还提供能够自动实现上述方法的装置,该装置包括确定部、计算域分解部、计算任务分配部、通信方式设定部、输入显示部以及控制部。Furthermore, this embodiment also provides a device capable of automatically implementing the above method, and the device includes a determination unit, a calculation domain decomposition unit, a calculation task allocation unit, a communication mode setting unit, an input display unit, and a control unit.
确定部根据待模拟多孔介质试样的孔隙分布数据,确定试样的空隙单元信息和连通情况;根据系统中计算节点数量N确定应分解的子域总数N。The determining part determines the pore unit information and connectivity of the sample according to the pore distribution data of the porous medium sample to be simulated; determines the total number N of subdomains to be decomposed according to the number N of computing nodes in the system.
计算域分解部沿x轴将多孔介质试样流域划分为nx个具有相同或相近单元数的区域,然后沿y轴将每个区域划分为ny个具有相同或相近单元数的子区域,最后将这nx×ny个子区域沿z轴划分nz次,得到nx×ny×nz=N个具有相同或相近单元数的子域;分解后的子域应满足条件:具有最大单元数Mmax的子域和具有最小单元数Mmin的子域间的单元数差异应不超过总单元数的千分之一。The computational domain decomposition part divides the porous medium sample flow domain into n x regions with the same or similar number of units along the x-axis, and then divides each region into n y sub-regions with the same or similar number of units along the y-axis, Finally, divide the n x ×n y sub-regions along the z-axis n z times to obtain n x ×n y ×n z =N sub-regions with the same or similar number of units; the decomposed sub-regions should meet the conditions: have The difference in the number of units between the subdomain with the maximum number of units M max and the subdomain with the minimum number of units M min should not exceed one thousandth of the total number of units.
Maximum unit count $M_{max}$:
Minimum unit count $M_{min}$:
where NX, NY, and NZ are the total numbers of units, pore and solid together, along the x, y, and z axes of the simulation region, respectively.
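The x, then y, then z division and the one-thousandth balance condition can be sketched as follows. This is an illustration under stated assumptions, not the formulas of this embodiment: the weight being balanced is assumed to be the count of connected pore units in a mask `pore[x][y][z]`, each cut is placed where the prefix sum is closest to the ideal share, and the names `balanced_cuts`, `decompose`, and `is_balanced` are hypothetical.

```python
from itertools import accumulate

def balanced_cuts(weights, parts):
    """Cut a 1D list of weights into `parts` contiguous ranges with near-equal sums."""
    prefix = [0] + list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    for k in range(1, parts):
        target = total * k / parts
        # Choose the cut position whose prefix sum is closest to the ideal share,
        # never moving left of the previous cut.
        best = min(range(cuts[-1], len(prefix)), key=lambda i: abs(prefix[i] - target))
        cuts.append(best)
    cuts.append(len(weights))
    return list(zip(cuts[:-1], cuts[1:]))

def decompose(pore, nx, ny, nz):
    """Split along x, then y within each x-slab, then z within each (x, y) column."""
    NX, NY, NZ = len(pore), len(pore[0]), len(pore[0][0])
    boxes = []
    wx = [sum(sum(row) for row in pore[x]) for x in range(NX)]
    for x0, x1 in balanced_cuts(wx, nx):
        wy = [sum(sum(pore[x][y]) for x in range(x0, x1)) for y in range(NY)]
        for y0, y1 in balanced_cuts(wy, ny):
            wz = [sum(pore[x][y][z] for x in range(x0, x1) for y in range(y0, y1))
                  for z in range(NZ)]
            for z0, z1 in balanced_cuts(wz, nz):
                boxes.append(((x0, x1), (y0, y1), (z0, z1), sum(wz[z0:z1])))
    return boxes

def is_balanced(boxes, total_units, tol=1e-3):
    """Check that M_max - M_min does not exceed tol times the total unit count."""
    counts = [box[3] for box in boxes]
    return max(counts) - min(counts) <= tol * total_units

# A fully porous 4 x 4 x 4 block split into 2 x 2 x 2 = 8 subdomains.
grid = [[[1] * 4 for _ in range(4)] for _ in range(4)]
boxes = decompose(grid, 2, 2, 2)
```

Each entry of `boxes` holds the half-open index ranges along x, y, and z plus the unit count of that subdomain. Because every cut is recomputed from the counts inside the current slab, no iterative rebalancing pass is needed in this sketch.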
The computing task allocation module assigns the N subdomains one-to-one to the N computing nodes for parallel computation. When processing its subdomain, a computing node considers only the connected pore units. It renumbers these connected pore units and stores them in a one-dimensional array $p_i$, where i is the subdomain index. Each connected pore unit is associated with a coordinate and stored in the order of the one-dimensional array, giving a pore unit array that holds the ordinal number and the corresponding coordinates of every pore unit. For each pore unit, every function in the unit's main data structure is stored in a one-dimensional array indexed by the unit's ordinal number, giving a series of first derived arrays; the unit's momentum and local fluid density are likewise stored in one-dimensional arrays indexed by the ordinal number, giving a series of second derived arrays.
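The renumbering and one-dimensional storage can be sketched as below. The sketch assumes that the "functions in the main data structure" are the particle distribution functions of a D3Q19 lattice (`q=19`), that `pore[x][y][z]` marks only connected pore units with 1, and that a subdomain is given as half-open index ranges; the function name `pack_subdomain` and the dictionary keys are hypothetical.

```python
def pack_subdomain(pore, box, q=19):
    """Renumber the connected pore units of one subdomain into 1D arrays.

    Assumptions: pore[x][y][z] is 1 only for connected pore units, `box` is
    ((x0, x1), (y0, y1), (z0, z1)) with half-open ranges, and q is the number
    of distribution functions per unit (19 for an assumed D3Q19 lattice).
    """
    (x0, x1), (y0, y1), (z0, z1) = box
    coords = []      # pore unit array: ordinal number -> (x, y, z)
    index_of = {}    # reverse lookup: (x, y, z) -> ordinal number
    for x in range(x0, x1):
        for y in range(y0, y1):
            for z in range(z0, z1):
                if pore[x][y][z]:
                    index_of[(x, y, z)] = len(coords)
                    coords.append((x, y, z))
    n = len(coords)
    return {
        "coords": coords,
        "index_of": index_of,
        # First derived arrays: one 1D array per distribution function.
        "f": [[0.0] * n for _ in range(q)],
        # Second derived arrays: momentum components and local fluid density.
        "momentum": [[0.0] * n for _ in range(3)],
        "rho": [1.0] * n,
    }

# A 2 x 2 x 2 sample with four pore units.
mask = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
packed = pack_subdomain(mask, ((0, 2), (0, 2), (0, 2)))
```

Because solid units never receive an ordinal number, every array has length equal to the pore count of the subdomain rather than the full box volume, which is where the memory saving in this layout comes from.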
The communication mode setting module is communicatively connected to the control module. To handle adjacent or diagonal pore units that lie in different subdomains, it extends the interface layer on the boundary of the current subdomain by one additional layer of units so that the adjacent and diagonal pore units are covered, and the pore units in this extended layer use the same communication mode as the interface units inside the current subdomain.
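The one-layer extension can be illustrated by collecting the pore units in a shell one unit thick around a subdomain. This is a minimal sketch assuming box-shaped subdomains with half-open index ranges and a mask `pore[x][y][z]` of connected pore units; the function name `ghost_units` is hypothetical, and the actual data exchange between computing nodes is not shown.

```python
def ghost_units(pore, box):
    """Return pore units in the one-unit-thick shell around `box`.

    The shell includes face, edge, and corner neighbours, so units reached by
    diagonal lattice directions are covered as well. The shell is clipped to
    the global domain.
    """
    (x0, x1), (y0, y1), (z0, z1) = box
    NX, NY, NZ = len(pore), len(pore[0]), len(pore[0][0])
    ghosts = []
    for x in range(max(x0 - 1, 0), min(x1 + 1, NX)):
        for y in range(max(y0 - 1, 0), min(y1 + 1, NY)):
            for z in range(max(z0 - 1, 0), min(z1 + 1, NZ)):
                inside = x0 <= x < x1 and y0 <= y < y1 and z0 <= z < z1
                if not inside and pore[x][y][z]:
                    ghosts.append((x, y, z))
    return ghosts

# A fully porous 3 x 3 x 3 block.
block = [[[1] * 3 for _ in range(3)] for _ in range(3)]
centre_shell = ghost_units(block, ((1, 2), (1, 2), (1, 2)))
slab_shell = ghost_units(block, ((0, 2), (0, 3), (0, 3)))
```

Treating the shell units exactly like interface units means a single exchange pattern serves both straight and diagonal neighbours.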
The input display module lets the user enter operating instructions and displays the corresponding information. For example, according to the operating instructions, it can display the pore unit information and connectivity of the sample determined by the determination module together with the number of computing nodes or the total number of subdomains N, the decomposition result of the computational domain decomposition module, and the allocation result of the computing task allocation module.
The control module is communicatively connected to the determination module, the computational domain decomposition module, the computing task allocation module, and the communication mode setting module, and controls their operation.
The above embodiments merely illustrate the technical solution of the invention. The LBM parallel optimization method, device, and storage medium based on dividing the calculation area by connected pores are not limited to what is described in the above embodiments; their scope is defined by the claims. Any modification, supplement, or equivalent replacement made on the basis of these embodiments by a person skilled in the art to which the invention belongs falls within the scope of protection of the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210953478.0A CN115455794B (en) | 2022-08-10 | 2022-08-10 | LBM parallel optimization method, device and storage medium based on connected pores to divide calculation area |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115455794A true CN115455794A (en) | 2022-12-09 |
CN115455794B CN115455794B (en) | 2024-03-29 |
Family
ID=84297534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210953478.0A Active CN115455794B (en) | 2022-08-10 | 2022-08-10 | LBM parallel optimization method, device and storage medium based on connected pores to divide calculation area |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115455794B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120179436A1 (en) * | 2011-01-10 | 2012-07-12 | Saudi Arabian Oil Company | Scalable Simulation of Multiphase Flow in a Fractured Subterranean Reservoir as Multiple Interacting Continua |
CN109376481A (en) * | 2018-08-16 | 2019-02-22 | 清能艾科(深圳)能源技术有限公司 | Calculation method, device and computer equipment for digital core permeability curve based on multi-GPU |
CN112949112A (en) * | 2021-01-29 | 2021-06-11 | 中国石油大学(华东) | Rotor-sliding bearing system lubrication basin dynamic grid parallel computing method |
CN112992294A (en) * | 2021-04-19 | 2021-06-18 | 中国空气动力研究与发展中心计算空气动力研究所 | Porous medium LBM calculation grid generation method |
CN114565658A (en) * | 2022-01-14 | 2022-05-31 | 武汉理工大学 | Pore size calculation method and device based on CT technology |
Non-Patent Citations (2)
Title |
---|
周鸿翔 等: "孔隙尺度多孔介质流体流动与溶质运移高性能模拟", 水科学进展, vol. 31, no. 3, pages 422 - 430 * |
张纲;王利民;葛蔚;: "格子Boltzmann方法多GPU并行性能的研究", 计算机与应用化学, no. 10, pages 4 - 13 * |
Also Published As
Publication number | Publication date |
---|---|
CN115455794B (en) | 2024-03-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||