CN109947563B - Parallel multilayer rapid multi-polar subtree structure composite storage method - Google Patents

Publication number
CN109947563B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910168049.0A
Other languages
Chinese (zh)
Other versions
CN109947563A (en)
Inventor
杨明林
柳瑞青
郭琨毅
盛新庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201910168049.0A
Publication of CN109947563A
Application granted
Publication of CN109947563B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a composite storage method for the tree structure of the parallel multilayer fast multipole technique. The storage of the multipole tree is divided into composite storage layers and other layers. The composite storage layers run from the layer just below the highest box-parallel layer down to the bottom layer, and they combine a local storage tree with a non-local proxy tree: the multipole tree is distributed across the processes to form the local storage tree, and the non-local nodes of each layer that the local boxes need when computing far interactions are stored in compressed sequential order to form the non-local proxy tree, which is addressed quickly by binary search. On the other layers, every process stores the complete multipole tree of each layer. After the multipole next-adjacent box information has been filled, the non-local proxy tree is compressed heavily to remove redundancy and is kept only at the bottom layer of the multilayer fast multipole tree, which further reduces peak memory and improves node-addressing efficiency.

Description

Parallel multilayer rapid multi-polar subtree structure composite storage method
Technical Field
The invention belongs to the field of computational electromagnetics and in particular relates to a composite storage method for the tree structure of the parallel multilayer fast multipole technique.
Background
The multilayer fast multipole technique is an efficient fast algorithm in computational electromagnetics that accelerates the iterative solution of the full matrix equation produced by the method of moments. It divides the target into several layers of boxes and, by grouping and layering, carries out the matrix-vector product through aggregation, translation and disaggregation. With efficient parallel computing, the problem size that the method of moments can handle can be extended to hundreds of billions of unknowns and tens of thousands of wavelengths.
In the multilayer fast multipole technique, the smallest cube enclosing the target is formed first and is then halved along each of the three spatial dimensions to give eight sub-boxes. The subdivision is applied recursively until the box size at the lowest layer is below a given value, which yields an octree whose root node is the original cube. Key operations of the technique, such as looking up parent-child box relations for interpolation in far interactions and searching for next-adjacent boxes during translation, are all carried out by traversing and searching the nodes of this octree.
In a distributed parallel implementation of the multilayer fast multipole technique, the complete multipole tree is usually stored on every MPI process. For surface integral equations the unknowns are confined to the surface of the target, so apart from the first few upper layers the number of boxes, that is, the number of tree nodes, grows by roughly a factor of 4 from one layer to the next. At the bottom layer $L_{max}$ the number of boxes is about one tenth of the number of unknowns, so with ten billion unknowns the finest layer holds roughly one billion boxes. For convenience of computation, a tree node (Node) in a parallel multilayer fast multipole code usually holds three variables: a long integer lKey storing the Morton code of the box, a short integer lNum storing the number of sub-boxes of the box, and a long integer lFrt storing the position offset of the node within its layer. Allowing for structure alignment by the compiler, a node occupies 24 bytes, so storing one complete multipole tree takes more than 30 GB. At this problem scale, thousands of MPI processes are generally needed to meet the memory requirement; if every process still stores the complete multipole tree, the tree structure alone occupies tens of TB of memory. If, on the other hand, the tree is stored in a fully distributed manner, the tree information that is not stored locally must be fetched by communication during interpolation, translation, aggregation and other multilayer fast multipole operations, which inevitably causes a large amount of extra communication and very low efficiency.
In summary, storing the multipole tree of electrically very large targets with billions of unknowns has become a bottleneck of distributed parallel computing, and a new technique for storing the parallel multilayer fast multipole tree efficiently is urgently needed.
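The memory figures in this paragraph can be checked with a short calculation. The sketch below is illustrative only: the field widths (64-bit, 16-bit, 64-bit), the layer count of 13 and the process count of 1000 are assumptions chosen to match the text's "24 bytes", "more than 30G" and "dozens of TB", and the helper name `full_tree_bytes` is hypothetical.

```python
import ctypes

class Node(ctypes.Structure):
    """Tree node with the three fields named in the text (field widths assumed)."""
    _fields_ = [
        ("lKey", ctypes.c_int64),  # Morton code of the box
        ("lNum", ctypes.c_int16),  # number of sub-boxes of the box
        ("lFrt", ctypes.c_int64),  # position offset of the node within its layer
    ]

# 8 + 2 + 6 bytes of padding + 8 on typical 64-bit platforms
NODE_BYTES = ctypes.sizeof(Node)

def full_tree_bytes(finest_boxes, n_layers, growth=4, node_bytes=24):
    """Bytes for one complete tree when each layer holds `growth` times
    fewer boxes than the layer below it (the text's approximation)."""
    total_nodes = sum(finest_boxes / growth**j for j in range(n_layers))
    return total_nodes * node_bytes

# Assumed inputs: one billion finest-layer boxes, 13 layers, 1000 processes.
per_process = full_tree_bytes(finest_boxes=1e9, n_layers=13)
replicated = per_process * 1000  # a complete copy on every process
```

Under these assumptions one copy of the tree takes about 32 GB, and replicating it on 1000 processes takes about 32 TB.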
Disclosure of Invention
In view of this, the invention provides a composite storage method for the tree structure of the parallel multilayer fast multipole technique, with which composite storage of the parallel multilayer fast multipole tree can be realized conveniently and efficiently.
The technical scheme of the invention is as follows:
A composite storage method for the tree structure of the parallel multilayer fast multipole technique comprises the following steps:
Step one: read the total number of edges of the target mesh and divide the edges equally among all MPI processes according to the preset total number of MPI processes;
Step two: generate the hierarchical tree structure of the multilayer fast multipole technique from the edges assigned to each MPI process;
Step three: number each box in the hierarchical tree structure with a Morton code and store the boxes of each layer in Morton-code order;
Step four: determine the normalized division values of each edge along the three spatial directions from the coordinates of its midpoint, and from them the code of the bottom-layer box corresponding to each edge;
Step five: determine the starting layer number $L_b$ of the box-parallel layers from the box size of each layer, the number of plane-wave sampling points, the total number of processes and the parallel scheme to be adopted; the starting layer $L_D$ of the composite storage layers is then $L_b + 1$;
Step six: starting from the bottom-layer boxes, every process sends the Morton codes of its first local group of sub-boxes sharing one parent box to the preceding process, and this operation is repeated layer by layer from bottom to top up to the highest composite storage layer; starting from the layer above the composite storage layers, all processes communicate with one another to collect the Morton codes of all boxes of that layer, and this operation is repeated layer by layer from bottom to top up to the highest layer of the multilayer tree; the processes thereby determine the non-empty boxes of each layer and obtain the octree of the non-empty boxes of the multilayer fast multipole technique;
Step seven: according to the octree obtained in step six, traverse downward from the highest composite storage layer $L_D$ and, through point-to-point communication between adjacent processes, store each box of layer $L_D$ and all of its descendant boxes on the same process, following the distribution of boxes at layer $L_D$;
Step eight: according to the octree obtained in step six, determine how the boxes of the highest box-parallel layer are distributed over the processes, and store each box of the highest box-parallel layer and all of its descendant boxes on the same process;
Step nine: according to the box distribution obtained in step eight, send the edge information corresponding to the redistributed boxes between adjacent processes of the composite storage layers;
Step ten: starting from the first layer of the multipole tree, build the near-interaction queue of each box and, through it, the next-adjacent queue of the boxes of the next layer, until the highest box-parallel layer is reached; at the highest box-parallel layer, find the sequence of numbers of all non-local boxes that have near interactions with the local boxes; by the definition of the next-adjacent interaction, the next-adjacent and near-interaction box information of all descendant nodes of the highest box-parallel layer is contained in the near-interaction queue of the local boxes of that layer, and this queue is named the redundant proxy tree;
Step eleven: traverse all tree nodes layer by layer from the first layer of the multipole tree down to the highest box-parallel layer to obtain the list of near-interaction box pairs; in the composite storage layers $L_D$ to $L_{max}$, a node that belongs to the local tree is accessed directly, and a node that is not local is located among the nodes of the corresponding layer of the redundant proxy tree by binary search;
Step twelve: delete the non-local node information of layers $L_{max}-1$ to $L_D$ from the redundant proxy tree and, using the near-interaction list of the local boxes at layer $L_{max}$, extract the corresponding non-local node information from the redundant proxy tree to generate the non-local proxy tree.
Advantageous effects:
(1) From the layer just below the highest box-parallel layer down to the bottom layer, the invention stores the tree as a combination of a local storage tree and a non-local proxy tree, which markedly reduces the storage required by the multipole tree. Because the number of boxes grows about fourfold per layer, distributing only 3 layers is enough to distribute about 98% of the nodes of the multipole tree, so the memory used during the computation, and in particular the peak memory of large-scale high-performance parallel runs, is reduced substantially.
(2) On the composite storage layers, a non-local proxy tree covering the set of boxes adjacent to the local boxes is built and stored in addition to the local distributed storage tree. After the far-interaction queues of the multilayer fast multipole technique have been filled, the non-local proxy tree is compressed to remove redundancy: the bottom layer is kept for near interactions and all other layers are released. The non-local proxy tree is far smaller than the local storage tree and can be searched quickly by binary search, so it has almost no effect on the tree traversal time during the computation. Apart from the limited number of box-parallel layers that use composite distributed storage, the multipole tree is still stored completely on all processes at the other layers (the layers parallelized by plane wave or by hierarchical partitioning, depending on the parallel scheme used), so the computation time remains comparable to that of the original scheme in which every process stores the complete multipole tree.
(3) The local distributed storage tree and the non-local proxy tree of the composite storage layers can be filled efficiently in parallel. Filling requires only a small amount of local point-to-point communication, and the filling time is almost negligible compared with the total run time of the program. The intermediate arrays needed during filling do not exceed the peak memory of the overall parallel multilayer fast multipole computation and are released as soon as filling is complete, so generating the composite storage structure is fast and does not increase the memory requirement of the parallel program.
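The "about 98%" figure in advantage (1) follows from the fourfold growth per layer. The sketch below is illustrative arithmetic only: the function name `distributed_fraction` and the 13-layer example are assumptions, and an exact growth factor of 4 at every layer is the text's approximation rather than a measured box count.

```python
def distributed_fraction(n_layers, n_distributed, growth=4):
    """Fraction of all tree nodes that lie in the bottom `n_distributed`
    layers when every layer has `growth` times the boxes of the one above."""
    sizes = [growth**layer for layer in range(n_layers)]  # top to bottom
    return sum(sizes[-n_distributed:]) / sum(sizes)

# Bottom 3 layers of a 13-layer tree
fraction = distributed_fraction(13, 3)
```

With these inputs the bottom three layers hold 63/64 of the nodes to within rounding, that is, a little over 98%.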
Drawings
FIG. 1 is a schematic diagram of the multipole layering in the composite storage technique.
FIG. 2 is a schematic diagram of point-to-point communication between processes.
FIG. 3 is a schematic diagram of box storage and numbering in each layer of the multipole tree.
FIG. 4 is a schematic diagram of the top-down, layer-by-layer reorganization that prepares for proxy tree filling.
FIG. 5 is a schematic diagram of next-adjacent and near interactions in the multilayer fast multipole technique.
FIG. 6 is a schematic diagram of the range of proxy tree nodes.
FIG. 7 is a schematic diagram of the geometry of the B-2 bomber model.
FIG. 8 shows the VV-polarized bistatic RCS of a metal sphere with a radius of 1200 wavelengths, compared with the analytical solution.
FIG. 9 shows the VV-polarized bistatic RCS of the B-2 aircraft model at a frequency of 32 GHz.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a convenient and efficient composite storage method for the tree structure of the parallel multilayer fast multipole technique. The method divides the storage of the multipole tree into composite storage layers and other layers. The composite storage layers generally run from the layer just below the highest box-parallel layer down to the bottom layer of the multipole tree and combine a local storage tree with a non-local proxy tree: the multipole tree is distributed across the processes to form the local storage tree; the non-local nodes of each layer that are needed while filling the far-interaction boxes of the local boxes are stored in compressed sequential order to form a redundant non-local proxy tree; after the multipole next-adjacent box pairs have been filled, this redundant tree is compressed further so that only the part needed for the near interactions of the bottom-layer boxes remains, which gives the final non-local proxy tree. Proxy tree nodes are addressed quickly by binary search. The non-local proxy tree is far smaller than the local storage tree and is stored in sequential order, so with a fast search its traversal time is negligible and the computational efficiency is essentially unaffected. On the other layers, which do not use composite storage, the multipole tree of each layer is stored completely. The composite storage technique is easy to implement and effectively reduces the memory needed to store the multipole tree, to the point where it is approximately independent of the number of processes.
The basic idea of the invention is as follows:
The storage of the multilayer fast multipole tree is divided into composite storage layers and other layers, as shown in FIG. 1. The layers from the one just below the highest box-parallel layer down to the bottom layer are taken as composite storage layers and combine a local storage tree with a non-local proxy tree: the multipole tree is distributed across the processes to form the local storage tree, and the non-local nodes of each layer that the local boxes need when computing far interactions are stored in compressed sequential order to form the non-local proxy tree, which is addressed quickly by binary search. On the other layers, all processes store the complete multipole tree of each layer. After the multipole next-adjacent box information has been filled, the non-local proxy tree is compressed heavily to remove redundancy and is kept only at the bottom layer of the multilayer fast multipole tree, which further reduces peak memory and improves node-addressing efficiency.
Based on the above, a specific implementation of the composite storage method provided by the invention comprises the following steps:
Step one: Read the total number of edges of the meshed target and divide the edges equally among all processes according to the preset total number of processes used in the parallel run. The number of edges assigned to process $i$, $N_e^{(i)}$, can be estimated as
$$N_e^{(i)} \approx \frac{N_e}{N_p} \tag{1}$$
where $N_e$ is the total number of edges of the mesh, $N_p$ is the number of processes used in the parallel computation, and $i = 1, 2, \ldots, N_p$.
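An equal division of edges among processes can be sketched as follows. The text only says that the per-process count is estimated from the total number of edges and the number of processes; giving the remainder to the lowest ranks is an assumption made here for illustration, and `edges_per_process` is a hypothetical helper name.

```python
def edges_per_process(n_edges, n_procs):
    """Split n_edges as evenly as possible; counts differ by at most one."""
    base, extra = divmod(n_edges, n_procs)
    # Assumed convention: the first `extra` ranks take one additional edge.
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]

counts = edges_per_process(10, 3)
```

Every process then holds either the rounded-down or the rounded-up share of the edges, and the counts sum to the total.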
Step two: Generate the hierarchical tree structure of the multilayer fast multipole technique from the edge distribution (each process reads the midpoint coordinates of the edges assigned to it). A top-down division is used: the smallest box enclosing the target is constructed and taken as layer 0, and the box is then halved along the three spatial dimensions layer by layer until the box size at the bottom layer is less than 0.3 wavelength. The resulting tree is an octree.
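The number of layers implied by this stopping rule can be estimated with a few lines. This is a sketch under the assumption that the side of the layer-0 box equals the largest dimension of the target expressed in wavelengths; `num_layers` is a hypothetical name.

```python
def num_layers(target_size_wl, max_leaf_wl=0.3):
    """Count halvings of the layer-0 box until its side drops below max_leaf_wl."""
    layers = 0
    size = float(target_size_wl)
    while size >= max_leaf_wl:
        size /= 2.0   # each new layer halves the box side
        layers += 1
    return layers

sphere_layers = num_layers(2400)  # sphere 2400 wavelengths in diameter
b2_layers = num_layers(6000)      # target with a maximum size of 6000 wavelengths
```

Under this assumption the two calls give 13 and 15 layers, which agrees with the layer counts reported for the two targets in the embodiment.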
Step three: Number each box with a Morton code, which interleaves the spatial indices of the box, and store the boxes of each layer in Morton-code order. The number of the parent box in the layer above is obtained from the number of a box in the current layer by
$$M_{i-1} = M_i \gg 3 \tag{2}$$
where $M_i$ is the Morton code of a box in layer $i$ and $\gg$ denotes a right bit shift.
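A minimal sketch of Morton numbering and of formula (2) is given below. The bit order (x in the lowest bit of each 3-bit group) is an assumption, since the text does not fix it; the function names are hypothetical.

```python
def morton_encode(ix, iy, iz, layer):
    """Interleave the lowest `layer` bits of the three box indices."""
    code = 0
    for bit in range(layer):
        code |= ((ix >> bit) & 1) << (3 * bit)      # x bit (assumed lowest)
        code |= ((iy >> bit) & 1) << (3 * bit + 1)  # y bit
        code |= ((iz >> bit) & 1) << (3 * bit + 2)  # z bit
    return code

def parent_code(code):
    """Formula (2): dropping the lowest 3 bits gives the parent's Morton code."""
    return code >> 3

child = morton_encode(5, 3, 6, 3)
```

Because the lowest three bits only select one of the eight sub-boxes, the eight children of a parent occupy consecutive Morton codes, which is why sorting by Morton code keeps siblings together.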
Step four: Determine the normalized division values of each edge along the three spatial directions from the coordinates of its midpoint, and from them the code of the bottom-layer box that contains the edge. Because step one distributes the edges arbitrarily, a bucket sort is required: the interval between the minimum and maximum Morton codes of the bottom-layer boxes is divided uniformly among the processes, and each local edge is sent to the process that owns its Morton code. After the bucket sort, the distribution of non-empty boxes at the bottom layer is easily determined.
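The two ingredients of step four can be sketched as follows: a per-axis cell index computed from a midpoint coordinate, and the destination process obtained by dividing the Morton-code interval uniformly. Both helpers (`leaf_index`, `bucket_rank`) are hypothetical, and clamping points that lie on the far face of the root box is an assumption. The three per-axis indices would then be interleaved into a Morton code as in step three.

```python
def leaf_index(coord, origin, root_size, n_layers):
    """Normalized division value along one axis: integer cell index at the bottom layer."""
    cells = 1 << n_layers                       # boxes per side at the bottom layer
    idx = int((coord - origin) / root_size * cells)
    return min(max(idx, 0), cells - 1)          # clamp points on the box boundary

def bucket_rank(code, code_min, code_max, n_procs):
    """Process that owns `code` when [code_min, code_max] is split uniformly."""
    width = (code_max - code_min + 1) / n_procs
    return min(int((code - code_min) / width), n_procs - 1)

ix = leaf_index(0.75, 0.0, 1.0, 2)   # coordinate 0.75 in a unit box, 4 cells per side
rank = bucket_rank(16, 0, 63, 4)     # 64 codes shared by 4 processes
```

A uniform split of the code interval does not by itself balance the number of boxes per process; the text handles that later through load distribution in steps eight and nine.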
Step five: Determine the starting layer number $L_b$ of the box-parallel layers from the box size of each layer, the number of plane-wave sampling points, the total number of processes and the parallel scheme adopted by the program (for example a hybrid parallel scheme, a hierarchical-partitioning parallel scheme or a ternary parallel scheme). The number of plane-wave samples for a box in layer $i$ is determined by the following formulas:
Figure BDA0001986999520000071
$$S_i = kd + \alpha_d \ln(kd + \pi) \tag{4}$$
where $\alpha_d$ is chosen according to the required accuracy and is generally taken as 2.0, $d$ is the diagonal length of a box in that layer, and $k$ is the wavenumber. The starting layer of the composite storage layers, $L_D$, is $L_b + 1$.
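Formula (4) can be evaluated directly. The sketch below assumes cubic boxes whose side is given in wavelengths, so that the wavenumber is 2π per wavelength and the diagonal is √3 times the side; the function name `s_value` is hypothetical, and no rounding is applied because the text does not specify any.

```python
import math

def s_value(box_side_wl, alpha_d=2.0):
    """Evaluate S = kd + alpha_d * ln(kd + pi) for a cubic box."""
    k = 2.0 * math.pi                   # wavenumber when lengths are in wavelengths
    d = math.sqrt(3.0) * box_side_wl    # diagonal length of the cube
    kd = k * d
    return kd + alpha_d * math.log(kd + math.pi)

s_bottom = s_value(0.3)   # bottom-layer box at the 0.3-wavelength limit
s_parent = s_value(0.6)   # its parent box, twice the side
```

The value grows with box size, so boxes in higher layers need more plane-wave samples than those near the bottom.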
Step six: Traverse upward layer by layer from the bottom-layer boxes to determine the non-empty boxes of each layer and thereby construct the octree of the non-empty boxes. Because the edge distribution in step one is an arbitrary equal division, the sub-boxes of a given box are not guaranteed to lie on the same process. To make every parent box complete on its local process, each process sends its first local group of sub-boxes sharing one parent box to the preceding process, as shown in FIG. 2. Under this scheme process 0 sends nothing, the process with the largest rank receives nothing, and every other process sends the Morton codes of its first group of same-parent sub-boxes to the preceding process and receives the Morton codes sent by the following process. Once the communication is complete, the Morton codes of the parent boxes of the local sub-boxes are computed with formula (2), and the non-empty box information of the layer above is determined and filled. This procedure is repeated recursively until the composite storage layers are finished.
The layer above the composite storage layers, that is, the starting box-parallel layer, is taken as the proxy tree filling layer $L_G$. All processes communicate with one another to collect the Morton codes of all boxes in this layer, so that every process holds the complete set of box Morton codes for the layer. The Morton codes of the parent boxes are then computed with formula (2) to fill the non-empty box information of the parent layer, and the procedure is repeated recursively up to the highest layer of the multipole tree. The boxes of each layer are stored sequentially, ordered by the numbers of their parent boxes, as shown in FIG. 3. A tree node (Node) typically requires three variables: a long integer lKey storing the Morton code of the box, a short integer lNum storing the number of sub-boxes of the box, and a long integer lFrt storing the position offset of the node within its layer.
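The exchange of first sibling groups in step six can be illustrated without MPI by simulating the processes as lists. This is a sketch under stated assumptions: each process holds a non-empty, sorted list of Morton codes, and no sibling group spans more than two processes. The function names are hypothetical and the list hand-off stands in for the point-to-point messages.

```python
def complete_sibling_groups(parts):
    """parts[r] is the sorted list of Morton codes held by process r."""
    heads = []
    for codes in parts:
        first_parent = codes[0] >> 3
        # leading run of boxes that share the first box's parent
        heads.append([m for m in codes if m >> 3 == first_parent])
    result = []
    for rank, codes in enumerate(parts):
        # process 0 sends nothing; every other process gives up its first group
        kept = list(codes) if rank == 0 else codes[len(heads[rank]):]
        # the last process receives nothing
        received = heads[rank + 1] if rank + 1 < len(parts) else []
        result.append(kept + received)
    return result

def parent_codes(codes):
    """Formula (2) applied to every local box, duplicates removed."""
    return sorted({m >> 3 for m in codes})

parts = [[0, 1, 2, 8, 9], [10, 11, 16, 17], [24, 25, 32]]
after = complete_sibling_groups(parts)
parents = [parent_codes(codes) for codes in after]
```

In the example, boxes 8 to 11 share parent 1 but start out split between processes 0 and 1; after the exchange no parent appears on more than one process.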
Step seven: Traverse from top to bottom to update the composite storage layers. Starting from the highest composite storage layer $L_D$ and working downward, the processes use point-to-point communication between adjacent processes so that, following the distribution of boxes at layer $L_D$, each box of layer $L_D$ and all of its descendant boxes are stored completely on the same process, in preparation for filling the proxy tree later, as shown in FIG. 4.
Step eight: Perform load distribution and determine the distribution of boxes at each box-parallel layer. The box-parallel layers must preserve complete parent-child relationships between adjacent layers, so once the distribution of the boxes at the box-parallel layer $L_b$ has been determined, all descendants of each box are assigned to the same process as that box.
Step nine: The box distribution after load distribution generally differs from the initial uniform distribution, so the distribution of the multipole tree over the composite storage layers must be adjusted. Because the boxes are distributed in sequential order, each process communicates only with its two neighbouring processes and sends only a small part of its node data, so the execution time of this step is almost negligible.
Step ten: Starting from the first layer of the multipole tree, build the near-interaction queue of each box and, from it, the next-adjacent queue of the boxes in the next layer. In the multilayer fast multipole technique, a near interaction is the interaction between adjacent boxes of the same layer, and a next-adjacent interaction is the interaction between a pair of boxes that are not adjacent in their own layer but whose parent boxes are adjacent in the layer above, as shown in FIG. 5. The procedure is repeated recursively down to layer $L_G$, the highest box-parallel layer. At layer $L_G$, the numbers of the near-interaction boxes of the local boxes are filled according to the box distribution of step eight, and duplicate numbers and local box numbers are removed until only a sequence of non-local, non-contiguous numbers remains. The number of near-interaction boxes of a box cannot exceed the number of boxes that can surround it in space, namely 26. For surface integral equations the actual number is far smaller, amounting to a single ring of boxes around the local boxes. The range of proxy tree nodes in the two-dimensional case is shown in FIG. 6. By the definition of the next-adjacent interaction, the next-adjacent and near-interaction box information of all descendant nodes of layer $L_G$ is contained in the near-interaction queues of the local boxes at layer $L_G$. This queue is named the redundant proxy tree. A redundant proxy tree node holds 4 variables: in addition to lKey, lFrt and lNum of the original tree node, a long integer lOrigNum records the original number of the node before compression.
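The two definitions in step ten can be made concrete on a small grid. The sketch below assumes a layer in which every box is non-empty, which is the worst case and not the surface-integral situation the text describes; boxes are identified by integer index triples and the function names are hypothetical.

```python
from itertools import product

def is_adjacent(a, b):
    """True when two distinct boxes of one layer share a face, edge or corner."""
    return a != b and max(abs(x - y) for x, y in zip(a, b)) <= 1

def parent(box):
    """Index triple of the parent box in the layer above."""
    return tuple(c >> 1 for c in box)

def interaction_lists(box, boxes_per_side):
    near, next_adjacent = [], []
    for other in product(range(boxes_per_side), repeat=3):
        if other == box:
            continue
        if is_adjacent(other, box):
            near.append(other)                  # adjacent in this layer
        elif is_adjacent(parent(other), parent(box)):
            next_adjacent.append(other)         # parents adjacent, boxes not
    return near, next_adjacent

near, next_adjacent = interaction_lists((8, 8, 8), 16)
```

For an interior box of a full grid this gives 26 near-interaction boxes, the upper bound quoted in the text, and 189 next-adjacent boxes; for boxes on a surface both lists are much shorter.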
Step eleven: Fill the list of multipole near-interaction box pairs. Starting from the first layer of the multipole tree, the near-interaction queue of each box is built and, from it, the next-adjacent queue of the boxes in the next layer; the procedure is repeated recursively down to the bottom layer of the multipole tree. From the first layer to layer $L_G$ the multipole tree is stored completely on all processes, so every process can traverse all tree nodes of these layers directly. In the composite storage layers $L_D$ to $L_{max}$, a node that belongs to the local tree is accessed directly from its global node number and the storage offset of the process; a node that is not local is located among the nodes of the corresponding layer of the redundant proxy tree by binary search.
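The two access paths of step eleven can be sketched for a single layer. The class below is a hypothetical illustration: it assumes the local nodes form one contiguous range of global numbers and that the proxy nodes are kept sorted by their original number (the role the text gives to lOrigNum); the strings stand in for node records.

```python
from bisect import bisect_left

class LayerStore:
    """One layer of the composite storage on one process (illustrative only)."""

    def __init__(self, local_start, local_nodes, proxy_numbers, proxy_nodes):
        self.local_start = local_start      # global number of the first local node
        self.local_nodes = local_nodes      # contiguous block of local nodes
        self.proxy_numbers = proxy_numbers  # sorted original numbers of proxy nodes
        self.proxy_nodes = proxy_nodes      # proxy nodes in the same order

    def find(self, global_number):
        offset = global_number - self.local_start
        if 0 <= offset < len(self.local_nodes):
            return self.local_nodes[offset]           # direct access by offset
        pos = bisect_left(self.proxy_numbers, global_number)
        if pos < len(self.proxy_numbers) and self.proxy_numbers[pos] == global_number:
            return self.proxy_nodes[pos]              # found by binary search
        return None                                   # not stored on this process

store = LayerStore(10, ["a", "b", "c"], [3, 7, 20], ["p3", "p7", "p20"])
```

Local lookups cost a subtraction, and proxy lookups cost a logarithmic number of comparisons in the size of the proxy list, which the text states is much smaller than the local tree.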
Step twelve: After the next-adjacent box-pair list has been filled, the aggregation (inter-layer interpolation) and disaggregation (inter-layer inverse interpolation) of the subsequent iterations act within the composite storage layers, where the parent and child boxes of any two adjacent layers are all local and the non-local proxy tree is not needed. The non-local proxy tree of layers $L_{max}-1$ to $L_D$ can therefore be deleted to reduce memory further. Using the near-interaction list of the local boxes at layer $L_{max}$, the corresponding non-local node information is extracted from the redundant proxy tree and stored in compressed form as the final non-local proxy tree, which reduces its size and memory and speeds up access to its nodes during the subsequent iterative solution and the filling of the near-interaction matrix. Finally, the non-local proxy tree is stored per process, with the proxy tree nodes of each process ordered by their original node numbers.
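The compression in step twelve amounts to discarding every layer of the redundant proxy tree except the bottom one and keeping only the bottom-layer nodes that appear in a near-interaction list. The sketch below uses a hypothetical dictionary layout, layer number to a mapping from original node number to node, which is an assumption and not the storage layout of the patent.

```python
def compress_proxy_tree(redundant, near_numbers, bottom_layer):
    """Return the final proxy tree as (original number, node) pairs in number order.

    redundant:    {layer: {original_number: node}}  (assumed layout)
    near_numbers: node numbers referenced by bottom-layer near-interaction lists
    """
    bottom = redundant[bottom_layer]
    # keep only non-local nodes that a near interaction actually references
    needed = sorted(set(near_numbers).intersection(bottom))
    return [(number, bottom[number]) for number in needed]

redundant = {12: {5: "x"}, 13: {2: "a", 9: "b", 11: "c"}}
final = compress_proxy_tree(redundant, [9, 2, 40], bottom_layer=13)
```

Keeping the result ordered by original node number is what allows the binary search described for proxy tree access.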
These steps produce the final composite storage form of the multilayer fast multipole tree. From layer 1 to layer $L_G$, every process stores the complete multipole tree of each layer. From layer $L_D$ to layer $L_{max}$, the multipole tree nodes are distributed to the processes according to the load distribution result. At layer $L_{max}$, the non-local box information needed for the near interactions of the local boxes is stored in compressed form in the non-local proxy tree, ordered by node number. The tree information needed by far interactions is entirely local and can be accessed directly, without any communication, through the global number of a box within its layer and the number offset of each process. The non-local tree node information needed by near interactions is stored in compressed form in the non-local proxy tree; once the owning process has been determined, the node is found by binary search within the set of proxy tree nodes belonging to that process.
Example:
In this embodiment, the ternary parallel multilayer fast multipole program using the composite tree storage proposed by the invention is run with 960 MPI processes on two targets: a metal sphere with a radius of 1200 wavelengths and a B-2 stealth bomber model whose geometry is shown in FIG. 7. The computing platform is the "meta" supercomputing platform of the network center of the Chinese Academy of Sciences; each node has 256 GB of memory and 2 Intel E5-2680 v3 processors with 24 CPU cores in total.
For these two targets, Tables 1 and 2 list the memory required to store each layer of the multipole tree under the original scheme, in which every process stores the complete tree, and under the composite storage technique. For the metal sphere, 2400 wavelengths in diameter, the multipole tree has 13 layers. Based on the number of processes and the electrical size of the target, layers 10 to 13 are composite storage layers, the remaining layers are complete storage layers, and layer 9 is the layer at which the non-local proxy tree is filled. Table 1 shows that the composite storage technique reduces the memory needed to store the multipole tree over the whole computation from 8.6 TB to 45 GB, a reduction of nearly 200 times, while the non-local proxy tree needs less than 178 MB in total. For the B-2 stealth bomber model, the maximum electrical size reaches 6000 wavelengths and the multilayer fast multipole tree has 15 layers. The composite storage technique reduces the memory needed to store the multipole tree over the whole computation from 12.7 TB to 27 GB, a reduction of nearly 500 times, and the final compressed non-local proxy tree needs only 188 MB. Both tables show that the composite storage structure proposed by the invention reduces the storage of the tree structure substantially.
TABLE 1 Memory required by each layer for the different storage types, metal sphere target
[Table 1 is reproduced as images in the original publication: Figures BDA0001986999520000101 and BDA0001986999520000111]
TABLE 2 Memory required by each layer for the different storage types, aircraft model target
[Table 2 is reproduced as an image in the original publication: Figure BDA0001986999520000112]
The main timing statistics for constructing the composite storage tree structure during the simulation of the metal sphere and the B-2 aircraft model are shown in table 3. The table shows that generating the whole composite storage tree structure takes only tens of seconds, which is almost negligible compared with the thousands of seconds of the whole calculation, so the generation of the composite storage multipole tree structure is very efficient.
TABLE 3 Main timing statistics for generating the composite storage structure, metal sphere and aircraft model targets
Stage of composite storage tree construction          Metal sphere    B-2 bomber model
Initial composite storage tree generation time (s)    14.34           23.85
Filling the non-local redundant proxy tree time (s)   1.32            1.87
Filling secondary-adjacent box pairs time (s)         5.76            6.07
Filling near-interaction box pairs time (s)           5.98            6.22
Compressing into the final proxy tree time (s)        0.78            1.02
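As a quick check on the "tens of seconds" statement, the stage times of table 3 can be added up. This is a minimal sketch that assumes the five rows are disjoint, sequential stages (the table does not state this explicitly); the dictionary keys are shortened labels chosen for illustration.

```python
# Stage times in seconds, copied from table 3: (metal sphere, B-2 model).
stage_times = {
    "initial composite storage tree generation": (14.34, 23.85),
    "filling the redundant proxy tree": (1.32, 1.87),
    "filling secondary-adjacent box pairs": (5.76, 6.07),
    "filling near-interaction box pairs": (5.98, 6.22),
    "compressing into the final proxy tree": (0.78, 1.02),
}

# Sum each column; rounding only hides floating-point noise.
sphere_total = round(sum(t[0] for t in stage_times.values()), 2)
bomber_total = round(sum(t[1] for t in stage_times.values()), 2)

print(f"metal sphere total: {sphere_total} s")
print(f"B-2 model total:    {bomber_total} s")
```

Under that assumption both totals stay below 40 seconds.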
The detailed computational resource statistics for the metal sphere target and the B-2 model target are shown in table 4. With the composite storage tree structure, the peak memory of the whole program is less than 20 TB. If the complete multipole tree structure were still stored on every MPI process, then with only 960 processes in parallel an additional 8.6 TB and 12.7 TB of memory would be required, respectively. If the number of processes were doubled again, the memory required just to store the multipole tree structure would approach or even exceed this peak computation memory. This further demonstrates the necessity of the composite storage technique under massive parallelism, electrically very large targets and billions of unknowns.
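The effect of doubling the process count can be illustrated with a short calculation. The sketch assumes that, under full replication, every process holds a copy of the tree of fixed size, so the total grows linearly with the number of processes; the function name and the constant `PEAK_TB` (the 20 TB bound quoted above) are illustrative choices.

```python
def replicated_tree_memory_tb(memory_at_ref_tb, ref_processes, processes):
    """Total memory when every process holds a full copy of the tree.

    Assumes the per-process copy has a fixed size, so the total scales
    linearly with the number of processes.
    """
    return memory_at_ref_tb * processes / ref_processes


PEAK_TB = 20.0  # upper bound on the program's peak memory quoted in the text

for name, at_960 in (("metal sphere", 8.6), ("B-2 model", 12.7)):
    doubled = replicated_tree_memory_tb(at_960, 960, 1920)
    status = "exceeds" if doubled > PEAK_TB else "approaches"
    print(f"{name}: {doubled:.1f} TB at 1920 processes "
          f"({status} the {PEAK_TB:.0f} TB bound)")
```

With 1920 processes the replicated tree alone would need about 17.2 TB for the sphere and about 25.4 TB for the B-2 model, which matches the "approach or even exceed" wording.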
TABLE 4 Detailed computation statistics for the metal sphere and aircraft model targets
[Table 4 is reproduced as images in the original publication: Figures BDA0001986999520000121 and BDA0001986999520000131]
The RCS computed by the program for the metal sphere target is compared with the analytical solution in fig. 8. In fig. 8 the incident angle of the radar wave is 0° [formula image BDA0001986999520000132 in the original], the observation plane is the xy plane, and θ = 0° corresponds to backscattering. The root mean square (RMS) error between the two results is 0.7 dB, which demonstrates the correctness and effectiveness of the proposed composite storage technique for the multipole tree structure. The result for the large stealth bomber model target is shown in fig. 9; the incident angle of the radar wave is 90° [formula image BDA0001986999520000133 in the original] and the observation angle θ is 90° [formula image BDA0001986999520000134 in the original]. This shows that the composite storage technique for the multipole tree structure is also effective for complex targets and is generally applicable.
The invention provides a composite storage technique for the parallel multilevel fast multipole tree. From the application perspective, for the metal sphere with a radius of 1200 wavelengths and the large stealth bomber model with a maximum electrical size of 6000 wavelengths shown in the embodiment, the calculation results are correct and valid, the storage required by the tree structure is reduced markedly, and the generation and filling time is almost negligible compared with the whole calculation. This demonstrates the practicability and efficiency of the parallel method, which enables efficient storage of the multilevel fast multipole tree structure for simulation targets with billions of unknowns.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A parallel multi-layer rapid multi-pole sub-tree structure composite storage method is characterized by comprising the following steps:
step one, reading the total number of edges of the target mesh, and dividing the edges equally among all MPI processes according to the preset total number of MPI processes;
step two, generating the hierarchical tree structure of the multilevel fast multipole method from the edges assigned to each MPI process;
step three, numbering each box in the hierarchical tree structure with a Morton code, and storing the boxes of each layer in Morton-code order;
step four, determining the normalized division value to which each edge belongs in the three spatial directions according to the coordinates of the midpoint of the edge, and determining the bottom-layer box code corresponding to each edge;
step five, determining the starting layer number $L_b$ of the box-parallel layers according to the box size of each layer, the number of plane-wave sampling points, the total number of processes and the parallel scheme to be adopted; the starting layer $L_D$ of the composite storage layers is then $L_D = L_b + 1$;
step six, starting from the boxes of the bottom layer, every process sends the Morton codes of the child boxes that share the parent box of its first local group to the preceding process, and this operation is carried out layer by layer from bottom to top up to the highest layer of the composite storage layers; starting from the layer above the composite storage layers, all processes communicate to collect the Morton codes of all boxes of that layer, and this operation is carried out layer by layer from bottom to top up to the highest layer of the multilevel tree; the non-empty boxes of each layer are determined during step six, yielding the octree structure of the non-empty boxes of the multilevel fast multipole method;
step seven, according to the octree structure obtained in step six, traversing downward from the highest layer $L_D$ of the composite storage layers and performing point-to-point communication between adjacent processes, so that each box of layer $L_D$ that is split across processes, together with all of its descendant boxes, is stored in the same process;
step eight, determining, according to the octree structure obtained in step six, how all boxes of the highest layer of the box-parallel layers are distributed among the processes, and storing each box of the highest layer of the box-parallel layers together with all of its descendant boxes in the same process;
step nine, according to the box distribution obtained in step eight, sending the edge information corresponding to the distributed boxes between adjacent processes of the composite storage layers;
step ten, starting from the first layer of the multipole tree structure, establishing the near-interaction queue of each box and, through the near-interaction queue, establishing the secondary-adjacent queue of the boxes of the next layer, until the next layer is the layer above the highest layer of the box-parallel layers; finding, at the highest layer of the box-parallel layers, the sequence of numbers of all non-local boxes that have near interactions with the local boxes; according to the definition of the secondary-adjacent interaction, including the secondary-adjacent and near-interaction box information of all descendant nodes of the highest layer of the box-parallel layers in the near-interaction queue of the local boxes of that highest layer, and naming this queue the redundant proxy tree;
step eleven, traversing all tree nodes layer by layer, from the first layer of the multipole tree structure to the highest layer of the box-parallel layers, to obtain the list of near-interaction box pairs; for the composite storage layers $L_D$ to $L_{max}$, a node to be accessed that belongs to the local tree is accessed directly, while a node that is not local is located quickly by binary search among all nodes of the corresponding layer of the redundant proxy tree;
step twelve, deleting the non-local node information of layers $L_{max}-1$ to $L_D$ in the redundant proxy tree, and extracting the corresponding non-local node information from the redundant proxy tree according to the near-interaction list of the local boxes of layer $L_{max}$, to generate the non-local proxy tree.
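The box numbering, the bottom-up construction of non-empty boxes and the binary-search addressing described in the steps above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the claimed parallel implementation: all function names are hypothetical, the x index is assumed to occupy the lowest interleaved bit, the layer index here counts subdivisions from the root (0 = root), and MPI communication is omitted.

```python
from bisect import bisect_left


def morton_encode(ix, iy, iz, levels):
    """Interleave the bits of three box indices into one Morton code."""
    code = 0
    for bit in range(levels):
        code |= ((ix >> bit) & 1) << (3 * bit)
        code |= ((iy >> bit) & 1) << (3 * bit + 1)
        code |= ((iz >> bit) & 1) << (3 * bit + 2)
    return code


def bottom_box_code(midpoint, origin, root_size, levels):
    """Map an edge midpoint to the Morton code of its bottom-layer box."""
    n = 1 << levels  # boxes per direction at the bottom layer
    idx = []
    for p, o in zip(midpoint, origin):
        i = int((p - o) / root_size * n)   # normalized division value
        idx.append(min(max(i, 0), n - 1))  # clamp points on the boundary
    return morton_encode(idx[0], idx[1], idx[2], levels)


def parent_code(code):
    """With 3-bit interleaving, dropping the last 3 bits gives the parent."""
    return code >> 3


def nonempty_layers(bottom_codes, levels):
    """Build sorted lists of non-empty box codes, bottom layer upward."""
    layers = {levels: sorted(set(bottom_codes))}
    for layer in range(levels, 0, -1):
        layers[layer - 1] = sorted({parent_code(c) for c in layers[layer]})
    return layers


def find_box(sorted_codes, code):
    """Binary search for a box in a Morton-sorted layer; -1 if absent."""
    i = bisect_left(sorted_codes, code)
    if i < len(sorted_codes) and sorted_codes[i] == code:
        return i
    return -1


layers = nonempty_layers([63, 8, 9], levels=2)
print(layers[1], find_box(layers[2], 9))
```

Storing each layer in Morton order is what makes the binary search possible: a compressed list of non-local nodes can be addressed in logarithmic time without keeping the full tree on every process.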
CN201910168049.0A 2019-03-06 2019-03-06 Parallel multilayer rapid multi-polar subtree structure composite storage method Active CN109947563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910168049.0A CN109947563B (en) 2019-03-06 2019-03-06 Parallel multilayer rapid multi-polar subtree structure composite storage method

Publications (2)

Publication Number Publication Date
CN109947563A CN109947563A (en) 2019-06-28
CN109947563B true CN109947563B (en) 2020-10-27

Family

ID=67009142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910168049.0A Active CN109947563B (en) 2019-03-06 2019-03-06 Parallel multilayer rapid multi-polar subtree structure composite storage method

Country Status (1)

Country Link
CN (1) CN109947563B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant