CN114998513A

CN114998513A - Earth simulation system grid remapping method with cycle boundary based on KD tree

Info

Publication number: CN114998513A
Application number: CN202210517802.4A
Authority: CN
Inventors: 曹宇; 陈研; 王辉赞; 张小将; 赵文静
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2022-09-02
Anticipated expiration: 2042-05-12
Also published as: CN114998513B

Abstract

The invention belongs to the technical field of geographic information data processing and reconstruction, and discloses a KD tree-based earth simulation system grid remapping method with a cycle boundary, which comprises the following steps: acquiring a grid of a global simulation system, determining source grid points, target grid points and a cycle boundary, and placing all the source grid points and the target grid points in a specified cycle section of each cycle dimension; if the target grid points are concentrated near the cycle boundary and far outnumber the source grid points, searching for the source grid points corresponding to the target grid points using a KD tree-based source point replication method; otherwise, searching for the source grid points corresponding to the target grid points using a KD tree-based target point replication method; and remapping the grid information according to the target grid points and the corresponding source grid points. The invention retains the advantages of the KD tree, namely that it places no requirement on grid type and is widely applicable, while effectively solving the cycle boundary problem in grid remapping for earth simulation systems, and has broad application prospects.

Description

Earth simulation system grid remapping method with cycle boundary based on KD tree

Technical Field

The invention belongs to the technical field of geographic information data processing and reconstruction, and particularly relates to a KD tree-based earth simulation system grid remapping method with a cycle boundary.

Background

Data searching in high dimensional space is one of the most difficult problems in many applications. As a classic data structure, the KD tree is widely used for data search in a high-dimensional space, especially nearest neighbor search and range search. Typical application scenarios include ray tracing, grid mapping, cluster analysis, and the like.

The KD tree can be viewed as a special binary tree. Whereas an ordinary binary search tree always partitions on one fixed dimension, the KD tree can select any dimension for partitioning at each level as needed. The dimension with the largest variance or the largest spread is usually chosen so that the search space is divided as evenly as possible. The median point is typically chosen as the partitioning point to build a balanced KD tree, thereby reducing tree height and shortening search time. Once the KD tree has been constructed, the entire search space is organized as a binary tree according to the partitioning order. As shown in FIG. 1, the KD tree partitions the space with hyperplanes into independent, non-overlapping subspaces, provided that no dimension of the space is cyclic. However, many real applications must handle cycle boundary conditions. For example, in an earth simulation system, the longitude of the global grid is cyclic. Once a cycle boundary exists in the search space, many pruning decisions in the original KD tree-based search process no longer hold.

Disclosure of Invention

A new data structure and algorithm are designed according to the characteristics of the cycle boundary, and the problem of application of the KD tree in a high-dimensional cycle space is solved. In view of the above, the invention provides a KD-tree-based earth simulation system grid remapping method with loop boundaries.

Specifically, the invention discloses an earth simulation system grid remapping method with a cycle boundary based on a KD tree, which comprises the following steps:

step 1, placing all source grid points and target grid points in a specified loop section of each loop dimension;

step 2, selecting different methods to search corresponding source grid points required by remapping the target grid points according to the distribution condition of the target grid points and the number of the target grid points and the source grid points;

step 201, if the target grid points are concentrated near the cycle boundary and the target grid points are far more than the source grid points, searching the source grid points corresponding to the target grid points based on a KD tree source point replication method;

step 202, otherwise, searching a source grid point corresponding to the target grid point based on a KD tree target point replication method;

step 3, remapping grid information according to the target grid point and the corresponding source grid point;

the source replication method described in step 201 includes the following steps: for each cycle boundary pair A and B, copying source grid points near the cycle boundary A to the outer side of the cycle boundary B, and copying source grid points near the cycle boundary B to the outer side of the cycle boundary A; constructing KD trees for original source grid points and copied source grid points based on geographic information data; searching corresponding source grid points for the target grid points according to a classical KD tree searching algorithm; performing result post-processing, namely mapping the searched source grid points back to corresponding original source grid points;

the target point replication method described in step 202 includes the following steps: constructing a KD tree based on geographic information data; acquiring a target grid point, searching a corresponding source grid point in the KD tree, and acquiring a current source grid point searching result; selecting an unprocessed circulation dimension, copying a target grid point to a position corresponding to the outer side of a circulation boundary at the side far away from the target grid point, searching a corresponding source grid point search result again based on the copied target grid point, and comparing and selecting a current optimal source grid point search result from the obtained source grid point search results; and repeating the operation for the remaining circulation dimensions to obtain the final optimal source grid point.
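Step 1 above (placing every point in a specified cycle section) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `wrap_to_section` and `wrap_point`, the half-open section `[lower, lower + period)`, and the `sections` dictionary are all assumptions introduced here.

```python
def wrap_to_section(value, lower, period):
    """Map one coordinate into the half-open cycle section [lower, lower + period)."""
    return lower + (value - lower) % period


def wrap_point(point, sections):
    """Wrap the cyclic dimensions of a point and leave the other dimensions untouched.

    `sections` (hypothetical) maps a cyclic dimension index to (lower, period),
    e.g. {0: (0.0, 360.0)} for a longitude stored in dimension 0.
    """
    return tuple(
        wrap_to_section(x, *sections[d]) if d in sections else x
        for d, x in enumerate(point)
    )


if __name__ == "__main__":
    # Longitude 370 and longitude -10 expressed in the section [0, 360).
    print(wrap_to_section(370.0, 0.0, 360.0))
    print(wrap_point((-10.0, 45.0), {0: (0.0, 360.0)}))
```

Applying the same wrapping to source and target points before any search ensures that both sets share one pair of virtual boundaries per cyclic dimension.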

Further, the range of the replication in step 201 is determined by the distribution characteristics of the source grid point data, and includes:

if the maximum distance between any position in the search space and its nearest source grid point is L, only the source grid points within distance L of the cycle boundary need to be copied;

if the maximum distance between any position within distance L of the cycle boundary and its nearest source grid point is L, only the source grid points within distance L of the cycle boundary need to be copied;

if the distribution characteristics of the source grid point data are unknown or uncertain, the maximum replication area for each cycle boundary is half of the search space.
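The source point replication of step 201 with a bounded replication range can be sketched as below for a single cyclic dimension. This is an illustration under stated assumptions: `replicate_sources` and `nearest_original` are hypothetical names, `width` plays the role of L, and a brute-force scan stands in for the classical KD tree search so that the sketch stays short.

```python
import math


def replicate_sources(sources, dim, lower, upper, width):
    """Add ghost copies of the source points that lie within `width` of a boundary.

    Returns (points, origin), where origin[i] is the index of the original
    source point that points[i] was copied from.
    """
    period = upper - lower
    points = list(sources)
    origin = list(range(len(sources)))
    for idx, p in enumerate(sources):  # every copy is based on the original data
        if p[dim] - lower <= width:    # near boundary A: copy outside boundary B
            ghost = list(p)
            ghost[dim] += period
            points.append(tuple(ghost))
            origin.append(idx)
        if upper - p[dim] <= width:    # near boundary B: copy outside boundary A
            ghost = list(p)
            ghost[dim] -= period
            points.append(tuple(ghost))
            origin.append(idx)
    return points, origin


def nearest_original(target, points, origin):
    """Brute-force stand-in for the KD tree search, with result post-processing."""
    best = min(range(len(points)), key=lambda i: math.dist(target, points[i]))
    return origin[best], math.dist(target, points[best])


if __name__ == "__main__":
    src = [(2.0, 0.0), (180.0, 0.0), (350.0, 0.0)]
    pts, org = replicate_sources(src, dim=0, lower=0.0, upper=360.0, width=5.0)
    print(nearest_original((359.0, 0.0), pts, org))
```

The `origin` list is what makes the final post-processing step possible: a hit on a ghost copy is reported as the index of the original source point.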

Further, in the target point replication method in step 202, before a new search is started for a selected cyclic dimension, if the distance from the target grid point to its nearest cycle boundary is greater than the current search threshold, the replication of the target grid point and the search in the selected cyclic dimension are cancelled.

Further, in the target point replication method described in step 202, in a new search, the optimal distance between the current target grid point and the source grid point is used as the initial search distance.

Further, the searching process in step 202 adopts a nearest neighbor searching method, which specifically includes:

setting the initial closest distance between the target grid point and the source grid point to T₀;

After the first round of nearest neighbor searching, the shortest distance between the target grid point and the nearest source grid point is T;

processing each cycle dimension in turn; when processing an unprocessed loop dimension i, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in an ith dimension with the size of the current shortest distance T, and if S is not less than T, meaning that a point near another loop boundary in the ith dimension cannot be closer than the current point, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

before starting new search, copying the target grid points to the same relative positions outside the corresponding cycle boundaries, and using the current shortest distance T as an initial shortest distance value for subsequent search to cut off more unnecessary branches in the new search;

assuming that the new closest distance is T′, comparing T′ with T, and selecting the smaller as the latest search result;

and after all the circulation dimensions are subjected to the operation, obtaining the final shortest distance and the corresponding source grid point as a final result.
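The nearest neighbor procedure above can be sketched as follows. It is a hedged illustration rather than the patented code: `nearest` is a brute-force stand-in for a KD tree search that accepts an initial bound, `nearest_cyclic` and the `cyclic` dictionary (dimension index mapped to `(lower, upper)`) are hypothetical, and the target is replicated once per cyclic dimension exactly as the steps describe.

```python
import math


def nearest(sources, query, bound=math.inf):
    """Stand-in for a KD tree nearest search; only points closer than `bound` count."""
    best_idx, best_dist = None, bound
    for i, p in enumerate(sources):
        d = math.dist(query, p)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist


def nearest_cyclic(sources, target, cyclic):
    """Nearest source point when the dimensions listed in `cyclic` wrap around."""
    best_idx, T = nearest(sources, target)            # first round of search
    for dim, (lower, upper) in cyclic.items():
        to_lower = target[dim] - lower
        to_upper = upper - target[dim]
        S = min(to_lower, to_upper)                   # distance to nearest boundary
        if S >= T:
            continue                                  # nothing across the boundary can be closer
        period = upper - lower
        ghost = list(target)                          # copy outside the far boundary
        ghost[dim] += period if to_lower <= to_upper else -period
        idx, T_new = nearest(sources, tuple(ghost), bound=T)  # seed the search with T
        if idx is not None:                           # T_new < T, keep the smaller
            best_idx, T = idx, T_new
    return best_idx, T


if __name__ == "__main__":
    src = [(100.0, 0.0), (358.0, 0.0)]
    print(nearest_cyclic(src, (1.0, 0.0), {0: (0.0, 360.0)}))
```

Seeding the second search with T means the stand-in returns no index at all when nothing beats the first result, which mirrors the extra pruning the method aims for.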

Further, the searching process in step 202 adopts a K neighboring point searching method, which specifically includes:

setting the initial K-th nearest distance between the target grid point and the source grid points to T₀;

After the first round of K-nearest-neighbor search, the distance between the target grid point and its K-th nearest source grid point is T;

sequentially processing each circulation dimension; when an unprocessed circulation dimension i is processed, firstly comparing the distance S from a target grid point to a circulation boundary which is closest to the target grid point in the ith dimension with the distance T of the current nearest Kth source grid point, if S is not less than T, meaning that a relevant point cannot exist near another circulation boundary in the ith dimension, directly starting to process the next circulation dimension; otherwise, further searching is carried out in the current circulation dimension;

before starting new search, copying the target grid points to the same relative positions outside the corresponding cycle boundaries, and using the current Kth short distance T as an initial shortest distance value in subsequent search to cut off more unnecessary branches in the new search;

assuming that the new K-th nearest distance is T′, comparing T′ with T, and selecting the smaller as the latest result;

after all the cycle dimensions are subjected to the above operation, the obtained Kth close distance and the corresponding source grid point are final results.
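The K neighboring point variant can be sketched in the same way. The names `k_nearest` and `k_nearest_cyclic` are hypothetical, `heapq.nsmallest` over all source points stands in for a KD tree K-nearest search, and merging the two candidate lists per source index is an assumption of this sketch about how the results of the original and the replicated target are combined.

```python
import heapq
import math


def k_nearest(sources, query, k):
    """Brute-force stand-in for a KD tree K-nearest search: sorted (distance, index)."""
    return heapq.nsmallest(
        k, ((math.dist(query, p), i) for i, p in enumerate(sources))
    )


def k_nearest_cyclic(sources, target, k, cyclic):
    """K nearest source points when the dimensions listed in `cyclic` wrap around."""
    best = k_nearest(sources, target, k)              # first round of search
    for dim, (lower, upper) in cyclic.items():
        T = best[-1][0] if len(best) == k else math.inf   # current K-th nearest distance
        to_lower = target[dim] - lower
        to_upper = upper - target[dim]
        if min(to_lower, to_upper) >= T:
            continue                                  # no relevant point across the boundary
        period = upper - lower
        ghost = list(target)                          # copy outside the far boundary
        ghost[dim] += period if to_lower <= to_upper else -period
        merged = {}
        for dist, idx in best + k_nearest(sources, tuple(ghost), k):
            merged[idx] = min(dist, merged.get(idx, math.inf))  # keep the shorter path
        best = heapq.nsmallest(k, ((d, i) for i, d in merged.items()))
    return best


if __name__ == "__main__":
    src = [(1.0,), (50.0,), (100.0,), (359.0,)]
    print(k_nearest_cyclic(src, (2.0,), 2, {0: (0.0, 360.0)}))
```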

Further, the range searching method is adopted in the searching process in step 202, and specifically includes:

setting a search distance T;

sequentially processing each circulation dimension; when processing an unprocessed loop dimension i, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in an ith dimension with the size of a current search distance T, and if S is not less than T, meaning that a relevant point is unlikely to exist near another loop boundary in the ith dimension, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

before starting new search, copying the target grid points to the same relative positions outside the corresponding cycle boundaries, and then carrying out new range search;

and after all the circulation dimensions are subjected to the operation, obtaining corresponding source grid points as final results.
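A corresponding sketch for the range search is given below. `range_search` is a brute-force stand-in for a KD tree range query, and `range_search_cyclic` and the `cyclic` dictionary are hypothetical names. The sketch uses a strict comparison (`distance < radius`), an assumption made here so that the rule "skip the dimension when S is not less than T" never discards a qualifying point.

```python
import math


def range_search(sources, query, radius):
    """Brute-force stand-in for a KD tree range query: indices strictly inside `radius`."""
    return {i for i, p in enumerate(sources) if math.dist(query, p) < radius}


def range_search_cyclic(sources, target, radius, cyclic):
    """Range search when the dimensions listed in `cyclic` wrap around."""
    found = range_search(sources, target, radius)     # search around the original target
    for dim, (lower, upper) in cyclic.items():
        to_lower = target[dim] - lower
        to_upper = upper - target[dim]
        if min(to_lower, to_upper) >= radius:
            continue                                  # the search ball does not cross the boundary
        period = upper - lower
        ghost = list(target)                          # copy outside the far boundary
        ghost[dim] += period if to_lower <= to_upper else -period
        found |= range_search(sources, tuple(ghost), radius)
    return found


if __name__ == "__main__":
    src = [(1.0,), (10.0,), (357.0,), (200.0,)]
    print(sorted(range_search_cyclic(src, (359.0,), 5.0, {0: (0.0, 360.0)})))
```

Unlike the nearest neighbor case, the threshold T stays fixed during the whole procedure, so the results of the individual searches are simply united.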

The invention has the following beneficial effects:

the invention can effectively solve the problem of grid remapping of the earth simulation system with the cycle boundary, and can greatly reduce the search time by using a target point replication method.

Drawings

FIG. 1 is a schematic diagram of a KD tree;

FIG. 2 is a schematic diagram of the mirror method of the present invention;

FIG. 3 is an overall flow diagram of the present invention;

FIG. 4 illustrates a source point mirror copy process flow of the present invention;

FIG. 5 illustrates a target point mirror copy process of the present invention;

FIG. 6 shows the trend of search time of the source point copy mirroring method and the target point copy mirroring method of the present invention with the number of source points;

FIG. 7 shows the trend of search time with the number of target points for the source point copy mirroring method and the target point copy mirroring method of the present invention.

Detailed Description

The present invention is further described with reference to the drawings, but the present invention is not limited thereto in any way, and any modifications or alterations based on the teaching of the present invention shall fall within the scope of the present invention.

In the actual processing of a cyclic space, a boundary is usually introduced artificially, which is called a cycle boundary. Thus, in the dimension containing the cycle boundary, the search space acquires two virtual finite boundaries. The invention provides a method for realizing KD tree data search in a high-dimensional cyclic space by using a mirror image method, which comprises the following steps:

step 2, selecting different methods to search corresponding source grid points required by the remapping target grid points according to the distribution condition of the target grid points and the number of the target grid points and the source grid points;

the source replication method described in step 201 includes the following steps: for each cycle boundary pair A and B, copying source grid points near the cycle boundary A to the outer side of the cycle boundary B, and copying source grid points near the cycle boundary B to the outer side of the cycle boundary A; constructing KD trees for the original source grid points and the copied source grid points based on the geographic information data; searching corresponding source grid points for the target grid points according to a classical KD tree searching algorithm; performing result post-processing, namely mapping the searched source grid points back to corresponding original source grid points;

The range of replication described in step 201 is determined by the distribution characteristics of the source mesh point data, and includes:

In the target point replication method described in step 202, before a new search is started for a selected cyclic dimension, if the distance from the target grid point to its nearest cycle boundary is greater than the current search threshold, the replication of the target grid point and the search in the selected cyclic dimension are cancelled.

In the target point replication method described in step 202, the optimal distance between the current target grid point and the source grid point is used as the initial search distance in the new search.

In the searching process in step 202, a nearest neighbor searching method is adopted, which specifically includes:

After the first round of nearest neighbor search, the shortest distance between a target grid point and the nearest source grid point is T;

sequentially processing each cycle dimension; when processing an unprocessed loop dimension i, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in the ith dimension with the current shortest distance T, if S is not less than T, meaning that a point near another loop boundary in the ith dimension cannot be closer than the current point, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

assuming that the new closest distance is T′, comparing T′ with T, and selecting the smaller as the latest search result;

In the searching process in step 202, a K neighboring point searching method is adopted, which specifically includes:

after all the circulation dimensions are processed by the operation, the obtained Kth close distance and the corresponding source grid point are final results.

In the searching process in step 202, a range searching method is adopted, which specifically includes:

setting a search distance T;

sequentially processing each circulation dimension; when an unprocessed loop dimension i is processed, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in the ith dimension with the current search distance T, and if S is not smaller than T, meaning that a relevant point cannot exist near another loop boundary in the ith dimension, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

and after all the circulation dimensions are subjected to the operation, obtaining the corresponding source grid points as final results.

Examples

The mirror image method has two schemes, depending on which points are mirrored. This embodiment takes as an example a nearest point search in a 2-dimensional space that has cycle boundaries only on the left and right sides. The first scheme mirrors the source points: points near cycle boundary A are copied to the outside of cycle boundary B, and points near cycle boundary B are copied to the outside of cycle boundary A, with the maximum copy area for each copy being half of the entire search space (as shown in FIG. 2(a)). The number of data points that need to be replicated may be reduced further if the distribution of the source data has exploitable characteristics, such as the existence of a maximum distance between any location in the search space and its nearest source point. The second scheme mirrors the target point. After one search has been completed, the target point is copied to the outside of the cycle boundary farther from it (assumed to be boundary B), and another round of search is performed (as shown in FIG. 2(b)). The final result is selected by comparing the two search results. To reduce the search time further, new data structures and algorithms can be designed that exploit the characteristics of the KD tree and the distribution characteristics of the source and target data to perform additional pruning. For example, before the second search is started, if the distance from the target point to its nearest cycle boundary is greater than the distance found by the first search, the second search may be cancelled.

The following sections describe the process of the invention in detail:

Assume that the search space has K dimensions and C cycle boundaries. Without loss of generality, we set D[1] to D[C] as the cyclic dimensions, where D[i] represents the i-th dimension.

FIG. 4 shows the flow of the source point copy mirroring method. For each cyclic dimension, source points near one cycle boundary are copied to the outside of the other cycle boundary. The extent of replication may be determined by the distribution characteristics of the source data. For example, if the maximum distance between any location in the search space and its nearest source point is L, then only the source points within distance L of the cycle boundary need to be copied. In fact, this precondition may be relaxed further: the limited replication method remains valid as long as the source point density near the cycle boundary satisfies the condition. However, if the distribution characteristics of the source data are unknown or uncertain, the maximum replication area for each cycle boundary is half of the original search space. It should be noted that each copy is based on the original source data. After the relevant source points have been copied for each cyclic dimension, a KD tree can be constructed and the nearest points found according to the classical algorithms (for more details on these classical algorithms, see the non-patent paper Cao Y., Wang B., Zhao W.-J., et al., "Research on Search Algorithms for Unstructured Grid Mapping Based on KD Tree," 3rd International Conference on Computer and Communication Engineering Technology, pp. 29-33, 2020). The last step is to map the source grid points found back to the corresponding original source grid points.

Taking the nearest neighbor search as an example, the flow of the mirroring method based on target point replication is shown in FIG. 5. The preliminary work, including KD tree construction and the conventional nearest neighbor search, is the same as in an ordinary KD tree-based nearest neighbor search. The initial closest distance T₀ is usually set to a negative number or a sufficiently large value. After the first round of data search, the shortest distance between the target point and its nearest source point is T. Each cyclic dimension must then be processed with a special method. For simplicity, the cyclic dimensions may be processed sequentially. When processing an unprocessed cyclic dimension i, the distance S from the target point to its nearest cycle boundary in the i-th dimension is first compared with the current shortest distance T. If S is not less than T, no point near the other cycle boundary in the i-th dimension can be closer than the current point, and processing of the next cyclic dimension can begin directly. Otherwise, a further search in the current cyclic dimension is required. Before the new search is started, the target point is copied to the same relative position outside the corresponding cycle boundary. The new search may use the current shortest distance T as its initial shortest distance, which helps prune more unnecessary branches. If the new closest distance is T′, T′ is compared with T and the smaller is kept as the latest result. After all cyclic dimensions have been processed in this way, the resulting shortest distance and the corresponding source point are the final result.

The method becomes a range search in the cyclic space if the closest distance is replaced by a specified distance. The range search specifically comprises the following steps:

setting a search distance T;

The method becomes a K neighboring point search in the cyclic space if the closest distance is replaced by the K-th smallest distance. The K neighboring point search comprises the following steps:

The conventional KD tree construction method is as follows. The KD tree is constructed once the two-dimensional point set to be partitioned has been prepared. First, it is checked whether the point set to be partitioned is empty; if so, the construction process ends, and if not, construction begins. The first step is to select the partition dimension K; a common criterion is to give priority to the dimension with the largest variance, although a sequential partitioning strategy may be adopted for simplicity. The second step is to select the partitioning point; a common criterion is to select the median point in the current partition dimension, so that the left and right subtrees hold roughly equal numbers of elements, the resulting binary tree is balanced, and every search reaches a comparable depth. Selecting the median usually requires sorting the points, for which quicksort is used here. The third step is to place the remaining points into the left and right subtree point sets according to the selected partitioning point: points whose value in dimension K is less than or equal to that of the partitioning point are placed in the left subtree, and all other points are placed in the right subtree. Finally, the left and right subtrees are constructed recursively from their respective point sets by the same method.
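The construction procedure can be sketched as follows. This is a minimal illustration under assumptions: the `Node` class and `build_kdtree` are hypothetical names, the partition dimension is chosen sequentially (depth modulo the number of dimensions) rather than by variance, and Python's built-in `sorted` replaces the quicksort mentioned in the text.

```python
class Node:
    """One KD tree node: a partitioning point, its partition dimension, two subtrees."""

    def __init__(self, point, axis, left=None, right=None):
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points, depth=0):
    """Recursively build a balanced KD tree from a list of equal-length tuples."""
    if not points:                        # empty point set: construction ends
        return None
    axis = depth % len(points[0])         # step 1: sequential partition dimension
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2               # step 2: median as the partitioning point
    # Move past ties so that every point with a value <= the split value goes left.
    while mid + 1 < len(ordered) and ordered[mid + 1][axis] == ordered[mid][axis]:
        mid += 1
    return Node(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),       # step 3: values <= split value
        build_kdtree(ordered[mid + 1:], depth + 1),   # step 3: values > split value
    )


if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(tree.point, tree.left.point, tree.right.point)
```

Re-sorting at every level keeps the sketch short at the cost of extra work; a production implementation would more likely use a selection algorithm or pre-sorted index arrays.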

The conventional KD tree search algorithm is as follows. When the current node T of the KD tree is empty, the search ends. Otherwise, the partition dimension K of T is determined first, and the distance between T and the target point M is calculated. If T is closer to M than the best point found so far, the minimum distance D and the corresponding closest point are updated. The search then continues into the subtrees. If the value of the target point M in the partition dimension K is less than or equal to the value of the current node T in dimension K, M lies in the left subtree space of T, and points in that space are likely to be closer to M than points in the right subtree space, so the left subtree of T is searched recursively first. Otherwise, the right subtree of T is searched first. When a leaf node is reached, the search process begins backtracking. At each node on the way back, it is checked whether the distance between the target point and the current node in dimension K is greater than the minimum distance D. If so, no closer point can exist in the other subtree space of the current node T, and backtracking continues. Otherwise, the other subtree space of the current node T must also be searched. The KD tree-based range search process is similar to the nearest neighbor search process: the minimum distance D is replaced by a fixed range value, and the operation of updating D and the closest point is replaced by recording the current node T. For more details, reference may be made to the non-patent paper Cao Y., Wang B., Zhao W.-J., et al., "Research on Search Algorithms for Unstructured Grid Mapping Based on KD Tree," 3rd International Conference on Computer and Communication Engineering Technology, pp. 29-33, 2020.
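The nearest neighbor search with backtracking can be sketched as below. The functions `build` and `nearest` are hypothetical, nodes are stored as plain tuples `(point, axis, left, right)`, and the tree is built with a sequential partitioning strategy. The optional `best_dist` argument is an assumption of this sketch; it shows how an initial distance bound can be passed in to prune branches from the start.

```python
import math


def build(points, depth=0):
    """Build a KD tree whose nodes are tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def nearest(node, target, best=None, best_dist=math.inf):
    """Return (closest point, distance D); points not closer than `best_dist` are ignored."""
    if node is None:                                  # empty node: stop
        return best, best_dist
    point, axis, left, right = node
    d = math.dist(point, target)
    if d < best_dist:                                 # update D and the closest point
        best, best_dist = point, d
    # Search first the side of the splitting hyperplane that contains the target.
    near, far = (left, right) if target[axis] <= point[axis] else (right, left)
    best, best_dist = nearest(near, target, best, best_dist)
    # Backtracking: visit the other side only if the hyperplane is not farther than D.
    if abs(target[axis] - point[axis]) <= best_dist:
        best, best_dist = nearest(far, target, best, best_dist)
    return best, best_dist


if __name__ == "__main__":
    tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(nearest(tree, (9, 2)))
```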

FIG. 6 shows the data search time of the two mirroring methods for different numbers of source points. The source points and the target points are 4-dimensional data, the number of target points (M) is 2²¹, the number of cycle boundaries is 4, and the number of source points (N) varies from 2¹⁸ to 2²⁴, doubling at each step. As can be seen from the figure, the target point replication method outperforms the source point replication method under this configuration. The search time of the source point replication method grows almost in proportion to M log₂ N, whereas the growth rate of the search time of the target point replication method decreases significantly as the number of source points increases, because a larger number of source points lowers the rate at which target points need to be replicated.

FIG. 7 shows the data search time of the two mirroring methods for different numbers of target points. The source points and the target points are 4-dimensional data, the number of source points is 2²⁴, the number of cycle boundaries is 4, and the number of target points varies from 2¹⁸ to 2²⁴, doubling at each step. It can be seen from the figure that the search times of both the source point replication method and the target point replication method increase linearly with the number of target points. In addition, the search time of the target point replication method remains less than that of the source point replication method.

The invention has the following beneficial effects:

the invention not only retains the advantages of the KD tree, namely that it places no requirement on grid type and is widely applicable, but also effectively solves the cycle boundary problem in grid remapping for earth simulation systems.

The above embodiment is an embodiment of the present invention, but the embodiment of the present invention is not limited by the above embodiment, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be regarded as equivalent replacements within the protection scope of the present invention.

Claims

1. The method for remapping the grid of the earth simulation system with the cycle boundary based on the KD tree is characterized by comprising the following steps of:

the source replication method described in step 201 includes the following steps: for each cycle boundary pair A and B, copying source grid points near the cycle boundary A to the outer side of the cycle boundary B, and copying source grid points near the cycle boundary B to the outer side of the cycle boundary A; constructing KD trees for the original source grid points and the copied source grid points based on the geographic information data; searching corresponding source grid points for the target grid points according to a classic KD tree search algorithm; performing result post-processing, namely mapping the searched source grid points back to corresponding original source grid points;

2. The method of claim 1, wherein the extent of the replication in step 201 is determined by the distribution characteristics of the source grid point data, and comprises:

3. The method of claim 1, wherein in the target point replication method in step 202, before a new search is started for a selected cyclic dimension, if the distance from the target grid point to its nearest cycle boundary is greater than the current search threshold, the replication of the target grid point and the search in the selected cyclic dimension are cancelled.

4. The method of claim 1, wherein the target point replication method in step 202 takes the optimal distance between the current target grid point and the source grid point as an initial search distance in a new search.

5. The method for KD-tree-based grid remapping of earth modeling systems with loop boundaries according to claim 1, wherein said searching in step 202 uses a nearest neighbor search method, specifically comprising:

sequentially processing each cycle dimension; when processing an unprocessed loop dimension i, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in an ith dimension with the size of the current shortest distance T, and if S is not less than T, meaning that a point near another loop boundary in the ith dimension cannot be closer than the current point, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

before starting new search, copying target grid points to the same relative positions outside corresponding cycle boundaries, and using the current shortest distance T as an initial shortest distance value in subsequent search to cut off more unnecessary branches in the new search;

6. The method for reconstructing a grid of an earth modeling system with loop boundaries according to claim 1, wherein the searching process in step 202 uses a K neighboring point searching method, which specifically includes:

sequentially processing each circulation dimension; when processing an unprocessed loop dimension i, firstly comparing the distance S from a target grid point to a loop boundary which is closest to the target grid point in an ith dimension with the distance T of a current nearest Kth source grid point, if S is not less than T, meaning that a relevant point is unlikely to exist near another loop boundary in the ith dimension, directly starting to process the next loop dimension; otherwise, further searching is carried out in the current circulation dimension;

7. The method for KD-tree-based grid remapping of an earth modeling system with loop boundaries according to claim 1, wherein said searching in step 202 employs a range search method, which specifically comprises:

setting a search distance T;