CN111274335A

CN111274335A - Rapid implementation method for spatial superposition analysis

Info

Publication number: CN111274335A
Application number: CN201910674053.4A
Authority: CN
Inventors: 赵艳伟; 袁晓光; 杨帆
Original assignee: Beijing Institute of Computer Technology and Applications
Current assignee: Beijing Institute of Computer Technology and Applications
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2020-06-12
Anticipated expiration: 2039-07-25
Also published as: CN111274335B

Abstract

The invention relates to a quick implementation method for spatial superposition analysis, and relates to the technical field of spatial superposition optimization. According to the parallel spatial superposition analysis method, the improved PLA (planar area) element structure is utilized, high-precision parallel spatial filtering based on a parallel grid index technology is respectively realized at an equipment end, and a candidate set is mapped onto a GPU (graphics processing unit) thread array in a fine-grained manner to realize parallel element refining processing, so that the many-core advantage of the GPU is fully exerted, and the analysis efficiency of spatial superposition is improved; the rapid implementation method for space superposition analysis provided by the invention can improve the performance of the algorithm by one order of magnitude compared with the CPU implementation on the surface-surface superposition analysis for processing intensive data, and can be effectively applied to scenes with higher real-time requirements; due to the advantages of the space superposition analysis-oriented quick implementation method in the space superposition analysis efficiency, a good exploration direction is provided for the performance improvement of the space analysis in the network map service mode.

Description

Rapid implementation method for spatial superposition analysis

Technical Field

The invention relates to the technical field of space superposition optimization, in particular to a quick implementation method for space superposition analysis.

Background

The spatial superposition operation is the most important and basic operation in spatial analysis, and the core idea of the superposition analysis is to perform set operation on different data layers under the same spatial reference system, and to achieve the purpose of extracting and analyzing implicit information among a plurality of spatial objects by generating a new data layer containing a plurality of spatial relationships and attribute relationships. The input of the overlay analysis at least comprises two data layers, which are called an overlay layer and an overlay layer, respectively. The data type of the superposed layer can be a point, a line, a surface and the like; the data type of the overlay layer is generally surface data. Thus, types of overlay analysis generally include: point-surface superposition, line-surface superposition and surface-surface superposition. The output of the overlay analysis will generate a new result layer, which contains the integrated geometric information and attribute information.

The coarse filtering step of the overlay analysis is similar to the spatial query, and can be directly eliminated or realized by adopting a method based on spatial index. The precise calculation step mainly comprises two processes of geometric intersection and attribute matching, wherein the purpose of geometric intersection is to find out the geometric intersection points of all the superposed layer elements and the superposed surface layer, and carry out closed-loop tracking according to the entrance and exit attributes of the intersection points to form a new space element. In order to make the new elements have practical significance, each element is usually assigned an ID as a unique identifier, and a one-to-one correspondence relationship with the new element is realized in a copy manner or a foreign key association manner in combination with the attribute information of the input layer.

With the development of networking and integrated combat modes, various data needing to be processed by the finger control system are more and more complex, the data volume is more and more huge, the data volume can reach more than TB level only through geographic information data, and when the conventional finger control system faces such intensive calculation tasks, the traditional processing mode cannot meet the real-time requirement of application on large-scale calculation. When complex processing tasks such as superposition analysis of massive battlefield space data are faced, the data processing period is required to be controlled within hundreds of milliseconds and even milliseconds generally, and the traditional space superposition method needs a large amount of geometric floating point calculation when the superposition operation is carried out on vector elements, and researches on the aspect show that in the precision matching process, the intersection calculation can occupy more than 80% of the whole calculation time, the efficiency of superposition analysis is seriously influenced by excessive precision intersection calculation, and the existing GIS space superposition function cannot meet the requirement of real-time control.

From the prospect of development of computer science at present, multi-computer cluster, distributed and parallel processing become the mainstream trend of dealing with large-scale complex computing modes, and in view of military application requirements and application characteristics of a command control system, the adoption of a parallel processing architecture becomes necessary. GPGPU (general Purpose graphics Processing Unit) has become an important acceleration means for high-performance computing and scientific computing application by virtue of large-scale parallel computing capability and easy programmability, and is widely applied to other general computing fields. GPU universal computing provides a solution for applications with complexity exceeding CPU solution power and computing requirements, while using less energy consumption and yet performing higher performance. Currently, major GPU hardware providers have issued corresponding software development tools SDK, enabling developers to use high-level programming languages for application development. Among them, CUDA released by NVIDIA corporation can be used as an extension of C/C + + language, and becomes one of the most popular parallel frameworks.

With the rise of GPGPU in the general computing field, the solution of geospatial information problem by using this new hardware parallel architecture has become a new research focus in the field of geographic information systems. The efficiency of the intersection calculation part is improved, and the motivation for carrying out parallel space superposition calculation by adopting the novel parallel architecture is motivated.

Disclosure of Invention

Technical problem to be solved

The technical problem to be solved by the invention is as follows: how to design a quick implementation method facing to space superposition analysis to solve the problem of low superposition efficiency caused by a large amount of intersection calculation in the accurate matching process of the traditional superposition analysis method.

(II) technical scheme

In order to solve the technical problem, the invention provides a rapid implementation method for spatial superposition analysis, which comprises the following steps:

step 100, carrying out parallel spatial filtering on the elements of the superposed layers by using a new index structure suitable for a GPU;

step 200, performing parallel accurate intersection operation on the superposed layer and the superposed layer according to the filter result and the element;

and 300, tracking the result according to the space connection type Intersect.

(III) advantageous effects

According to the parallel spatial superposition analysis method, the improved PLA (planar area) element structure is utilized, high-precision parallel spatial filtering based on a parallel grid index technology is respectively realized at an equipment end, and a candidate set is mapped onto a GPU (graphics processing unit) thread array in a fine-grained manner to realize parallel element refining processing, so that the many-core advantage of the GPU is fully exerted, and the analysis efficiency of spatial superposition is improved; the rapid implementation method for space superposition analysis provided by the invention can improve the performance of the algorithm by one order of magnitude compared with the CPU implementation on the surface-surface superposition analysis for processing intensive data, and can be effectively applied to scenes with higher real-time requirements; due to the advantages of the space superposition analysis-oriented quick implementation method in the space superposition analysis efficiency, a good exploration direction is provided for the performance improvement of the space analysis in the network map service mode.

Drawings

FIG. 1 shows a PLA structure, where (a) is a vector surface element and (b) is a PLA structure corresponding to the vector surface element;

FIG. 2 is a candidate pair parallel generation process;

FIG. 3 is a diagram of parallel deal task allocation;

FIG. 4 is a case where the line segments are collinear;

FIG. 5 is a cross product intersection condition;

FIG. 6 is an attribute information structure, where (a) is an overlaid layer and (b) is an overlaid layer;

FIG. 7 is a diagram illustrating the judgment of an entering/exiting point, wherein (a) is an entering point and (b) is an exiting point;

FIG. 8 is a cross point parallel write;

FIG. 9 is an inclusivity test;

FIG. 10 is a parallel sort process, in which (a) is a sort by ring id, (b) is a sort by line id, and (c) is a sort by distance;

FIG. 11 is a result loop tracking process;

FIG. 12 is a comparison of efficiency with the GEOS method;

FIG. 13 is a comparison of efficiency with the GPC method;

FIG. 14 is an efficiency acceleration ratio;

FIG. 15 compares the efficiency with the GPU method.

Detailed Description

In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.

The invention provides a method for carrying out parallel spatial superposition intersection operation by utilizing a GPGPU (general purpose graphics processing unit) on the basis of parallel spatial index by taking a superposition intersection operation relation as an example and processing a more complex surface element layer as an object aiming at the operation bottleneck of spatial superposition, and tries to explore a new means for accelerating analysis and processing of geospatial information by utilizing a 'many-core' mode of a graphics processor by knowing a parallel mechanism in superposition analysis.

The invention relates to a quick implementation method facing to space superposition analysis, which is characterized in that a surface element data structure is improved to be adapted to a storage mode of a GPU video memory in a linear mode and support quick positioning of elements, then a grid space indexing technology suitable for a GPU parallel mode is utilized at an equipment end to realize high-precision parallel space filtering, in a refining step, a fine-grained parallel accurate element intersection operation utilizing a GPU thread array is designed in a candidate set by mainly utilizing a cross product idea of computational geometry, an intersection point sequence is reorganized by utilizing a parallel merging and sorting method, and finally a superposition result is tracked by utilizing an intersection point attribute to generate a result ring. The method filters the non-relevant space elements to the maximum extent, simplifies the calculation steps, is more suitable for processing the superposition analysis of the multi-elements on the multi-elements, has stronger expandability and can greatly improve the calculation efficiency, and provides an innovative solution for the space superposition.

The key technical problems to be solved by the invention comprise: 1. designing a data structure of the surface layer element; 2. the efficiency of index transmission between the CPU and the GPU; 3. and the utilization rate and the parallel efficiency of threads are improved by utilizing a GPU multithreading merging access mechanism.

The implementation of the fast implementation method for spatial superposition analysis according to the present invention is described in detail below with reference to the above-mentioned objects, and the method includes the following steps:

and 300, tracking the result according to the space connection type Intersect.

The step 100 includes the following steps:

(101) data structure of design surface layer element

In the CUDA structure, since there is no pointer operation and no dynamic application of memory space, generally, the linear array is the most effective data structure used on the GPU device side. For the surface-layer elements, because the surface elements may have holes, in the conventional GIS system and SDB, it is necessary to store coordinate points of each element, organize the coordinate points into a ring structure, and further organize a plurality of rings into a certain geospatial element. In order to keep these properties in the GPU as well, a pla (poly Linear array) structure is designed based on the coordinate queue, wherein three auxiliary queues are included for recording other information of the surface layer. Respectively as follows: fa (feature array): the element auxiliary queue is responsible for storing indexes of the corresponding rings; ra (ring array): a ring auxiliary queue in charge of storing indexes of corresponding coordinates; ca (coding array): and the specific coordinate point queue is responsible for storing X and Y floating point coordinates. If it is assumed that the spatial layer omega contains n spatial elements, and the index number of the element is represented by lambda

Denoting the index number of the loop and the index number of the coordinates by κ, the PLA structure can be formally defined as follows:

the spatial layer Ω is represented as a set of all FA queues, which for one element λ contains a ring composed of

Denotes, that is, FA [ lambda ]]The position of the last ring contained in the current element in the RA queue is stored, and the position of the starting ring contained in the element lambda in RA is FA [ lambda-1 ]]+1, thus obtaining that the number of rings contained in this element is FA [ lambda ]]-FA[λ-1]For a ring, the same principle applies

The coordinates of which contained in the queue CA are composed of

It is shown that,

the position of the last coordinate of the current ring in the CA queue is stored, ring

Including the position of the start coordinate in the CA as

The ring comprises the number of coordinates of

According to the above definitions and pushesThen, a mapping relationship between the element queue FA and the coordinate queue CA is obtained, that is, the coordinate set included in the element λ is:

and can quickly determine the number of coordinates included in the element as RA [ FA [ lambda ]]]-RA[FA[λ-1]]；

After the coordinates of each element are read according to the element sequence in the original data and are continuously stored in CA, the storage design is performed according to the definitions of two auxiliary linear queues of RA and FA, if a coordinate sequence of Feature1 is obtained, the index of the ring corresponding to the element in RA is firstly found according to the index of the element and is { FA [0] +1, …, FA [1] } {1,2,3}, then in the RA queue, the coordinate id range of ring 1 is { RA [0] +1, …, RA [1] } {10,11, …,27}, the coordinate id range of ring 2 is { RA [1] +1, …, RA [2] } {28,29, …,31 ] } { RA [2] } { CA [3] + 31, 38, 11, …,27}, the coordinate id range of ring 2 is { RA [1] +1, …, RA [28, 29, …,31 ] } { CA [32, 38 } and { CA [32] + 11, 11 } coordinates of { CA [ 14, 11 } in the CA [ 14 } in the coordinate range of { CA [ 7} a { CA [ 7} in the RA [ 7} set of { CA [ 7} and { CA [ 7} in a { CA [ 7} coordinate id { CA [ 7] }, { 11 ] }.

The benefits of using this structure are: the specific coordinate of a specific element or a specific ring can be found by scanning the 3 linear arrays, and simultaneously, the auxiliary information such as the number of the elements, the number of the rings of each element, the number of the coordinates of each ring and the like of the current data set can be clearly known.

In addition, considering that the GPU has a certain limitation on the video memory, the video memory space is often not as large as the CPU memory space, so when the data amount is very large, because the number of coordinate points stored by the GPU is huge, the whole data set may not be mapped into the GPU, and batch processing is required, and at this time, if such an auxiliary structure is not available, the specific coordinate position of the element to be processed cannot be determined quickly, which affects the efficiency.

Certainly, the cost of improving the element positioning speed is to open up two array spaces FA and RA, but experimental analysis shows that the number of polygons contained in a data set with 3748046 points is 2453, the number of rings contained in the data set is 2519, and compared with massive element space coordinates, the space occupied by the auxiliary structures FA and RA is very small, so that the two structures can be completely loaded into a video memory at one time. Therefore, when the actual layer set to be superimposed is processed, the two situations can be considered, if the data is small, all the FA, RA and CA are loaded into the video memory for processing, and if the video memory space cannot be put into all the data sets at one time, the auxiliary structure can be transmitted into the GPU, and then batch processing is carried out.

(102) Building grid index for superimposed layers

Step 1021, inputting vector element information, and reading point coordinates of the component vector elements and corresponding vector element index information; coordinate conversion is carried out through a coordinate conversion channel, and the vector element coordinates are accurate to sub-pixel accuracy;

step 1022, performing grid rendering on the vector element according to the vector element information

The element rendering process is divided into three steps, wherein in the first step, outline information of the vector elements is extracted; secondly, scanning the outline and calculating a filling grid area by using a grid controller; step three, performing cross-section or cross-unit filling on the filling area in a horizontal scanning mode, performing an index analysis process, acquiring index information through input vector elements, mapping the index information into corresponding grid units through an index conversion engine, and finally generating a grid index through rendering and caching with a vector element rasterization result;

(103) generating candidate pairs using parallel trellis indices

For the spatial superposition operation of two surface layer data, an Intersect intersection operation is taken as a research focus. When processing two massive surface data, firstly, spatial filtering is required, and in this part, a parallel grid index mechanism is adopted to complete rapid non-intersecting element filtering operation, and the specific key steps are shown in fig. 3.

For the superimposed layer a, the pre-generated parallel raster index is stored in the device video memory of the GPU, and for the superimposed layer B, outsourcing by using the respective surface elements of the superimposed layer is selected to perform parallel spatial filtering. And ensuring that each thread block processes the filtering task of one surface element on the basis of the distribution strategy of the thread blocks, wherein each thread in the thread blocks is responsible for each grid unit in the grid index covered in the surface element outsourcing range.

For the current facet element outsourcing W1, the filtering result is not processed only on the element index set in A of the capping. For the Intersect operation here, it is precisely known which specific element in the overlay layer B intersects with which element in the overlay layer a, so as to prepare for further processing for extracting the elements. Therefore, the filtering result is reduced in a key-value form, wherein the element id of the B layer is used as a key, and the element id of the A layer is used as a value. Due to the characteristics of parallel grid indexes, redundant candidate pairs can exist in the filtering result, the method adopted at the moment is to utilize the start _ by _ key method of the thread to carry out parallel sequencing on the result pairs, and then further utilize the compact method of the thread to filter repeated candidate pairs in parallel. After the processing of the steps, the generation of the input set to be subjected to the intersection calculation is finished.

Step 200, comprising the following steps:

(201) parallel intersection task partitioning

Before parallel intersection, the allocation of thread blocks needs to be determined. For the specification of the number of thread blocks, in order to ensure that each thread block is responsible for the precise intersection of one candidate pair, the number of thread blocks is determined according to the number of candidate pairs, and if a common candidate set of S pairs is provided, the thread blocks can be represented as Blocknum (S,1) if the matching is performed in a two-dimensional manner.

In order to maximize the parallelism, secondly, it is ensured that each thread block has enough threads, in this method, because the number of registers used by the kernel parallel intersection function is 22, if on a GPU device with a computing power of 1.0, we can only select to use 256 threads per thread block, because the number of registers available per thread block on the 1.0 device is 8192, 256 < 8192 ÷ 22 < 512. For the device GTX260 with computing power 1.3, each thread block contains 16384 registers, 16384 ÷ 22 > 512, so that it can be allocated in a two-dimensional manner using 512 as the maximum number of threads, assuming that the thread block is organized in Threadnum (32, 16).

For this allocation, we can calculate the Occupancy of the kernel, for the device GTX260 with computing power 1.3, the maximum number of active threads per stream multiprocessor is 1024, the maximum number of active warp is 32, since the maximum number of registers per block is 16384, and each thread uses 22 registers, the number of active warp in each stream multiprocessor is 16, and the Occupancy is 50%.

After the thread is distributed, when kernel parallel intersection is executed, the current thread block firstly finds a candidate pair to be processed according to a built-in variable, and then finds a corresponding specific element coordinate by utilizing a pre-generated PLA structure of each layer stored in a video memory according to a superposed element id and a superposed element id related to the candidate pair. And finally, organizing the coordinates into a two-dimensional line segment matrix according to the rule that the superposed elements are used as row vectors and the superposed elements are used as column vectors. The threads in the thread block will determine the line index of the processed superimposed element, i.e. the row number in the line matrix, from its built-in variable threadedx. Likewise, the segment index of the superimposed element, i.e., the column number in the segment matrix, is determined from the threadidx.y. When the built-in variable range is exceeded, a loop processing mode is adopted to ensure the maximum utilization rate of the thread. The specific flow of task allocation is shown in fig. 3.

(202) Parallel intersection calculation at equipment end

After determining which two line segment calculation targets each thread should process, fine-grained parallel intersection algorithm may be used to perform precise floating point operation in line segment units. For vector space data, the most accurate method is to adopt the idea of calculating geometry when performing intersection operation, but this method often causes the use frequency to be lower due to large calculation amount. However, in the method, because a many-core mechanism of the GPU device side is utilized, and the thread has natural advantages when floating point operation is carried out, the intersection operation can be carried out by utilizing the idea of calculation geometry, so that the intersection result is accurate, and the execution efficiency is high.

According to this idea, in the kernel function, the following problems need to be considered and solved. (1) If the two line segments processed by the current thread are collinear, how to count the intersection in this case. As shown in FIG. 4, assume that the current thread is responsible for computing the line segment a1 of the A-layer element A1 and the line segment B1 of the B-layer element B1, which have co-linear portions. We have found through analysis that within a ring of face elements, two adjacent segments must have a common end point, i.e., for segment a1, there must be segment a2 and it shares the same end point p, p on a1 and a2, and then p must also be on b 1. Therefore, we can disregard the collinear case because there must be a line segment a2 and b1 intersecting p. That is, when two line segments are intersected in parallel, there is either no intersection or at most one intersection.

In the aspect of specific intersection calculation, intersection judgment is performed according to the circumscribed rectangles of the line segments through a rejection experiment, and then intersection and non-intersection processing are performed respectively. If the two line segments are intersected, judging the direction according to a cross product method, wherein the specific method comprises the following steps: let the line segment processed by the current thread be the line segment (p) of a certain element of the A layer₁,p₂) And a line segment (p) of an element of the B layer₃,p₄) If want to judge (p)₁,p₂) And (p)₃,p₄) Intersect, then (p) is guaranteed₁,p₂) And (p)₃,p₄) Cross each other, i.e. for vectors

In other words, the p1 point and the p2 point are distributed on both sides thereof, and the vectors are simultaneously paired

For example, the p3 dots and the p4 dots are distributed on both sides. As shown on the left of fig. 5, by calculating the vector

And

if the cross product is greater than 0, it indicates that

In that

If the cross product is less than 0, this indicates that

In that

Counter-clockwise. To ensure the p1 point and p2 point distribution

On both sides, it is guaranteed

And

relative to

In the opposite direction, i.e.

Similarly, for FIG. 5, right, it is guaranteed

And after judging that the two line segments are intersected by the cross product, further solving the intersection point in parallel according to the idea of calculating geometry.

In storing the calculation results, i need to store the specific spatial coordinates of the intersection pointsThey also need to keep some attribute information for each intersection in order to prepare for tracking the overlay results. Suppose a line segment (p) of a line segment A layer element₁,p₂) Line segment (p) of element of layer B₃,p₄) Intersect at p₀Then except that p is saved₀In addition, the information we need to keep includes: the (p) is₁,p₂) The ring id where the ring is located; (p)₁,p₂) Id (may be p)₁Id of (d); to the starting point p₁The distance of (d); (p)₃,p₄) The ring id where the ring is located; (p)₃,p₄) Id (may be p)₃Id of (d); to the starting point p₃The distance of (c). As shown in fig. 6, __ align __ (16) is adopted to realize the merged access to the video memory.

The attribute information of the out-point and in-point can be recorded in the segment components of the info a and info b, i.e. if the current intersection point is an in-point, then the info b. The in-out attribute of the intersection point can be judged according to the direction and cross product of the polygon. Knowing that the polygon outer ring direction is clockwise, then for line segment (p)₁,p₂) And (p)₃,p₄) If vector of

In that

Clockwise of (a), then (p) can be judged₃,p₄) Line segment (p) passing through element of layer A₁,p₂) The entry point is the intersection point obtained by entering the element, as shown in fig. 7 (a). On the contrary, if

In that

In the counterclockwise direction of (p), description₃,p₄) Go out from the A1 element, and the intersection point is the out point.

In addition, if the intersections are stored in the output matrix corresponding to the two-dimensional thread, a space is extremely wasted because the number of the obtained intersections is generally much smaller than the number of the input line segments, and the output matrix is sparse. To solve this problem, we will use a compact array to store the intersection coordinates and their attribute information.

The method comprises the steps of distributing the size of an intersection point array by calculating the residual space of a video memory, storing the intersection point array by using a device _ vector structure of thread, wherein the vector is equivalent to a heap structure, when an intersection point is found, the intersection point is placed at the top end of the heap, and simultaneously, the mutual exclusion of simultaneous operation of multiple threads on the structure and the consistency of the increase of the intersection point are ensured by using atomic operation. In order to further improve the parallelism of intersection point writing, the one-dimensional structure is changed into a two-dimensional array to store intersection point results. When the current thread writes in the intersection point, the current thread is automatically adapted to the corresponding line according to the threadlike threadidx.y, and only the inside of the line needs to be locked when writing in. Parallel writing can be achieved from row to row, that is, when the readidx.y of one thread and the other thread are mapped into different rows, they can operate on the respective rows simultaneously without waiting for each other. As in fig. 9, exclusive-or operations are required within rows of the same color, and rows of different colors can be written in parallel.

If two line segments do not intersect, there are three cases: the superposed elements are completely outside the superposed elements, the superposed elements are completely inside the superposed elements, and the elements intersect but do not intersect with the current two line segments. Here we need to consider the case where the elements are completely separated, because for the Intersect operation, the elements of layer a that are completely contained inside the overlay element also belong to part of the result. In order to avoid performing serial polygon inclusion test when the result is traced finally, we can use the multithreading mechanism of the GPU to perform semi-inclusion test on the non-intersecting line segments in the intersection part to achieve parallel detection whether the overlapped element is inside the overlapped element. The specific method is that a ray method is adopted, the middle point of the line segment of the superposed element A is selected, a ray is made downwards along the Y direction, and if the ray is intersected with the current line segment of the element B, marking is carried out. When the whole block finishes the current task, the line segment marked by odd times in the A is indicated inside the element B, otherwise, the line segment is outside the element B. If all line segments of a certain element in a are marked odd number of times, it can indicate that the element is inside B. The processing can be accelerated by using a shared memory of the thread block, in order to ensure that block conflict access is not caused, the number of the shared memory to be opened up is the same as the number of threads of the block, each thread is ensured to have a corresponding shared memory position, and the initialization is 1. As shown in fig. 9, when thread t detects the midpoint of segment a1 to ray down through b1, it is set to-1 at the location of shared memory threadidx. After block calculation is finished, multiplying the identifiers of the same columns where the thread groups are located by the thread groups with the thread groups of 0, and the access mode not only ensures that block conflict does not exist, but also ensures that each line segment of the element in the A is subjected to inclusion test.

The step 300 includes the following steps:

(301) intersection array ordering process

Because the intersection point coordinates and the related attribute information thereof are stored in parallel by adopting a line-dividing structure, and the execution of the thread is irregular, the adjacent positions of the intersection points after parallel calculation do not have any correlation. Before tracing the superposition ring, the intersection points need to be rearranged to present a certain regularity, so that a new superposition result is generated according to the regular trend of the intersection points. Firstly, splicing and summarizing the intersection point array and the attribute information thereof respectively, and completing the splicing and summarizing in parallel by using copy and other operations of thread. And then, the intersection points need to be respectively inserted into the original coordinate arrays CA of the A layer and the B layer, so as to generate a new line segment.

The process of intersection insertion is specifically described by taking the layer B as an example. Firstly, the intersection points are ordered according to the B element coordinates, the ordering is completed in three steps, and the method of stable _ start _ by _ key of thread is mainly utilized to execute in parallel, wherein the info B is required to be used as the key. The first step is as follows: and sequencing according to the ring id of the B layer, realizing aggregation of intersection points of the same ring, wherein a customized comparison function in the sequencing method is shown in FIG. 10(a), and the intersection point array and the info B are sequenced in an ascending order at the same time according to the size of the ring component. The second step is that: for the intersection points with the same ring id, we need to sort according to the line segment id to ensure that the intersection points belonging to the same line segment are gathered together. As shown in fig. 10(b), the comparison function in the stable _ sort _ by _ key method is to perform parallel processing for the intersection points with the same loop id in the first sorting. The third step: for the intersection points on the same line segment, in order to conform to the trend of the line segment, the intersection points need to be sorted according to the distance from the intersection points to the starting point of the line segment. The comparison function of FIG. 10(c) further refines the key to ensure the correct order of multiple intersections on the same line segment.

At the CPU end, the insertion process of the intersection point to the original coordinate sequence of the layer B can be completed only by scanning the sequenced intersection points once. Meanwhile, the update operation of the array such as RA _ B, infoB needs to be maintained. When scanning the intersection point i, we need to judge whether it is on the same ring with the point i + 1. If not on the same ring, update operations are needed for those rings between which there is no intersection, since the intersection before i has been inserted into coordB, these rings are subjected to an incremental operation of i +1, i.e. RA _ B [ k ] + ═ i +1, where k e [ info [ i ] ring, info [ i +1] ring). If the two intersection points are on the same ring and on the same line segment at the same time, the number of the intersection points is accumulated, and the info [ i ] segment value is modified at the same time, because the insertion of the intersection point before i causes the line segment id pointed by the intersection point to change, and the line segment id needs to be updated again and points to the correct position. And then processing the next intersection point, further finding all the intersection points belonging to the current line segment, and inserting the intersection points into coordB at the corresponding positions. Thus, a new coordinate sequence based on the B layer ring coordinate information after intersection is formed.

For the layer A, the intersection points are scanned again in the same manner, except that the comparison function of the parallel ordering of the intersection points is based on the ring id, the line segment id and the distance of the layer A respectively. In addition, when scanning the sorted intersection points, non-intersecting ring ids are saved, because a is the superimposed layer, and a certain element in a may become a part of the superimposed result even though B does not Intersect, for the Intersect operation, the a element inside the B element is kept.

In the overall analysis, for the intersection point recombination, the intersection point sequencing can be executed in parallel at the equipment end by using the Thrust, the intersection point insertion is completed at the CPU end, only two scans are needed for the intersection point, A is performed once, B is performed once, and the time complexity is linear.

(302) Result loop tracking procedure

The tracking part of the superposition result is mainly completed in the CPU, and because the last sequencing of the intersection point array is specific to the layer A, the superposition result can be picked up by scanning the intersection points once by utilizing the sequencing, instead of scanning a new vertex array once every time a new ring is searched by the traditional method.

For the scanned intersection array, the starting point of the result ring is first recorded by whether the result array is empty. And judging whether the intersection point infoA and the info B are processed according to the positive and negative of the segment of the intersection point infoA and the info B, if the segment values are both larger than 0, taking the next intersection point, and otherwise, selecting a tracked array according to the in-out point attribute of the intersection point. If info B.segment is less than 0, the intersection point is a point entering the element of the layer A, and for the Intersect operation, we need to pick up along a new array of B; if info a.segment is less than 0, it means that the point is an intersection point away from the a layer element, and it is necessary to continue tracing along the new array of a.

In the tracking process, if the currently encountered point is not the intersection point, adding the currently encountered point into the result array, otherwise, judging according to the situation. Since the intersection is in element unit when the GPU is used for intersection, the intersection is recorded twice for the same line segment belonging to different elements, and we do not remove the repetition point in the intersection sorting part, so this situation needs to be considered when generating the result ring. We take tracking on B-ring as an example to illustrate the details of the processing. (1) If the encountered point is a duplicate of the current intersection and they belong to the same B-ring, the point may continue to advance along the current ring without processing. Since they must be continuous on the B-ring if they are repeated on it, whereas in a the two intersections are separated from different rings and the in-and-out direction is different, if one intersection is followed by B, then the other is followed by a, whose properties can be modified in the a-pick part. (2) If the current point is the same as the starting point of the result ring, which indicates that the result ring is tracked completely, the next new result ring can be tracked. (3) If the encountered point is an inflection point (a point away from the A-ring) and the inflection point has a repeat point, we select the correct exit point according to the id attribute of the current ring and the in-out attribute of the point, and move to the A-ring to continue tracing.

The idea of picking up the resulting points in the a-ring is similar to the steps described above. After the result points in the A ring and the B ring are continuously tracked as above, the superposition result ring is generated after all the intersection points are processed.

In order to illustrate the beneficial effects of the present invention, it is necessary to compare the fast parallel spatial stacking method proposed by the present invention with the conventional spatial stacking method. The experiment of the part mainly comprises three aspects, namely firstly, the correctness of the algorithm is verified, secondly, the efficiency test is respectively carried out by the method in the CPU and the method in the GPU, and the advantage of the algorithm in the aspect of processing intensive data is analyzed. And finally, comprehensively evaluating the method from the aspects of data reasonability and algorithm parallelism through test analysis of all steps in the algorithm.

The experimental environment is shown in the following table:

TABLE 1 Environment configuration

In the experiment, 11 surface layers are selected as input data, any two of the 11 surface layers are selected to form a group as a superposed layer and a superposed layer, and 9 representative superposition analysis tests are performed in total to show the universality of the experiment.

TABLE 2 Experimental data

Data set	Number of elements	Number of rings	Number of coordinates	Data size
					D1	2525	2770	229489	3.63MB
D2
		1	2	2513	39.4KB
D3
		3	4	6908	108KB
D4
		1	2	3610	56.5KB
D5						2525	2770	229489	3.63MB
	D6	11	12	28238	441KB
D7						1104	1105	67719	1.09MB
	D8
		27	38	11687	184KB
	D9					309	3263	37377	612KB
D10		183	3794	117937	3.33MB
	D11					2493	2519	3748046	57.3MB

(1) Algorithm correctness verification

In order to verify the correctness of the algorithm, a superposition analysis tool of ArcGIS is selected for comparison, and difference analysis is performed on the visualization effect, the number of intersection points and the number of superposition result elements by using an Intersect superposition analysis method of ArcGIS. The superimposed layer selected in the experiment is D3, the superimposed layer is D6, and the comparison table of the superimposed effects of the two methods is shown in table 4 below:

TABLE 3 comparison of the effects of superposition

From the comparison, it can be found that the number of elements of the calculation result of the method is the same as that of the ArcGIS result, and therefore the accuracy can be guaranteed.

(2) Comparison of algorithm efficiency

1) In contrast to methods in CPU

In the experiments in this section, we selected the interaction method in GEOS as a reference to compare the efficiency differences between the spatial overlay method in CPU and the method we implemented in GPU, and we performed nine sets of overlay analysis experiments Dx ∩ Dy according to different types, where Dx represents the overlaid layers and Dy represents the overlaid layers.

Another comparison method selected by us is a GPC clipping library, which realizes arbitrary clipping of irregular polygons and can handle the condition of holes. We mainly use the Clip method of GPC as another overlay implementation in CPU to compare with the method of the present invention implemented in GPU.

In order to ensure the fairness of the experiment, for a GEOS method and a GPC method, only the time of a superposition analysis part is recorded, the reading time of source data and the storage time of result data are not considered temporarily, and when the method in a CPU is realized, an R-tree index is established for a superposed layer through libspatialindex, and a SetFilter method is used for filtering, so that the superposition efficiency of the CPU method is improved. The method of the present invention also does not take into account the time of data read and write, but the time of data transfer between the CPU and GPU and the time of new result ring generation are to be recorded. The execution time comparison graphs of the two methods are shown in fig. 12, fig. 13 and fig. 14.

From the efficiency comparison graph and the acceleration ratio graph, it can be found that for the test of the nine groups of data, the efficiency of the method is 10-21 times of that of a GEOS method in a CPU, and the average acceleration ratio is 13.4X, and for a more common GPC clipping method, the acceleration ratio of the method realized by the GPU can also reach 2-7 times and is 3.6X on average. The reason for analyzing the higher efficiency of the method of the invention mainly comprises the following points: (1) in our method all the element coordinate data and its auxiliary structure are stored in memory and do not need to be dynamically allocated. In contrast, the GEOS design takes the problem of memory limitation into consideration, so that the structure design is based on the premise that data is stored in a disk, and dynamic call allocation is performed when needed. The complex design is necessary for the previous generation of computers with limited memory, but now with the development of computers, the performance of each hardware is continuously improved, the memory space is no longer a factor for restricting programs, and obviously, the design is very inefficient. In addition, the open source library uses a large amount of dynamic memories and pointers, which all cause cache failure and affect the execution efficiency. (2) In the GPU implementation, grid space indexes suitable for processing in the GPU are built for the superposed layers, and then parallel rough filtering is performed by using MBR, so that the filtering efficiency and the filtering precision can be guaranteed. In the serial CPU implementation, each element of the superposed layer needs to be subjected to query test independently, even if a spatial index is established for the superposed layer, the polygon of each superposed layer needs to be queried from the root of the index tree until the nearest leaf node is found, and the existence of redundant paths and low filtering precision affect the processing efficiency of subsequent superposition. (3) The GPU has good advantages for floating point number operation, while in intersection solving among polygon elements, cross product operation uses a large number of floating point number operation, the calculation advantages of the GPU are fully utilized, on the other hand, the parallelism degree in the method is high, and multiple stages in the whole algorithm implementation can be processed in parallel by the GPU, including filtering intersected candidate pairs by parallel grid indexes, accurately solving intersection among elements in parallel, ordering intersection points and the like.

2) In contrast to the method in the GPU

To further verify the efficiency of the algorithm of the present invention, we also chose some leading edge academic studies in this respect to compare with the method of the present invention. The GROA method is a research result published in 2011 on ACM SIGSPATIAL GIS on an acceleration method of spatial overlay analysis on a GPU, and by taking this as reference, compared with performance differences of different overlay methods under a GPU architecture, since a prototype of the GROA method is realized on an Nvidia Tesla series video card, for fairness, we tested this method on a GTX260 device, where the efficiency of the two methods in parallel operation of an intersection part is as shown in fig. 15.

Compared with the intersection calculation efficiency of the GROA method, the intersection calculation efficiency of the fourth group of data and the sixth group of data is higher than that of the GROA method, and after the characteristics of each group of data are analyzed, the fourth group of data is processed by the superposition of 2770 rings and 2770 rings. For the first, second and third groups of test data, the loop number of each block is increased due to the fewer number of rings of the superimposed layers, and the efficiency is not as good as that of GROA. On the other hand, for the sixth group of test data, it is found that for the D10 data, the average number of coordinates contained in each ring can be analyzed to be 31 according to the number of rings and the total number of coordinates, so for the processing within a block, Loop operations of threads are less, and the threads of the entire block can be substantially fully utilized, because within one block, the task division manner adopted by us is Threadnum (32,16), which does not cause the waste of the threads within the block, and thus the parallelism is higher. For D9, the average coordinate of each ring is 11, and the intra-block threads are relatively inefficient and cannot take advantage of many cores.

In conclusion, the method has a good acceleration effect on two types of data, and when the superposed layer and the superposed layer contain more elements, the higher parallel efficiency is achieved by increasing the number of blocks; the other type is that when the average coordinate number of each element of the layer is relatively consistent with the thread utilization of a single block, the parallel efficiency is improved by fewer threads Loop in the block and higher thread participation degree is kept.

In addition, in the scalability of the algorithm, because the GROA method maps the superimposed layers and the superimposed layers into the GPU in an integral manner, the waste of non-intersecting element threads is serious, and when the input data is large enough and cannot be transmitted to the video memory at one time, the method is not feasible, so the scalability of the algorithm is poor. In the method, only the candidate superposition pairs are mapped into the GPU during parallel processing, and the large data can be processed in a blocking mode through the PLA structure, so that the algorithm has better expandability.

The parallel spatial superposition analysis method of the invention utilizes the improved PLA of the surface element structure to respectively realize high-precision parallel spatial filtering based on the parallel grid index technology at the equipment end, and maps the candidate set to the GPU thread array in a fine-grained manner to realize the refining processing of the parallel elements, thereby fully playing the many-core advantages of the GPU and improving the analysis efficiency of spatial superposition; 2. the rapid implementation method for space superposition analysis provided by the invention can improve the performance of the algorithm by one order of magnitude compared with the CPU implementation on the surface-surface superposition analysis for processing intensive data, and can be effectively applied to scenes with higher real-time requirements; 3. due to the advantages of the space superposition analysis-oriented quick implementation method in the space superposition analysis efficiency, a good exploration direction is provided for the performance improvement of the space analysis in the network map service mode. The computational power of the overlay analysis can be greatly improved by only utilizing the advantages of the graphics processor in a low-end personal computer, and the overlay analysis is not a better solution compared with a high-end server.

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. A quick implementation method for spatial superposition analysis is characterized by comprising the following steps:

and 300, tracking the result according to the space connection type Intersect.

2. The method of claim 1, wherein step 100 comprises the steps of:

(101) data structure of design surface layer element

On the basis of the coordinate queue, a PLA structure is designed, wherein the PLA structure comprises three auxiliary queues for recording other information of a surface layer, and the three auxiliary queues are respectively as follows: FA: element auxiliary queue in charge of storing pairsThe index of the corresponding ring; RA: a ring auxiliary queue in charge of storing indexes of corresponding coordinates; CA: the specific coordinate point queue is responsible for storing X and Y floating point coordinates, if the existence of a space layer omega is assumed to contain n space elements, the index number of the elements is represented by lambda, and the index number of the elements is represented by lambda

Index number representing the ring, index number representing the coordinate by κ;

(102) establishing a grid index for the superimposed layer A

(103) Generating candidate pairs using parallel trellis indices

For the superimposed layer A, firstly storing a pre-generated parallel grid index into an equipment display memory of a GPU (graphics processing unit), for the superimposed layer B, selecting outsourcing of each surface element of the superimposed layer to perform parallel spatial filtering, and ensuring that each thread block processes a filtering task of one surface element on the basis of a distribution strategy of the thread blocks, wherein each thread in each thread block is responsible for each grid unit in the grid index covered in the outsourcing range of the surface element;

and simplifying the filtering result in a key-value form, wherein the element id of the layer B is used as a key, the element id of the layer A is used as a value, the candidate set is sorted according to the key by using a parallel sorting method of Thrust, repeated candidate pairs are further filtered in parallel, and the generation of the input set to be subjected to intersection operation is finished.

3. The method of claim 2, wherein the PLA structure is then defined as:

Denotes, that is, FA [ lambda ]]The position of the last ring contained in the current element in the RA queue is stored, and the position of the starting ring contained in the element lambda in RA is FA [ lambda-1 ]]+1, the number of rings included in this element is thus FA [ lambda ]]-FA[λ-1]For a ring, the same principle applies

The coordinates of which contained in the queue CA are composed of

It is shown that,

Including the position of the start coordinate in the CA as

The ring comprises the number of coordinates of

Thereby obtaining the mapping relation between the element queue FA and the coordinate queue CA, that is, the coordinate set included by the element λ is:

and can determine the number of coordinates included in the element as RA [ FA [ lambda ]]]-RA[FA[λ-1]]。

4. The method of claim 2, wherein step 102 specifically comprises:

The element rendering process is divided into three steps, wherein in the first step, outline information of the vector elements is extracted; secondly, scanning the outline and calculating a filling grid area by using a grid controller; and thirdly, performing cross-section or cross-unit filling on the filling area in a horizontal scanning mode, performing an index analysis process, acquiring index information through the input vector elements, mapping the index information into corresponding grid units through an index conversion engine, and finally generating a grid index through a rendering cache together with the vector element rasterization result.

5. The method of claim 2, wherein step 200 comprises the steps of:

(201) parallel intersection task partitioning

Before parallel intersection, firstly determining the distribution condition of thread blocks, and for the specification of the number of the thread blocks, firstly determining the number of the thread blocks according to the number of candidate pairs; after the threads are distributed, when kernel parallel intersection is executed, a current thread block firstly finds a candidate pair to be processed according to built-in variables, then finds specific element coordinates corresponding to the candidate pair by using a PLA structure of each layer which is pre-generated and stored in a display memory according to a superposition element id and a superposed element id related to the candidate pair, finally uses the coordinates as row vectors and the superposed elements as a rule of column vectors to organize a two-dimensional line segment matrix, a thread in the thread block determines line segment indexes of the superposed elements to be processed according to built-in variables threadIdx.x in the thread block, namely line numbers in the line segment matrix, and similarly determines line segment indexes of the superposed elements according to built-in variables threadIdx.y, namely column numbers in the line segment matrix, and when the built-in variable range is exceeded, a circular processing mode is adopted to ensure the maximum utilization rate of the thread;

(202) parallel intersection calculation at equipment end

Firstly, judging the intersection of the line segment of the superposed layer and the line segment of the superposed layer, in the aspect of specific intersection calculation, firstly, judging the intersection according to the external rectangle of the line segments through a rejection experiment, then respectively carrying out intersection and non-intersection processing, and judging the direction according to a cross product method if the two line segments are intersected;

then, calculating coordinates of line segment intersection points and attribute information thereof, and when a calculation result is stored, reserving some attribute information for each intersection point in addition to specific space coordinates of the intersection points to prepare for tracking and superposing results, wherein the attribute information comprises a ring id where the intersection point is located, a line segment id and a distance from a starting point; a compact array is also adopted to store intersection point coordinates and attribute information thereof;

then writing the intersection points into a result array in parallel; the one-dimensional result array is improved to form a two-dimensional structure, when the current thread is at a write intersection point, the one-dimensional result array is automatically adapted to a corresponding line according to the threadlike headidx.y, and then only data write in the line is locked, parallel write can be realized between lines, that is, when the headidx.y of one thread and the other thread are mapped to different lines, the threadlike headidx.y can operate respective lines simultaneously without waiting for each other;

finally, carrying out semi-inclusiveness test on the non-intersected line segments to detect whether the superposed elements are in the superposition elements in parallel; and respectively counting by adopting a ray method and a dividing and controlling strategy, and after the whole grid is calculated, if all line segments of a certain element in the A are marked for odd times, the element can be shown to be in the B.

6. The method of claim 5, wherein the non-intersecting line segments are tested for semi-inclusivity to detect in parallel whether the superimposed element is within the superimposed element in a particular process using shared memory of the thread block to speed up processing, the shared memory ensuring that each thread has its corresponding shared memory location.

7. The method as claimed in claim 5, wherein in step 202, when the compact array is used to store the intersection coordinates and the attribute information thereof, the size of the intersection array is allocated by calculating the remaining space of the video memory, and the property of the device _ vector structure is used, when an intersection is found, the intersection is placed at the top of the heap, and the simultaneous operation mutual exclusion and intersection increasing consistency of the multithread on the structure are ensured by using atomic operation.

8. The method of claim 5, wherein the step 300 comprises the steps of:

(301) intersection array ordering process

Firstly, respectively splicing and summarizing the intersection point array and the attribute information thereof, wherein the replication operation of the thread is utilized to be completed in parallel;

then, inserting the intersection points into the original coordinate arrays CA of the layer A and the layer B respectively to generate new line segments, finishing the sequencing by 3 steps, respectively taking the ring id, the line segment id and the distance to the starting point as keys, executing layer by layer according to a key-value form in parallel, and updating attribute information simultaneously; the recombination operation of the whole intersection point, the sequencing part utilizes Thrust to execute in parallel at the equipment end, the insertion of the intersection point is completed at the CPU end, and only two times of scanning is carried out on the intersection point.

(302) Result loop tracking

And utilizing the sequencing result to realize the picking of the superposition result by scanning the intersection point once.

9. The method of claim 8, wherein in the result ring tracking, for the scanned intersection point array, firstly, whether the result array is empty is used to record the starting point of the result ring, then whether the intersection point array is processed is judged according to the positive and negative of the segment of the intersection point, if the segment values are all greater than 0, the next intersection point is taken, otherwise, the tracked array is selected according to the in-out point attribute.

10. The method of claim 9, wherein in the result loop tracking process, if the currently encountered point is not an intersection point, the currently encountered point is added to the result array, otherwise, the judgment needs to be made in different cases: (1) if the encountered points are repeated points of the current intersection point and belong to a ring B 'of the same layer AB, the points can continue to advance along the current ring without processing, if the points pass through one intersection point and then move along the ring B', then the points pass through the other intersection point and then move along the ring A 'of the layer A, and the attributes of the points are changed in a picking part of the ring A'; (2) if the current point is the same as the initial point of the result ring, the result ring is tracked completely, and the next new result ring is tracked; (3) if the encountered point is an inflection point and the inflection point has a repeat point, selecting a correct exit point according to the id attribute of the current ring and the entrance and exit attribute of the point, entering the ring A ' to continue tracking, continuously tracking result points in the ring A ' and the ring B ', and generating a superposition result ring after processing all the intersection points.