CN107784001A - Parallel spatial querying method based on CUDA - Google Patents


Info

Publication number
CN107784001A
CN107784001A (application CN201610741535.3A)
Authority
CN
China
Prior art keywords
index
parallel
tile
grid
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610741535.3A
Other languages
Chinese (zh)
Inventor
赵艳伟
杨雄军
王晓光
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications filed Critical Beijing Institute of Computer Technology and Applications
Priority to CN201610741535.3A priority Critical patent/CN107784001A/en
Publication of CN107784001A publication Critical patent/CN107784001A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 — Information retrieval of structured data, e.g. relational data
    • G06F16/22 — Indexing; Data structures therefor; Storage structures
    • G06F16/2228 — Indexing structures
    • G06F16/24 — Querying
    • G06F16/245 — Query processing
    • G06F16/2453 — Query optimisation
    • G06F16/24532 — Query optimisation of parallel queries
    • G06F16/29 — Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to a CUDA-based parallel spatial querying method, belonging to the field of computer technology. The parallel spatial querying method of the present invention has clear advantages over the usual spatial query methods based on tree indexes. Given a query condition, the method determines the query result set simply by extracting the corresponding positions of a grid index in parallel. The invention merges the filtering and refinement steps of the original two-phase method, which both avoids frequent transfers of the candidate set between video memory and main memory and eliminates the large amount of computation required for exact matching, greatly improving retrieval efficiency.

Description

Parallel spatial querying method based on CUDA
Technical field
The present invention relates to the field of computer technology, and in particular to a CUDA-based parallel spatial querying method.
Background technology
As an important component of weapon system equipment, a combat command system is the information processing center of the entire armament system and an important means for commanders to understand the battlefield situation and make command decisions. To exercise command and control in a complex, multi-dimensional battlefield, an extremely large and varied volume of tactical information must be processed, of which roughly 60% is closely related to geographic context. In field operations in particular, command decisions depend heavily on geographic information and its related geographic attributes; geographic information directly affects the formulation of battle plans and the implementation of tactical intent. Through the GIS functions of a combat command system, battlefield geographic space can be processed comprehensively in real time, and visual analysis can present the friendly and enemy situation intuitively, thereby assisting the commander in battlefield command decision-making.
In a combat command system, the most frequently used GIS functions are browsing the battlefield electronic map and battlefield spatial querying. Given a query condition, a commander can find the target data within a certain geographic range, such as military marker points on the battlefield or the coverage area of long-range units.
Spatial querying generally refers to spatial geometry querying, that is, finding the set of all spatial entities that satisfy a condition, according to the spatial extent of a given spatial query and the geometric attributes of the spatial objects. A spatial query based on an indexing mechanism is broadly divided into a filtering step and a refinement step. The filtering step uses the spatial index together with various filter boundaries (such as convex polygons or MBRs, minimum bounding rectangles) to pre-screen the spatial data set and find a candidate set of spatial objects; the refinement step then loads the geometric information of the candidate set and matches it exactly against the query condition to obtain the true query result.
With the development of networked, integrated operation patterns, the data that a combat command system must process becomes increasingly complex and voluminous; geographic information data alone can exceed the terabyte level. Faced with such intensive computing tasks, the traditional processing modes of existing combat command systems cannot meet the real-time requirements of large-scale computation. For complex processing tasks such as querying and analyzing massive battlefield spatial data, the data processing cycle usually needs to be kept within hundreds of milliseconds down to a few milliseconds. With traditional tree-index-based spatial query methods, as the data becomes increasingly dense, the overlap between spatial features grows ever higher; excessive overlap inevitably produces redundant paths during spatial retrieval and severely degrades retrieval efficiency, so existing GIS spatial query functions cannot meet the real-time requirements of command and control.
From the perspective of current developments in computer science, multi-machine clusters, distribution, and parallel processing will become the main trend for handling large-scale complex computation patterns; given the military application requirements and characteristics of command and control systems, adopting a parallel processing framework is inevitable. GPGPU (General-Purpose computing on Graphics Processing Units), with its large-scale parallel computing capability and ease of programming, has become an important means of acceleration for high-performance computing and scientific computing applications and is widely used in other general-purpose computing fields. For applications whose complexity exceeds the processing capability of the CPU, GPU general-purpose computing provides a solution that delivers higher performance at lower energy consumption.
At present, the main GPU hardware vendors have all released corresponding SDKs (software development kits) so that developers can build applications in high-level programming languages. Among them, CUDA (Compute Unified Device Architecture), released by NVIDIA as an extension of the C/C++ language, has become one of the most popular parallel frameworks.
Against the background of the real-time processing requirements of intensive computing tasks in combat command systems and the rapid development of GPU general-purpose computing, a new query optimization method needs to be designed.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to design a spatial query method that overcomes the low retrieval efficiency of traditional tree-index-based spatial queries, which is caused by their many execution steps, coarse filtering, and the computational complexity of exact matching.
(2) Technical solution
To solve the above technical problem, the present invention provides a CUDA-based parallel spatial querying method comprising the following steps:
Step 100. Read the vector features and, according to feature type and index ID, build a parallel grid index suitable for the GPU environment;
Step 200. Decompose the grid index into tiles according to the available video memory, establish an index replacement policy based on spatial access patterns, and perform tile index replacement accordingly;
Step 300. Use the GPU's multithreading mechanism and the generated tile indexes to perform the parallel spatial query.
Preferably, step 100 comprises the following steps:
Step 110. Input the vector feature information: read the point coordinates of each vector feature and the corresponding vector feature index information;
Step 120. Rasterize each vector feature according to its feature information. The rasterization process has three steps: first, extract the outline of the vector feature; second, scan the outline and compute the grid region to fill; third, fill the region span by span, or cell by cell, in horizontal-scan order;
Finally, perform index resolution: obtain the index information from the input vector feature, map it through an index translation engine to the corresponding grid cells, and generate the grid index from the rasterization result through the render cache.
Preferably, step 200 comprises the following steps:
Step 210. Partition the grid index into tiles according to the size of video memory and the number of GPU threads, and load as many tiles as fit into video memory;
Step 220. When the grid index in video memory cannot satisfy a query, replace tile indexes according to the LFU policy; if the eviction candidate lies in the accessed tile's neighborhood, select the next tile to replace according to index access frequency.
Preferably, step 300 comprises the following steps:
Step 310. Transfer the generated tile indexes from the host side to the device side;
First, for a given video memory size, determine the range of indexes to store. Assuming it contains m rows and n columns of tiles, the set of indexes to store is ∪ TileIndex(i, j) (i = 1, …, m; j = 1, …, n), where i is the index row and j the index column. The tiles are then stored relative to one another in Hilbert-curve order, and finally the interior of each tile is stored in row-scan order;
Step 320. Load the query request condition Bbox onto the device side;
Step 330. Specify the thread-block and thread counts and launch the parallel query kernel function on the device side;
The number of threads is specified in two dimensions, and the number of thread blocks is computed from the MBR extent of the query request;
Step 340. When the index needed by the query is not in video memory, move part of the index in video memory back to the host side, and let the host side transfer the corresponding index to the device side;
Step 350. Execute the parallel query kernel function according to the thread launch strategy and extract the spatial index set;
Step 360. Transfer the filtered indexes back to the host side and output the corresponding vector features according to the indexes found.
Preferably, step 350 comprises the following steps:
Step 351. Query the index result set in parallel using coalesced thread access;
First, according to the thread launch strategy, compute the spatial position handled by each thread through a three-level offset relationship; from this spatial position compute the index tile it belongs to, and further determine the index's offset in video memory from the H-code assigned by the tile's Hilbert-curve ordering, where H-code denotes the Hilbert encoding;
Step 352. Compress and collect the index result set in parallel using thread-parallel compaction and prefix summation;
Sort the extracted valid index IDs with CUDPP's parallel merge sort. In the sorted index set, each thread handles one element, computes the difference between that element and its predecessor, and stores it as a flag. Based on this flag, compact the index set in parallel with CUDPP's compact primitive, moving the elements with non-zero flags to the front, and finally obtain the final ID result set using the parallel prefix-sum method.
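The sort–flag–compact–scan pipeline of step 352 can be sketched sequentially. Below is a hedged CPU sketch in Python; the patent itself uses CUDPP's GPU sort, compact, and scan primitives, and the function name `unique_ids` is illustrative only:

```python
def unique_ids(raw_ids):
    """Sequential sketch of step 352: sort the extracted IDs, flag each
    element that differs from its predecessor, then scatter the flagged
    elements to slots given by an exclusive prefix sum of the flags."""
    ids = sorted(raw_ids)                      # stands in for CUDPP merge sort
    flags = [1 if i == 0 or ids[i] != ids[i - 1] else 0
             for i in range(len(ids))]         # one "thread" per element
    offsets, total = [], 0
    for f in flags:                            # exclusive prefix sum (scan)
        offsets.append(total)
        total += f
    out = [0] * total
    for i, f in enumerate(flags):              # compaction scatter
        if f:
            out[offsets[i]] = ids[i]
    return out
```

On the GPU, the flag comparison, the scan, and the scatter would each be one parallel pass rather than the loops shown here.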
(3) Beneficial effects
The beneficial effects of the present invention are as follows:
1. The parallel grid query method of the present invention determines the query result set simply by extracting, in parallel, the positions of the grid index corresponding to the query condition. The method merges the filtering and refinement steps of the original spatial query method, avoiding both the frequent transfer of candidate sets between video memory and main memory and the large amount of computation required for exact matching;
2. The spatial query method based on a grid index and the GPGPU parallel architecture proposed by the present invention is suited to processing massive, dense line and polygon spatial data and can achieve an order-of-magnitude efficiency improvement over tree index methods on the CPU. With a traditional CPU tree-index query method, the larger and denser the data, the lower the retrieval efficiency; the retrieval efficiency of the proposed method is unaffected by data complexity and depends only on the query extent and the size of the resulting ID set, and its retrieval time complexity is linear in one dimension;
3. Because of the retrieval-efficiency advantage of the proposed CUDA-based parallel spatial querying method, it offers a promising direction for improving the performance of spatial retrieval under the networked map service model.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the CUDA-based parallel spatial querying method of an embodiment of the present invention;
Fig. 2 is a flow chart of the steps of establishing the grid index in an embodiment of the present invention;
Fig. 3 shows the parallel grid index generation process of an embodiment of the present invention;
Fig. 4 is a flow chart of the steps of index transfer between the device side and the host side in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the neighborhood tile range in an embodiment of the present invention;
Fig. 6 is a flow chart of the steps of the GPU parallel spatial query based on the grid index in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the storage layout of tile indexes in video memory in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the three-level offset calculation in an embodiment of the present invention;
Fig. 9 is a schematic diagram of irregular range query handling in an embodiment of the present invention, where (a) is the binary rasterization process and (b) is the extraction process;
Fig. 10 is a flow chart of the steps of extracting and filtering index results in parallel in an embodiment of the present invention;
Fig. 11 is a schematic diagram of the coalesced access mode in an embodiment of the present invention;
Fig. 12 is a schematic diagram of parallel elimination of redundant indexes in an embodiment of the present invention;
Fig. 13 is a comparison of spatial query modes in an embodiment of the present invention, where (a) shows the traditional spatial query steps and (b) shows the spatial query steps of the method of the present invention;
Fig. 14 is a query range ratio chart in an embodiment of the present invention, where (a) is 500 query requests whose range is 1/100 of the D3 layer and (b) is 500 query requests whose range is 1/50 of the D3 layer;
Fig. 15 compares query results with the R* method in an embodiment of the present invention, where (a)–(f) are the query times for D1–D6 respectively;
Fig. 16 shows CUDA profiler performance analysis charts of an embodiment of the present invention, where (a) is the GPU time-occupancy analysis and (b) is the GPU time-percentage analysis.
Detailed description of the embodiments
To make the purpose, content, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples.
An embodiment of the present invention proposes a CUDA-based parallel spatial querying method that uses a grid index technique to effectively combine vector features with the GPU (Graphics Processing Unit) computing model. Through a strategy for exchanging indexes between the host side and the device side and an efficient storage model design in video memory, the method makes full use of the GPU's "many-core" mechanism, ensures that the workload of each thread is as uniform as possible and that the grid computation is sufficiently dense, and finally computes the exact query result quickly using grid extraction, binary comparison, parallel compaction, and parallel prefix summation. Compared with the huge amount of computation in the conventional "filter by bounding box first, then match exactly" approach, the present invention filters out irrelevant spatial features to the greatest extent and simplifies the computation steps; moreover, when the overlap between spatial features is high, the present invention avoids the redundant-path problem of conventional tree-index queries and greatly improves computational efficiency, offering a novel way of thinking about spatial querying.
The technical problems the present invention can solve include: 1. a spatial index structure suitable for the GPU computing environment; 2. efficient index transfer between the CPU and the GPU; 3. improving thread utilization and parallel efficiency through the coalesced access mechanism of GPU multithreading.
A CUDA-based parallel spatial querying method of the present invention is described in detail below with respect to the above goals. Fig. 1 is a flow chart of the steps of a CUDA-based parallel spatial querying method of an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 100. Read the vector features and, according to feature type and index ID, build a parallel grid index suitable for the GPU environment, providing an efficient indexing mechanism for the subsequent parallel spatial query.
To make maximal use of the GPU's multithreading mechanism, the present invention builds the spatial index on a grid structure, which on the one hand favors an even division of the computing tasks and on the other hand keeps as many thread blocks as possible in the active state.
Fig. 2 is a flow chart of the steps of the method for establishing the grid index in the present invention. As shown in Fig. 2, step 100 comprises the following steps:
Step 110. Input the vector feature information: read the point coordinates of each vector feature and the corresponding vector feature index information.
First, the vector feature information is input, which includes the point coordinates making up each vector feature, and coordinate conversion is performed through the coordinate conversion channel so that the vector feature coordinates are accurate to sub-pixel precision. The core of the algorithm then begins.
Step 120. Rasterize each vector feature according to its feature information.
This step is the core of the whole algorithm and includes two parts: the first part (the left part of Fig. 3) is the vector feature rasterization process, and the second part (the right part of Fig. 3) is the resolution of feature indexes against the corresponding grid.
The rasterization process has three steps: first, extract the outline of the vector feature; second, use the grid controller to scan the outline and compute the grid region to fill; third, fill the region span by span, or cell by cell, in horizontal-scan order.
Finally, index resolution is performed: the index information is obtained from the input vector feature and mapped through an index translation engine to the corresponding grid cells; together with the rasterization result of the vector feature, the grid index is then generated through the render cache.
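The span-filling step above can be illustrated with a minimal scanline-rasterization sketch in Python. This is a hedged sketch under assumptions the text does not fix: even-odd filling sampled at cell centers, and a single feature ID written into each covered cell; the patent's grid controller and render cache are not modeled:

```python
def rasterize_polygon(poly, feature_id, width, height):
    """Fill a grid with feature_id wherever a polygon covers a cell.

    poly: list of (x, y) vertices in grid coordinates, implicitly closed.
    Uses even-odd scanline filling, sampling each row at its cell centers.
    """
    grid = [[0] * width for _ in range(height)]
    n = len(poly)
    for row in range(height):
        yc = row + 0.5                      # cell-center scanline
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            if (y1 <= yc < y2) or (y2 <= yc < y1):   # edge crosses scanline
                xs.append(x1 + (yc - y1) * (x2 - x1) / (y2 - y1))
        xs.sort()
        for j in range(0, len(xs) - 1, 2):           # fill between crossing pairs
            for col in range(int(xs[j] + 0.5), int(xs[j + 1] + 0.5)):
                if 0 <= col < width:
                    grid[row][col] = feature_id      # span filling
    return grid

# A 4x4 square covering the lower-left quadrant of an 8x8 grid.
g = rasterize_polygon([(0, 0), (4, 0), (4, 4), (0, 4)], 7, 8, 8)
```

Index resolution then amounts to reading `feature_id` back out of the covered cells when a query touches them.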
Step 200. Decompose the grid index into tiles according to the available video memory, establish an index replacement policy based on spatial access patterns, and perform tile index replacement.
Fig. 4 is a flow chart of the steps of realizing index transfer between the device side and the host side in the present invention. As shown in Fig. 4, step 200 comprises the following steps:
Step 210. Partition the grid index into tiles according to the size of video memory and the number of GPU threads, and load as many tiles as fit into video memory.
Because video memory is limited in size, the grid index of a spatial layer at large scale cannot be loaded into video memory all at once. To solve this problem, the grid index must first be partitioned into tiles. Using pre-generation and pre-assembly techniques, at initialization some of the tile indexes are pre-transferred to the device side according to the index space allocated in video memory. If a valid video memory grid index cannot be obtained within the extent of a query request, a tile replacement strategy is adopted to update the index.
Step 220. When the grid index in video memory cannot satisfy a query, replace tile indexes according to the LFU policy; if the eviction candidate lies in the accessed tile's neighborhood, select the next tile to replace according to index access frequency.
Because of the continuity of geographic space, a user who has queried some region is likely to visit its adjacent regions next, so adjacent regions must be taken into account during replacement and updating. Let an index tile be TileIndex(i, j), where i is the index row and j the index column. The adjacent region of this tile is then represented as in Fig. 5, with the formal definition:
Each time an index tile in video memory is accessed, the counter at its corresponding position is incremented. If the currently accessed index TileIndex(i, j) is not in video memory, then according to the LFU (Least Frequently Used) replacement policy, the access frequencies are tallied from the counters using the CUDA parallel merge-sort algorithm, and the least-used index tile TileIndex(s, t) is evicted first. One additional consideration is that if TileIndex(s, t) ∈ Ωi,j, it is not replaced; instead the next tile to replace is selected according to index access frequency. Pseudocode is shown in Table 1.
Table 1. Index tile update process
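The tile update process of Table 1 can be sketched as a small CPU-side cache model. This is a hedged sketch: the 3×3 neighborhood and the class name `TileIndexCache` are assumptions made for illustration, and the patent tallies frequencies with a CUDA parallel sort on the GPU rather than a Python `min`:

```python
from collections import defaultdict

class TileIndexCache:
    """LFU tile-index cache with a neighborhood exclusion (cf. Table 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tiles = set()            # (i, j) tiles resident "in video memory"
        self.freq = defaultdict(int)  # per-position access counter

    def neighborhood(self, i, j):
        # Assumed 3x3 adjacency: tiles likely to be visited next.
        return {(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)}

    def access(self, i, j):
        """Record an access; return the evicted tile or None."""
        self.freq[(i, j)] += 1
        if (i, j) in self.tiles:
            return None               # already resident: no replacement
        evicted = None
        if len(self.tiles) >= self.capacity:
            # Prefer the least-frequently-used tile outside the neighborhood;
            # fall back to plain LFU if every resident tile is adjacent.
            candidates = self.tiles - self.neighborhood(i, j) or self.tiles
            evicted = min(candidates, key=lambda t: self.freq[t])
            self.tiles.remove(evicted)
        self.tiles.add((i, j))
        return evicted
```

Note how a tile adjacent to the current request survives eviction even when its counter is lowest, matching the extra consideration described above.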
Step 300. Use the GPU's multithreading mechanism and the generated tile indexes to perform the parallel spatial query.
Using the above grid index mechanism, suited to realizing parallel spatial queries in the GPU environment, the parallel spatial query is carried out. This part mainly discusses the two most common classes of spatial selection query: Bbox range selection queries and polygon selection queries. The difference between the two in the parallel query on the GPU device side is that a polygon selection query requires one additional computation step: the irregular query polygon is rasterized, and during multithreaded computation each thread performs one extra grid test. The remaining steps of the algorithm are handled as for a Bbox range selection query. This part therefore first presents the specific algorithm for Bbox range selection queries and then extends it to handle arbitrary polygon selection query requests.
Fig. 6 is a flow chart of the steps of realizing the parallel spatial query in the GPU based on the grid index in the present invention. As shown in Fig. 6, step 300 comprises the following steps:
Step 310. Transfer the generated tile indexes from the host side to the device side.
Given the characteristics of multithreaded GPU access to video memory, the storage of indexes on the device side must satisfy the principle of coalesced access; the storage layout of the tile indexes in video memory is as shown in Fig. 7.
First, for a given video memory size, determine the range of indexes to store. Assuming it contains m rows and n columns of tiles, the set of indexes to store is ∪ TileIndex(i, j) (i = 1, …, m; j = 1, …, n). The tiles are then stored relative to one another in Hilbert-curve order, so the starting position of a specified tile in video memory can be located quickly as (H-code − 1) × size, where H-code denotes the Hilbert encoding and size denotes the size of one block. Finally, the interior of each tile is stored in row-scan order, with the row width guaranteed to be a multiple of 16 (W1, W2, W3, W4) word lengths (W1–W4 are each one byte, so the row width is 64 bytes).
This layout rests mainly on two considerations. First, adjacent tile indexes are close in real geographic space, and the Hilbert curve clusters adjacent indexes together, which largely preserves the spatial locality of the data across tiles: a spatial query accesses one contiguous region of video memory, laying the foundation for coalesced access. Second, within each tile, this layout lets the 16 threads of a half-warp complete the whole half-warp's access request to video memory in a single transaction. After these steps, the tile indexes in video memory are both stored linearly and satisfy the coalescing principle, enabling coalesced reads and improving the threads' video memory access efficiency.
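The Hilbert ordering that clusters adjacent tiles can be sketched with the classic coordinate-to-distance conversion. This is a sketch under the assumption of a 2^k × 2^k tile grid and 0-based H-codes, so the offset comes out as H-code × size rather than the 1-based (H-code − 1) × size above:

```python
def hilbert_code(n, x, y):
    """Distance of tile (x, y) along a Hilbert curve over an n x n grid
    (n a power of two). Nearby tiles get nearby codes."""
    code = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        code += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the recursion pattern repeats.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return code

def tile_offset(n, x, y, size):
    """Starting position of tile (x, y) in the linear index store."""
    return hilbert_code(n, x, y) * size
```

Because every cell of the grid receives a distinct code, the linear store is a permutation of the tiles that keeps geographic neighbors close together in video memory.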
Step 320. Load the query request condition Bbox onto the device side.
The actual spatial query extent exists in the form of geographic coordinates, so the mapping from the query request onto the grid index must be realized.
Assume the map scale is Scale = 1000, i.e., 1 meter on the map represents a distance of 1000 meters in actual geographic space. The raster resolution can be regarded as the number of grid cells covered per inch, denoted Dots_per_inch. Since one meter equals Inches_per_meter = 39.3701 inches, the number of raster cells contained per meter can be computed:
Dots_per_meter=Dots_per_inch × Inches_per_meter (2)
The grid cells then relate to actual geographic distance as follows:
Resolution = Scale ÷ Dots_per_meter = Scale ÷ Dots_per_inch ÷ Inches_per_meter (3)
where Resolution denotes the resolution, i.e., the ground distance covered by one raster cell.
For the true geographic coordinates of a query request, the offset geographic coordinates are first computed relative to the outer geographic envelope of the whole layer. When the layer undergoes a rigid translation or scaling of its spatial objects, the coordinates must be reduced according to the translation and scaling formulas, and the corresponding grid row and column numbers on the original grid index are then obtained from Resolution.
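Equations (2) and (3) and the coordinate mapping can be illustrated numerically. Dots_per_inch = 96 is an assumed value (the text leaves it unspecified), and `geo_to_cell` is an illustrative helper, not the patent's exact routine:

```python
SCALE = 1000.0            # 1 m on the map represents 1000 m on the ground
DOTS_PER_INCH = 96.0      # assumed raster density
INCHES_PER_METER = 39.3701

# Eq. (2): raster cells per meter of map distance
dots_per_meter = DOTS_PER_INCH * INCHES_PER_METER
# Eq. (3): ground meters covered by one raster cell
resolution = SCALE / dots_per_meter

def geo_to_cell(gx, gy, origin_x, origin_y):
    """Map a geographic coordinate to (col, row) on the original grid index,
    after offsetting by the layer's outer-envelope origin."""
    return int((gx - origin_x) / resolution), int((gy - origin_y) / resolution)
```

With these numbers one raster cell covers about 0.2646 ground meters, so a point 10 m east of the layer origin falls in column 37.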
Step 330. Specify the thread-block and thread counts and launch the parallel query kernel function on the device side.
This process mainly computes the number of thread blocks participating in the parallel spatial query and the number of threads in each block. The number of threads is specified in two dimensions, with the total number of threads per block kept no greater than 512; it is set to TN(32, 16). To guarantee that each thread handles one grid cell, the number of thread blocks must be computed from the MBR extent of the query request.
Assume the query extent is Bbox(x1, y1, x2, y2) and each index tile is idwidth wide and idheight high. The set of tile indexes covered by this extent is ∪ Idx(s, t), where
So that the number of thread blocks participating in the parallel computation covers the set ∪ Idx(s, t), the distribution BN(bnx, bny) of thread blocks in the Grid can be further computed through CUDA's two-dimensional thread built-in variables, where
bnx and bny denote the x- and y-dimensions of the thread-block grid, and tn.x and tn.y denote the x- and y-dimensions of a thread block TN.
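The tile coverage and thread-block distribution just described can be sketched as follows. This is a hedged reconstruction (the source's formulas for ∪ Idx(s, t) and BN are lost): tile indices are taken by integer division of the Bbox corners, and BN by ceiling division of the covered cells by TN, which is the natural reading of "cover the set":

```python
import math

TN = (32, 16)  # threads per block in x and y; 32 * 16 = 512, the stated cap

def grid_dims(bbox, idwidth, idheight):
    """Tile set covered by a query Bbox (in grid-cell units) and the
    thread-block distribution BN(bnx, bny) giving one thread per cell."""
    x1, y1, x2, y2 = bbox
    s0, s1 = x1 // idwidth, x2 // idwidth
    t0, t1 = y1 // idheight, y2 // idheight
    tiles = [(s, t) for s in range(s0, s1 + 1) for t in range(t0, t1 + 1)]
    cells_x = (s1 - s0 + 1) * idwidth   # cells spanned by the covered tiles
    cells_y = (t1 - t0 + 1) * idheight
    return tiles, (math.ceil(cells_x / TN[0]), math.ceil(cells_y / TN[1]))
```

For example, a query covering exactly one 64 × 32 tile needs a 2 × 2 grid of (32, 16) blocks.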
Step 340. When the index needed by the query is not in video memory, part of the index in video memory is moved back to the host side, and the host side transfers the corresponding index to the device side. Otherwise this move-back and transfer step is skipped and step 350 is executed directly;
Step 350. Execute the parallel query kernel function according to the thread launch strategy and extract the spatial index set.
According to the BN(bnx, bny) and TN(32, 16) thread launch strategy, the spatial position handled by each thread can be computed through a three-level offset relationship. First, in the device-side parallel query kernel function, the built-in variable threadIdx determines the current thread's offset within its thread block; next, the built-in variables blockDim and blockIdx determine the thread's global position within the whole Grid; finally, the offset in the video memory index of the starting tile covered by the Grid further determines the spatial position of the cell the current thread should handle. The determination of the spatial position is shown in Fig. 8.
If this spatial position lies within the query's outer envelope, the thread extracts the grid cell; otherwise the current thread does not participate in the computation. For Bbox range selection queries, the computation of irrelevant threads can be filtered out in this manner; for polygon selection queries, the outer envelope of the query condition must additionally be binary-rasterized, i.e., cells inside the polygon are set to 1 and cells inside the envelope but outside the polygon are set to 0. While extracting grid cells in parallel, each thread must also perform a grid test against the query condition at the corresponding position, as shown in Fig. 9.
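The three-level offset and the per-thread participation test can be sketched as a sequential simulation of the kernel. This is a hedged model: `kernel_sim` loops over what the GPU would execute concurrently, and the `mask` argument stands in for the binary-rasterized polygon of Fig. 9:

```python
def thread_cell(block_idx, thread_idx, block_dim, origin):
    """Three-level offset: offset in block -> global position in the Grid ->
    spatial cell, shifted by the origin of the starting tile the Grid covers."""
    return (origin[0] + block_idx[0] * block_dim[0] + thread_idx[0],
            origin[1] + block_idx[1] * block_dim[1] + thread_idx[1])

def kernel_sim(grid, bbox, block_dim, grid_dim, origin=(0, 0), mask=None):
    """Sequentially simulate the query kernel: a thread extracts its cell's
    index ID only if the cell lies inside the query bbox and, for polygon
    queries, under the binary mask; all other threads do nothing."""
    x1, y1, x2, y2 = bbox
    hits = []
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    cx, cy = thread_cell((bx, by), (tx, ty), block_dim, origin)
                    if not (x1 <= cx <= x2 and y1 <= cy <= y2):
                        continue          # thread does not participate
                    if mask is not None and not mask[cy][cx]:
                        continue          # outside the rasterized polygon
                    if cy < len(grid) and cx < len(grid[0]) and grid[cy][cx]:
                        hits.append(grid[cy][cx])
    return hits
```

In the real kernel each (block, thread) pair of this four-level loop runs as one GPU thread; the two `continue` branches are the filtering of irrelevant threads described above.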
Figure 10 is a flow chart of the parallel extraction and screening of index results in the present invention. As shown in Figure 10, step 350 comprises the following steps:
Step 351. Query the index result set in parallel, following the thread coalesced-access principle.
The concrete process of extracting the grid cell at the corresponding position in video memory is as follows. First, under the BN (bnx, bny) and TN (32, 16) thread launch strategy, the spatial position handled by a thread is computed through the three-layer offset relationship; from that spatial position the index tile it belongs to is determined, and the H-code corresponding to the tile's position on the Hilbert curve further determines the tile's offset within the video-memory index. Since the interior of each tile is stored in row-scan order, the specific grid cell is then obtained simply from the row-and-column offset of the spatial position inside the tile.
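The two-level addressing just described can be sketched as: Hilbert rank of the cell's tile (its H-code) times cells per tile, plus the row-scan offset inside the tile. The xy-to-rank conversion below is the standard iterative Hilbert curve algorithm; the tile grid side must be a power of two, and all parameter names are illustrative.

```python
def hilbert_d(n, x, y):
    """Hilbert-curve rank (H-code) of tile (x, y) on an n-by-n tile grid.

    n must be a power of two; standard iterative xy -> d conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def cell_offset(n, tile_x, tile_y, tile_w, tile_h, row, col):
    """Linear video-memory offset of one grid cell:
    tiles are laid out in Hilbert order, cells inside a tile in row-scan order."""
    return hilbert_d(n, tile_x, tile_y) * tile_w * tile_h + row * tile_w + col
```

On a 2x2 tile grid this convention visits the tiles in the order (0,0), (0,1), (1,1), (1,0); the Hilbert layout keeps spatially adjacent tiles near each other in video memory, which is why it is preferred over plain row order between tiles.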
A key factor affecting CUDA performance is global-memory access efficiency. Because extracting index units with many threads requires a large volume of video-memory reads and writes, this method optimizes global-memory reads; this is also why the generated grid index uses four bytes per unit. When constructing the grid index structure, a three-byte storage mode would be enough to hold the index data for spatial information of medium data volume, but with three bytes the addresses accessed by the threads of the same half-warp cannot be aligned within a coalescable segment, forcing the data to be split into 16 serial transfers. As shown in Fig. 11, since CUDA supports coalesced access at 32-bit word length, in the first case the index word lengths (W1, W2, W3) cannot be aligned to a coalescable segment, lowering the efficiency of the threads' video-memory access. In the second case, although one more byte is stored per unit, the global-memory coalesced access pattern is satisfied: the index units align one-to-one with the coalescing segment, so the data accessed by the 16 threads of a half-warp need only a single transfer, optimizing access efficiency.
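The alignment effect of Fig. 11 can be checked numerically: with consecutive records of a given width, count how many of a half-warp's per-thread record start addresses land on a 32-bit word boundary, a necessary condition for coalesced 32-bit access. The helper below is an illustrative check, not part of the patented method.

```python
def aligned_threads(record_bytes, threads=16, word=4):
    """How many of the half-warp's record start addresses fall on a
    word boundary, assuming thread t reads the record at t * record_bytes."""
    return sum(1 for t in range(threads) if (t * record_bytes) % word == 0)
```

With 4-byte records all 16 threads of the half-warp are word-aligned; with 3-byte records only threads 0, 4, 8 and 12 are, which is why the 3-byte layout degenerates into serialized transfers.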
Step 352. Reduce and gather the index result set in parallel, using thread-parallel compaction and predecessor summation.
De-parsing the grid cells extracted by the multiple parallel threads yields the index IDs at the corresponding positions. Because a vector polygon element covers an area, several adjacent cells of its grid index are likely to carry the same ID. The multi-threaded parallel query is loosely coupled, so the uniqueness of the collected ID set cannot be guaranteed; if the entire queried set were transmitted from the device side back to the host side, transfer efficiency would suffer. The query result therefore needs to be aggregated with a parallel algorithm, i.e., only the set of distinct element indexes is transmitted.
The concrete approach is as follows. The extracted valid index IDs are sorted with the CUDPP parallel merge-sort; over the sorted index set, each thread handles one element, computes the difference between that element and its predecessor, and saves it as the flag value. Positions whose flag is non-zero correspond to the first occurrence of a vector element ID and must be kept; conversely, an element whose flag is 0 (or a run of such elements) carries a redundant index ID and can be ignored. Accordingly, the index set is compressed in parallel with CUDPP compact based on the flag values, moving the element IDs with non-zero flags to the front; the final ID result set is then obtained with a parallel prefix-sum, and only this reduced set is copied back into host memory. This greatly reduces the transfer volume and thereby improves parallel query efficiency, as shown in Fig. 12.
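The sort / flag / compact pipeline of step 352 can be sketched sequentially; the parallel primitives (CUDPP sort, compact, prefix-sum) are replaced here by their sequential equivalents, so the sketch shows the data flow rather than the GPU implementation, and the function name is illustrative.

```python
def reduce_index_ids(ids):
    """Deduplicate extracted index IDs the way step 352 describes:
    sort, flag each element by its difference with the predecessor,
    then keep only first occurrences (non-zero flag)."""
    if not ids:
        return []
    ordered = sorted(ids)                     # stands in for the CUDPP parallel sort
    # flag[i] = ordered[i] - ordered[i-1]; flag == 0 marks a redundant duplicate
    flags = [0] + [ordered[i] - ordered[i - 1] for i in range(1, len(ordered))]
    # compact: the first element is always kept, then every non-zero-flag element
    return [ordered[0]] + [v for v, f in zip(ordered[1:], flags[1:]) if f != 0]
```

Because sorting groups equal IDs together, a zero predecessor-difference is exactly the duplicate condition, which is what makes the flag-and-compact scheme equivalent to set deduplication.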
Step 360. The screened index is transmitted back to the host side, and the vector elements corresponding to the queried index are output.
Figure 13 shows how the parallel spatial query method designed by the present invention differs from a conventional spatial query. The first case is the traditional two-step spatial query using a tree index: according to the MBR of query condition W, a spatial filtering step first selects the candidate sets R1 and R2; a refinement step then tests the spatial elements exactly and finds that the query result is R1. The second case is the parallel grid query method: simply by extracting the grid-index positions covered by query condition W in parallel, the query result set can be determined directly. This method merges the filtering and refinement steps of the original method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching.
To illustrate the beneficial effects of the invention, the fast parallel spatial query method proposed here is compared experimentally with traditional spatial query methods. The experiments fall into two parts. The first part compares the GPGPU-based parallel spatial query method of the present invention with spatial query on a traditional CPU, in terms of query time and spatial filtering degree. The second part analyzes the kernel performance of the method, so as to verify the correctness and efficiency of the present invention.
(1) Comparison with the CPU-based method
The goal of this part of the experiments is to compare the present invention with a CPU spatial query method using an R* spatial index, measuring the spatial retrieval time and spatial filtering degree under identical benchmark test sets and identical query requests.
Table 2. Details of the test data

Data set  Characteristic  Type           Number of elements  Size
D1        Dense           Point layer    1466808             50.3MB
D2        Sparse          Point layer    405                 11.1KB
D3        Dense           Line layer     1631927             365MB
D4        Sparse          Line layer     979                 142KB
D5        Dense           Polygon layer  3233                62.2MB
D6        Sparse          Polygon layer  123                 1.48MB
The test sets cover the 3 types of geographic elements, with two groups of data of different characteristics selected per type: one group in which the elements are sparsely distributed, i.e., the minimum bounding boxes of the elements overlap little, and another group in which the elements are densely distributed. Details of the selected benchmark test sets are given in Table 2.
For the above 6 groups of data, different spatial query conditions are applied for spatial filtering. To guarantee fair and consistent queries, the spatial query requests are generated at random; for each test layer, 8 groups of experiments (each group containing 500 query requests) are run with different query ranges, the query range being determined as a proportion of the whole layer, e.g. 1/120 of the original layer extent as the query condition. Figure 14 shows two groups of query range scales: (a) 500 query requests whose range is 1/100 of layer D3; (b) 500 query requests whose range is 1/50 of layer D3.
The test results in Fig. 15 show that, for point-layer spatial data, the method used by the present invention shows no clear advantage, for the following main reasons. First, regardless of how dense the point data are, the bounding boxes of point elements overlap little, so few overlapping paths arise and the spatial query based on the R* tree index is already efficient. Second, the GPGPU-based parallel spatial query of the present invention involves device-side computation, so the performance cost of data transfer must be taken into account: for the dense data set D1, although the computation time of the method of the invention is shorter than that of the R* method, the overall time is comparable to the CPU spatial query time using the R* index.
For sparse data such as line layer D4 and polygon layer D6, the advantage of this method is likewise hard to realize. The reason is that sparse data lead to little MBR overlap, so the efficiency of the R* method is already high, and because the data volume is small, the computational advantage of the GPU is not fully expressed.
For massive dense line and polygon layers such as D3 and D5, however, this method has an absolute advantage in spatial query time over the R*-tree-based spatial query on the CPU. Because the data volume is large, the height of the tree index increases sharply, and because the data are dense, the MBRs of the index's intermediate nodes overlap heavily, so the tree index handles this kind of spatial data inefficiently. As can be seen from Fig. 15c, the query time using the R* index exceeds 1600 milliseconds on average for every data group, while the method proposed by the present invention stays around 50 milliseconds even with the data transfer time included; since the data transfer time is negligible compared with the CPU computation time, the proposed method achieves a speed-up of 32 times on data such as D3. Likewise, Fig. 15e shows that although D5 has fewer polygon elements than D3 has lines, its MBR overlap is also high: the R* index needs about 500 milliseconds on average while the proposed method needs about 50 milliseconds, a speed-up of roughly 10 times.
From the above comparative analysis of query efficiency on spatial data of different types and characteristics, we can summarize as follows. The spatial query method proposed by the present invention, based on a grid index and the GPGPU parallel architecture, is suited to processing massive dense line and polygon spatial data, and can reach an order-of-magnitude efficiency improvement over the tree-index method on the CPU. For the traditional tree-index query method on the CPU, the larger and denser the data, the lower the query efficiency, whereas the query efficiency of the proposed method is unaffected by data complexity and depends only on the query range and the size of the queried ID set.
(2) Kernel performance evaluation
This part tests the proportion of time taken by each component of the internal execution of the present invention, with the aim of analyzing the performance impact of data transfer between the host side and the device side on the whole algorithm. The analysis tool we use is the CUDA Profiler, a visualization tool provided by NVIDIA in the CUDA toolkit. It contains a set of timelines reflecting CPU and GPU activity in the application, allowing the user to analyze a GPU program in a more intuitive way; the CUDA Profiler also provides an automatic analysis engine that helps the user locate performance bottlenecks quickly and find optimization strategies.
As shown in Fig. 16, the Summary Plot chart shows intuitively the percentage of total GPU time taken by the kernel functions and by memory copies. It can be seen that the computation part of the proposed window query method accounts for 76.21% of the total kernel time, while data copying accounts for only about 20%; we can therefore conclude that the data-transfer part of the proposed method does not cause a performance bottleneck.
The beneficial effects of the present invention are:
1. With the parallel grid query method of the present invention, the query result set can be determined simply by extracting the grid-index positions covered by the query condition in parallel. This method merges the filtering and refinement steps of the original spatial query method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching;
2. The spatial query method proposed by the present invention, based on a grid index and the GPGPU parallel architecture, is suited to processing massive dense line and polygon spatial data and can reach an order-of-magnitude efficiency improvement over the tree-index method on the CPU. For the traditional tree-index query method on the CPU, the larger and denser the data, the lower the query efficiency, whereas the query efficiency of the proposed method is unaffected by data complexity, depends only on the query range and the size of the queried ID set, and its retrieval time complexity is linear in one dimension;
3. Thanks to its advantage in retrieval efficiency, the CUDA-based parallel spatial query method proposed by the present invention provides a promising direction for improving the performance of spatial retrieval under the network map service model.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and variations can also be made without departing from the technical principles of the invention, and these improvements and variations should likewise be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A CUDA-based parallel spatial query method, characterized in that the method comprises the following steps:
Step 100. Read vector elements and, according to the element type and index ID, establish a parallel grid index suited to the GPU environment;
Step 200. Decompose the grid index into tiles according to the video-memory space, establish an index replacement policy according to spatial access habits, and perform tile index replacement;
Step 300. Implement the parallel spatial query using the multi-threading mechanism of the GPU and the generated tile index.
2. The CUDA-based parallel spatial query method according to claim 1, characterized in that step 100 comprises the following steps:
Step 110. Input the vector element information, reading the point coordinates of the vector elements and the corresponding vector element index information;
Step 120. Render the vector elements into a grid according to the vector element information; the grid rendering process is divided into three steps: first, extract the outline information of the vector element; second, scan the outline and compute the filled grid region; third, fill the region span by span, or cell by cell, in horizontal-scan fashion;
Finally, perform index parsing: the index information obtained from the input vector elements is mapped by the index translation engine into the corresponding grid cells, and the grid index is finally generated from the rendering cache together with the result of vector element rasterization.
3. The CUDA-based parallel spatial query method according to claim 2, characterized in that step 200 comprises the following steps:
Step 210. Partition the grid index into tiles according to the video-memory size and the number of GPU threads, and assemble them into video memory;
Step 220. When the grid index in video memory cannot satisfy the query, perform tile index replacement according to the LFU policy, selecting the next tile to be replaced according to index access frequency.
4. The CUDA-based parallel spatial query method according to claim 3, characterized in that step 300 comprises the following steps:
Step 310. Transfer the generated tile index from the host side to the device side;
First, for a given video-memory size, determine the range of the index to be stored; assuming it contains m rows and n columns of tiles, the set of indexes to be stored is {T(i, j) | 1 ≤ i ≤ m, 1 ≤ j ≤ n}, where i denotes the row of the index and j its column. The tiles are then stored relative to one another in Hilbert-curve order, and finally the interior of each tile is stored in row-scan order;
Step 320. Load the query request condition Bbox onto the device side;
Step 330. Specify the thread-block and thread counts, and launch the parallel query kernel function on the device side;
wherein the thread count is specified in two dimensions, and the number of thread blocks is computed from the MBR range of the query request;
Step 340. When the queried index is not resident in video memory, part of the index in video memory is swapped back to the host side, and the host side transfers the corresponding index to the device side;
Step 350. Launch the parallel query kernel function according to the thread launch strategy, and extract the spatial index set;
Step 360. Transmit the screened index back to the host side, and output the vector elements corresponding to the queried index.
5. The method for extracting a spatial index set in parallel according to claim 4, characterized in that step 350 comprises the following steps:
Step 351. Query the index result set in parallel, following the thread coalesced-access principle;
First, according to the thread launch strategy, compute the spatial position handled by the thread through the three-layer offset relationship; from that spatial position compute the index tile it belongs to, then determine the tile's offset in video memory from the H-code corresponding to its position on the Hilbert curve, where H-code denotes the Hilbert encoding;
Step 352. Compress and gather the index result set in parallel, using thread-parallel compaction and predecessor summation;
Sort the extracted valid index IDs with the CUDPP parallel merge-sort; over the sorted index set, let each thread handle one element, computing the difference between that element and its predecessor and saving it as the flag value; based on the flag values, compress the index set in parallel with CUDPP compact, moving the element IDs with non-zero flags to the front, and finally obtain the final ID result set with a parallel prefix-sum.
CN201610741535.3A 2016-08-26 2016-08-26 Parallel spatial querying method based on CUDA Pending CN107784001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610741535.3A CN107784001A (en) 2016-08-26 2016-08-26 Parallel spatial querying method based on CUDA

Publications (1)

Publication Number Publication Date
CN107784001A true CN107784001A (en) 2018-03-09

Family

ID=61440705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610741535.3A Pending CN107784001A (en) 2016-08-26 2016-08-26 Parallel spatial querying method based on CUDA

Country Status (1)

Country Link
CN (1) CN107784001A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719154A (en) * 2009-12-24 2010-06-02 中国科学院计算技术研究所 Grid structure-based spatial index establishing method and grid structure-based spatial index establishing system
CN103744999A (en) * 2014-01-23 2014-04-23 中国人民解放军国防科学技术大学 Spatial vector data online interactive mapping method based on hierarchical-divided storage structure
CN105354291A (en) * 2015-11-02 2016-02-24 武大吉奥信息技术有限公司 Raster data index and query method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Yanwei et al., "A GPGPU-based spatial query optimization method for command systems", Journal of Command and Control *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711323A (en) * 2018-12-25 2019-05-03 武汉烽火众智数字技术有限责任公司 A kind of live video stream analysis accelerated method, device and equipment
CN109711323B (en) * 2018-12-25 2021-06-15 武汉烽火众智数字技术有限责任公司 Real-time video stream analysis acceleration method, device and equipment
CN110297952A (en) * 2019-06-05 2019-10-01 西南交通大学 A kind of parallelization high-speed railway survey data search method based on grid index
CN110297952B (en) * 2019-06-05 2021-12-21 西南交通大学 Grid index-based parallelization high-speed railway survey data retrieval method
CN112307035A (en) * 2020-11-26 2021-02-02 深圳云天励飞技术股份有限公司 Characteristic value ID management method and device, electronic equipment and storage medium
CN112307035B (en) * 2020-11-26 2024-01-05 深圳云天励飞技术股份有限公司 Method and device for managing characteristic value ID, electronic equipment and storage medium
CN113032427A (en) * 2021-04-12 2021-06-25 中国人民大学 Vectorization query processing method for CPU and GPU platform
CN113032427B (en) * 2021-04-12 2023-12-08 中国人民大学 Vectorization query processing method for CPU and GPU platform


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhao Yanwei; Wang Xiaoguang; Yang Fan
Inventor before: Zhao Yanwei; Yang Xiongjun; Wang Xiaoguang; Yang Fan
RJ01 Rejection of invention patent application after publication
Application publication date: 20180309