CN107784001A - Parallel spatial querying method based on CUDA - Google Patents
- Publication number: CN107784001A (application CN201610741535.3A)
- Authority
- CN
- China
- Prior art keywords
- index
- parallel
- tile
- grid
- thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F16/2228—Indexing structures
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F16/24532—Query optimisation of parallel queries
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Remote Sensing (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present invention relates to a parallel spatial querying method based on CUDA, belonging to the field of computer technology. Compared with conventional spatial query methods based on tree indexes, the method has clear advantages. Given a query condition, the query result set can be determined simply by extracting the corresponding positions of a grid index in parallel. The invention merges the filtering and refinement steps of the original method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching, thereby greatly improving retrieval efficiency.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a parallel spatial querying method based on CUDA.
Background art
As an important component of weapon system equipment, the combat command system is the information processing center of the entire armament system and an important means for commanders to understand the battlefield situation and make command decisions. Command and control in a complex, multi-dimensional battlefield requires processing an extremely large and varied volume of tactical information, about 60% of which is closely related to geographic context. Especially in field operations, command decisions depend heavily on geographic information and its related attributes, which directly affect the formulation of battle plans and the implementation of tactical intentions. Through the GIS functions of the combat command system, real-time comprehensive processing of battlefield geographic space and intuitive visual analysis of the friendly and enemy situation can be achieved, assisting commanders in battlefield command decision-making. In the combat command system, the most frequently used GIS functions are browsing the battlefield electronic map and battlefield spatial query: given a query condition, a commander can find target data within a certain geographic range, such as military marker points on the battlefield or the coverage area of a long-range unit.
Spatial query generally refers to spatial geometry query, that is, finding the set of all spatial entities that satisfy a given condition according to the spatial extent of the query and the geometric attributes of the spatial objects. A spatial query based on an indexing mechanism is broadly divided into a filtering step and a refinement step. The filtering step uses the spatial index and various filter boundaries (such as convex polygons or MBRs, minimum bounding rectangles) to perform a preliminary screening of the spatial data set and obtain a candidate set of spatial objects; the refinement step then loads the geometric information of the candidate set and matches it exactly against the query condition to obtain the true query result.
With the development of networked, integrated operation modes, the data a combat command system must process is becoming ever more complex and voluminous; geographic information data alone can exceed the TB level. When faced with such intensive computing tasks, traditional processing methods cannot meet the real-time requirements of large-scale computation. For complex tasks such as querying and analyzing massive battlefield spatial data, the data processing cycle usually needs to be controlled within hundreds of milliseconds down to a few milliseconds. Moreover, with traditional spatial query methods based on tree indexes, as data becomes increasingly concentrated the degree of overlap between spatial elements grows; excessive overlap inevitably produces redundant paths during spatial retrieval and severely degrades retrieval efficiency. Existing GIS spatial query functions therefore cannot meet the real-time requirements of command and control.
From the perspective of current computer science development, multi-machine clusters, distributed systems, and parallel processing will become the main trend for handling large-scale complex computation. Given the military application requirements and characteristics of command and control systems, adopting a parallel processing architecture is inevitable. GPGPU (General Purpose Graphics Processing Unit) computing, with its massively parallel computing capability and ease of programming, has become an important means of accelerating high-performance computing and scientific computing, and is widely used in general-purpose computing fields. GPU general-purpose computing provides a solution that delivers higher performance at lower energy consumption for applications whose complexity and computational demands exceed the capability of the CPU.
At present, the major GPU hardware vendors have all released software development kits (SDKs) so that developers can carry out application development in high-level programming languages. Among them, CUDA (Compute Unified Device Architecture), released by NVIDIA as an extension of the C/C++ language, has become one of the most popular parallel frameworks.
Against the background of the real-time processing requirements of intensive computing tasks in combat command systems and the rapid development of GPU general-purpose computing, a new query optimization method needs to be designed.
Content of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to design a spatial query method that overcomes the low retrieval efficiency of traditional tree-index-based spatial queries, which suffer from many execution steps, coarse filtering, and complex exact-matching computations.
(2) technical scheme
In order to solve the above-mentioned technical problem, it is described the invention provides a kind of parallel spatial querying method based on CUDA
Method comprises the following steps:
Step 100. reads vector element, according to element type and index ID, establishes the parallel grid suitable for GPU environment
Lattice index;
Step 200. carries out tile decomposition to the grid index according to video memory space, and is accustomed to establishing according to space access
Index replacement policy and carry out tile index replacement;
Step 300. realizes that parallel spatial is inquired about using the multi-thread mechanism in GPU and the tile of generation index.
Preferably, step 100 comprises the following steps:
Step 110. Input the vector element information, reading the point coordinates of each vector element and the corresponding vector element index information;
Step 120. Rasterize the vector elements according to the vector element information. The rasterization process is divided into three steps: first, extract the outline information of the vector element; second, scan the outline and compute the grid region to be filled; third, fill the region span by span (or cell by cell) in horizontal scan order;
Finally, perform index resolution: obtain the index information from the input vector elements, map it into the corresponding grid cells through the index translation engine, and finally generate the grid index together with the rasterized vector elements through the render cache.
Preferably, step 200 comprises the following steps:
Step 210. Split the grid index into tiles according to the size of the video memory space and the number of GPU threads, and load them into video memory;
Step 220. When the grid index in video memory cannot satisfy the query, perform tile index replacement according to the LFU policy, selecting the next tile to be replaced according to the index access frequency when the least-used tile must be skipped.
Preferably, step 300 comprises the following steps:
Step 310. Transfer the generated tile index from the host to the device;
First, for a given video memory size, determine the range of indexes to be stored. Assuming it covers m rows and n columns of tiles, the index set to be stored consists of the tiles TileIndex(i, j), where i denotes the row of the index and j its column. Between tiles, storage follows the order of a Hilbert curve; inside each tile, row-scan order is used;
Step 320. Load the query request condition Bbox onto the device;
Step 330. Specify the thread block and thread counts, and launch the parallel query kernel function on the device; the number of threads is specified in two dimensions, and the number of thread blocks is computed from the MBR of the query request;
Step 340. When the queried index is not in video memory, swap part of the index in video memory back to the host, and transfer the corresponding index from the host to the device;
Step 350. According to the thread launch strategy, execute the parallel query kernel function and extract the spatial index set;
Step 360. Transfer the filtered index back to the host, and output the vector elements corresponding to the queried index.
Preferably, step 350 comprises the following steps:
Step 351. Query the index result set in parallel, following the coalesced access principle for threads;
First, according to the thread launch strategy, the spatial position processed by a thread is computed through a three-level offset relationship; from this spatial position the index tile it belongs to is computed, and the offset of the index in video memory is further determined from the H-code given by the Hilbert-curve ordering of the tiles, where H-code denotes the Hilbert code;
Step 352. Use parallel thread compaction and predecessor summation to compress and collect the index result set in parallel;
The extracted valid index IDs are sorted using the CUDPP parallel merge sort. For the sorted index set, each thread processes one element, computing the difference between the element and its predecessor and saving it as the flag value. According to the flag, the index set is compacted in parallel using CUDPP compact, moving the elements with non-zero flags to the front; finally the final ID result set is obtained using the parallel prefix-sum method.
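The sort → predecessor-difference → compaction pipeline of step 352 can be sketched on the host as follows. This is a minimal Python simulation of the logic only; the patent performs these stages on the GPU with CUDPP sort/compact/scan primitives, which this sketch does not call:

```python
def unique_ids(extracted_ids):
    """Simulate step 352: sort the extracted index IDs, flag each element
    that differs from its predecessor, then keep only the flagged elements.
    On the GPU each stage maps to a CUDPP primitive (sort, compact, scan)."""
    ids = sorted(extracted_ids)                      # parallel merge sort
    # flag[k] = difference with the predecessor; the first element is kept
    flags = [1] + [ids[k] - ids[k - 1] for k in range(1, len(ids))]
    # compaction: keep elements whose flag is non-zero (i.e. new IDs)
    return [v for v, f in zip(ids, flags) if f != 0]
```

A duplicated extraction such as `[7, 3, 3, 9, 7, 3]` thus collapses to the unique result set `[3, 7, 9]`.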
(3) Beneficial effects
The beneficial effects of the present invention are:
1. With the parallel grid querying method of the present invention, the query result set can be determined simply by extracting the corresponding positions of the grid index in parallel according to the query condition. The method merges the filtering and refinement steps of the original spatial query method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching;
2. The spatial query method based on the grid index and the GPGPU parallel architecture proposed by the present invention is suitable for processing massive, dense line and polygon spatial data, and can achieve an order-of-magnitude efficiency improvement over tree index methods on the CPU. For the traditional tree-index query method on the CPU, the larger and denser the data, the lower the retrieval efficiency; the retrieval efficiency of the proposed method is not affected by data complexity, only by the query extent and the size of the queried ID set, and its retrieval time complexity is linear;
3. Thanks to the retrieval-efficiency advantage of the proposed CUDA-based parallel spatial querying method, it provides a promising direction for improving the performance of spatial retrieval under the networked map service model.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the CUDA-based parallel spatial querying method of the embodiment of the present invention;
Fig. 2 is a flow chart of the steps for establishing the grid index in the embodiment of the present invention;
Fig. 3 shows the parallel grid index generation process of the embodiment of the present invention;
Fig. 4 is a flow chart of the steps for transferring the index between the device and the host in the embodiment of the present invention;
Fig. 5 is a schematic diagram of the neighborhood tile range in the embodiment of the present invention;
Fig. 6 is a flow chart of the steps for performing the parallel spatial query on the GPU based on the grid index in the embodiment of the present invention;
Fig. 7 is a schematic diagram of the storage layout of the index tiles in video memory in the embodiment of the present invention;
Fig. 8 is a schematic diagram of the three-level offset computation in the embodiment of the present invention;
Fig. 9 is a schematic diagram of the handling of an irregular range query in the embodiment of the present invention, where (a) is the binary rasterization process and (b) is the extraction process;
Fig. 10 is a flow chart of the steps for extracting and filtering the index results in parallel in the embodiment of the present invention;
Fig. 11 is a schematic diagram of the coalesced access pattern in the embodiment of the present invention;
Fig. 12 is a schematic diagram of the parallel elimination of redundant indexes in the embodiment of the present invention;
Fig. 13 compares spatial query modes in the embodiment of the present invention, where (a) shows the traditional spatial query steps and (b) shows the spatial query steps of the method of the present invention;
Fig. 14 shows the query extent ratios in the embodiment of the present invention, where (a) is 500 query requests whose extent is 1/100 of layer D3 and (b) is 500 query requests whose extent is 1/50 of layer D3;
Fig. 15 compares the embodiment of the present invention with the R* method, where (a)-(f) are the query times for D1-D6 respectively;
Fig. 16 shows the CUDA profiler performance analysis of the embodiment of the present invention, where (a) is the GPU time analysis and (b) is the GPU time percentage analysis.
Embodiments
To make the purpose, content, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples.
The embodiment of the present invention proposes a CUDA-based parallel spatial querying method that uses a grid index technique to effectively combine vector elements with the GPU (Graphics Processing Unit) computation model. Through a tactical design for exchanging the index between host and device and an efficient storage model in video memory, the method makes full use of the "many-core" mechanism of the GPU, ensuring that the workload of each thread is as uniform as possible and that the grid computation is sufficiently dense; finally, methods such as grid extraction, binary comparison, parallel compaction, and parallel predecessor summation are used to quickly compute the exact query result. Compared with the huge amount of computation in the conventional "filter by outer boundary first, then match exactly" approach, the present invention filters out irrelevant spatial elements to the greatest extent and simplifies the computation steps; at the same time, when the overlap between spatial elements is high, the present invention avoids the redundant-path problem of traditional tree index queries, greatly improving computational efficiency and offering a novel approach to spatial query.
The technical problems the present invention addresses include: 1. a spatial index structure suitable for the GPU computing environment; 2. the efficiency of index transfer between CPU and GPU; 3. improving thread utilization and parallel efficiency by using the coalesced access mechanism of GPU multithreading.
The CUDA-based parallel spatial querying method of the present invention is described in detail below with reference to the above goals. Fig. 1 is the step flow chart of the CUDA-based parallel spatial querying method of the embodiment of the present invention; as shown in Fig. 1, the method comprises the following steps:
Step 100. Read the vector elements and, according to element type and index ID, establish a parallel grid index suitable for the GPU environment, providing an efficient indexing mechanism for the subsequent parallel spatial query.
In order to make maximum use of the GPU multi-threading mechanism, the present invention uses a grid structure for the spatial index: on the one hand this benefits the even division of the computing task, and on the other hand it keeps as many thread blocks as possible in the active state.
Fig. 2 is the step flow chart for establishing the grid index in the present invention. As shown in Fig. 2, step 100 comprises the following steps:
Step 110. Input the vector element information, reading the point coordinates of each vector element and the corresponding vector element index information.
First, the vector element information is input, including the point coordinates composing each vector element, and the coordinates are converted through the coordinate conversion channel to sub-pixel precision. The algorithm core then follows.
Step 120. Rasterize the vector elements according to the vector element information.
This step is the core of the whole algorithm and consists of two parts: the first part (the left part of Fig. 3) is the vector element rasterization process, and the second part (the right part of Fig. 3) is the resolution of the element index against the corresponding grid.
The rasterization process is divided into three steps: first, extract the outline information of the vector element; second, using the grid controller, scan the outline and compute the grid region to be filled; third, fill the region span by span (or cell by cell) in horizontal scan order.
Finally, index resolution is performed: the index information is obtained from the input vector elements and mapped into the corresponding grid cells through the index translation engine; finally the grid index is generated together with the rasterized vector elements through the render cache.
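The three-step rasterization above can be illustrated with a small host-side sketch. The even-odd scanline fill used here is an assumption about the internals of the unspecified "grid controller"; it scans each row's centre line, collects edge crossings of the outline, and fills the spans between crossing pairs:

```python
import math

def scanline_fill(poly, width, height):
    """Rasterize a closed polygon into a width x height grid: scan each
    row's centre line, collect outline crossings, and fill span by span
    (the horizontal-span filling of step three)."""
    grid = [[0] * width for _ in range(height)]
    n = len(poly)
    for row in range(height):
        cy = row + 0.5                           # sample at the cell centre
        xs = []
        for i in range(n):                       # scan the outline (step two)
            (ax, ay), (bx, by) = poly[i], poly[(i + 1) % n]
            if (ay <= cy) != (by <= cy):         # edge crosses this scanline
                xs.append(ax + (cy - ay) * (bx - ax) / (by - ay))
        xs.sort()
        for left, right in zip(xs[::2], xs[1::2]):   # fill spans pairwise
            first = max(0, math.ceil(left - 0.5))    # first centre >= left
            last = min(width, math.floor(right - 0.5) + 1)
            for col in range(first, last):
                grid[row][col] = 1
    return grid
```

For an axis-aligned rectangle from (1, 1) to (4, 3) on a 5 x 4 grid, the cells whose centres fall inside it (rows 1-2, columns 1-3) are filled.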
Step 200. Decompose the grid index into tiles according to the video memory space, establish an index replacement policy based on spatial access habits, and perform tile index replacement.
Fig. 4 is the step flow chart for transferring the index between the device and the host in the present invention. As shown in Fig. 4, step 200 comprises the following steps:
Step 210. Split the grid index into tiles according to the size of the video memory space and the number of GPU threads, and load them into video memory.
Because the video memory space is limited, the grid index of a spatial layer at large scale cannot be loaded into video memory all at once. To solve this problem, the grid index must first be split into tiles. Using pre-generation and pre-assembly, at initialization some of the index tiles are pre-transferred to the device according to the index space size allocated in video memory. If no valid grid index in video memory covers the query request, a tile replacement strategy is used to update the index.
Step 220. When the grid index in video memory cannot satisfy the query, perform tile index replacement according to the LFU policy, selecting the next tile to be replaced according to the index access frequency when the least-used tile must be skipped.
Because of the continuity of geographic space, a user who queries some region is likely to visit its adjacent regions next, so adjacent regions must be taken into account during replacement and updating. Let an index tile be TileIndex(i, j), where i denotes the row of the index and j denotes its column. The adjacent region of the tile is then the neighborhood set Ω(i, j), represented as in Fig. 5.
Every time an index tile in video memory is accessed, the counter at its position is incremented. If the currently accessed index TileIndex(i, j) is not in video memory, then according to the LFU (Least Frequently Used) replacement policy, the access frequencies are counted with the CUDA parallel merge/sort, and the least-used index tile TileIndex(s, t) is evicted first. One additional consideration is needed: if TileIndex(s, t) ∈ Ω(i, j), it is not replaced, and the next tile to be replaced is selected according to the index access frequency. The pseudocode is shown in Table 1.
Table 1. Index tile update process
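Since Table 1 is not reproduced in the text, the victim-selection logic it describes can be sketched as follows. This host-side Python sketch assumes an 8-neighbourhood for Ω(i, j) (the patent defines Ω only via Fig. 5), and simple sequential counting in place of the CUDA parallel merge/sort:

```python
def neighborhood_of(i, j):
    """Assumed 8-neighbourhood Ω(i, j) of TileIndex(i, j) (see Fig. 5)."""
    return {(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)}

def choose_victim(counts, neighborhood):
    """Simulate the Table 1 update: evict the least-frequently-used resident
    tile, skipping any tile inside the requested tile's neighbourhood."""
    # counts: {(row, col): access_count} for tiles resident in video memory
    for tile in sorted(counts, key=lambda t: counts[t]):
        if tile not in neighborhood:
            return tile          # evict this tile
    return None                  # every resident tile is a neighbour
```

For example, if the least-used resident tile adjoins the requested tile, the next-least-used non-neighbour is evicted instead, which keeps the spatially-local working set in video memory.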
Step 300. Use the multi-threading mechanism of the GPU and the generated tile index to perform the parallel spatial query.
The parallel spatial query is realized using the above grid index mechanism, which is suited to parallel spatial queries under the GPU environment. This part mainly discusses the two most common spatial selection queries: Bbox range selection queries and polygon selection queries. The difference between the two in the parallel query on the GPU device is that a polygon selection query needs one additional computation step: the irregular query polygon is rasterized, and during multi-threaded computation each thread must perform one more grid test. The remaining steps of the algorithm can be handled as for the Bbox range selection query. Therefore this part first introduces the specific algorithm for the Bbox range selection query, and then extends the algorithm to handle arbitrary polygon selection query requests.
Fig. 6 is the step flow chart for performing the parallel spatial query on the GPU based on the grid index in the present invention. As shown in Fig. 6, step 300 comprises the following steps:
Step 310. Transfer the generated tile index from the host to the device.
Considering the characteristics of GPU multi-threaded access to video memory, the storage of the index on the device must satisfy the coalesced access principle; the storage layout of the tile index in video memory follows Fig. 7.
First, for a given video memory size, determine the range of indexes to be stored. Assuming it covers m rows and n columns of tiles, the index set to be stored consists of the tiles TileIndex(i, j). Between tiles, storage follows the order of a Hilbert curve, so the start position of a given tile in video memory can be located quickly as (H-code − 1) × size, where H-code denotes the Hilbert code and size denotes the size of one block. Finally, the interior of each tile is stored in row-scan order, and the (W1, W2, W3, W4) word length is guaranteed per row (W1, W2, W3, W4 are each one byte, so the row width is 64 bytes, a multiple of 16).
This layout is based mainly on two considerations. First, adjacent index tiles are close on the real geographic space, and the Hilbert curve clusters adjacent indexes together, which largely preserves the spatial locality of the data between tiles: a spatial query accesses one contiguous section of video memory, laying the foundation for coalesced access. Second, inside each tile, this layout allows the 16 threads of a half-warp to complete the whole half-warp's access request to video memory in a single transaction. After these steps, the tile index in video memory is both stored linearly and satisfies the coalesced access principle, so coalesced reads are achieved and the efficiency of thread access to video memory is improved.
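The H-code used for tile addressing can be computed with the standard Hilbert-curve distance (xy2d) algorithm; the sketch below is a host-side illustration (the patent's 1-based numbering, reflected in its (H-code − 1) × size formula, is noted in the docstring):

```python
def hilbert_d(order, x, y):
    """Hilbert-curve distance of tile (x, y) on a 2**order x 2**order grid
    (the standard xy2d algorithm, 0-based). The patent numbers H-codes
    from 1, so a tile's start address is (H-code - 1) * size, i.e. this
    distance times the block size."""
    n = 2 ** order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Consecutive H-codes always belong to grid-adjacent tiles, which is exactly the locality property the layout relies on.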
Step 320. Load the query request condition Bbox onto the device.
The actual spatial query extent exists in the form of geographic coordinates, so the mapping from the query request to the grid index must be realized.
Assume the map scale is Scale = 1000, i.e., 1 meter on the map represents 1000 meters of distance in actual geographic space. The grid resolution can be viewed as the number of grid cells covered per inch, denoted Dots_per_inch. In addition, since 1 meter equals 39.3701 inches (Inches_per_meter = 39.3701), the number of grid cells per meter can be computed:
Dots_per_meter = Dots_per_inch × Inches_per_meter (2)
The grid cell then relates to the actual geographic distance as follows:
Resolution = Scale ÷ Dots_per_meter = Scale ÷ Dots_per_inch ÷ Inches_per_meter (3)
where Resolution denotes the resolution.
For the true geographic coordinates of a query request, the offset geographic coordinates must first be computed from the geographic bounding box of the whole layer. When the layer undergoes a constant spatial translation or scaling, the coordinates must be reduced according to the translation and scaling formulas, and the corresponding grid row and column numbers on the original grid index are then obtained from Resolution.
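Formulas (2) and (3) and the final coordinate-to-cell mapping can be sketched directly; the helper `to_cell` and its origin parameters are illustrative names for the offset step described above, not terms from the patent:

```python
INCHES_PER_METER = 39.3701

def resolution(scale, dots_per_inch):
    """Geographic distance covered by one grid cell, per formulas (2), (3):
    Resolution = Scale / (Dots_per_inch * Inches_per_meter)."""
    dots_per_meter = dots_per_inch * INCHES_PER_METER      # formula (2)
    return scale / dots_per_meter                          # formula (3)

def to_cell(geo_x, geo_y, origin_x, origin_y, res):
    """Map an offset geographic coordinate to grid column and row numbers,
    after subtracting the layer's bounding-box origin."""
    return int((geo_x - origin_x) / res), int((geo_y - origin_y) / res)
```

With Scale = 1000 and a 96-cells-per-inch grid, one cell covers roughly 0.26 meters of geographic distance.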
Step 330. Specify the thread block and thread counts, and launch the parallel query kernel function on the device.
This process mainly computes the number of thread blocks participating in the parallel spatial query and the number of threads in each block. The number of threads is specified in two dimensions, with the total number of threads in each block not exceeding 512; it is set to TN(32, 16). To guarantee that each thread processes one grid cell, the number of thread blocks must be computed from the MBR of the query request.
Assume the query extent is Bbox(x1, y1, x2, y2) and each index tile has width idwidth and height idheight. The tile index set covered by this extent is then ∪ Idx(s, t).
In order that the number of thread blocks participating in the parallel computation covers the set ∪ Idx(s, t), the distribution BN(bnx, bny) of the Grid thread blocks can be further computed through CUDA's two-dimensional thread built-in variables, where bnx and bny denote the abscissa and ordinate of the thread block, and tn.x and tn.y denote the abscissa and ordinate of a thread.
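The block-count computation can be sketched as follows. Since the original formulas for BN are not reproduced in this text, the ceiling divisions below are an assumption, consistent only with the stated requirement that the blocks cover every cell of the query MBR with one cell per thread:

```python
import math

def launch_config(x1, y1, x2, y2, tn=(32, 16)):
    """Assumed computation of the Grid thread-block distribution BN(bnx, bny)
    so that blocks of tn.x x tn.y threads cover every cell of the query
    Bbox(x1, y1, x2, y2), one cell per thread."""
    width, height = x2 - x1 + 1, y2 - y1 + 1       # cells covered by the Bbox
    assert tn[0] * tn[1] <= 512                    # per-block thread limit
    bnx = math.ceil(width / tn[0])                 # blocks along x
    bny = math.ceil(height / tn[1])                # blocks along y
    return bnx, bny
```

A 64 x 32-cell query extent thus needs a 2 x 2 grid of TN(32, 16) blocks.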
Step 340. When the queried index is not in video memory, part of the index in video memory is swapped back to the host, and the host transfers the corresponding index to the device. Otherwise this swap step is not performed, and step 350 is executed directly.
Step 350. According to the thread launch strategy, execute the parallel query kernel function and extract the spatial index set.
According to the BN(bnx, bny) and TN(32, 16) thread launch strategy, the spatial position processed by a thread can be computed through a three-level offset relationship. First, the built-in variable threadIdx in the device-side parallel query kernel determines the offset of the current thread within its thread block; then the built-in variables blockDim and blockIdx of the Grid are used to compute the global position of the thread in the whole Grid; finally, the offset of the starting tile covered by the Grid within the video memory index further determines the spatial position of the cell the current thread should process. The determination of the spatial position is shown in Fig. 8.
If the spatial position falls within the query bounding box, the grid cell is extracted; otherwise the current thread does not participate in the computation. For Bbox range selection queries, irrelevant threads can be filtered in this way. For polygon selection queries, binary rasterization must be performed according to the bounding box of the query condition: grid cells inside the polygon are set to 1, and cells inside the bounding box but outside the polygon are set to 0. When extracting a grid cell, each thread must additionally perform a grid computation in parallel against the query condition at the corresponding position, as shown in Fig. 9.
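The three-level offset of Fig. 8 and the per-thread participation test can be sketched as a host-side simulation; the function names are illustrative, and the index arithmetic mirrors the CUDA built-ins (threadIdx, blockIdx, blockDim) named above:

```python
def cell_position(block_idx, thread_idx, block_dim, grid_origin):
    """Three-level offset of Fig. 8: thread offset inside the block
    (threadIdx), block offset inside the Grid (blockIdx * blockDim),
    plus the origin cell of the starting tile covered by the Grid."""
    gx = grid_origin[0] + block_idx[0] * block_dim[0] + thread_idx[0]
    gy = grid_origin[1] + block_idx[1] * block_dim[1] + thread_idx[1]
    return gx, gy

def participates(pos, bbox, mask=None):
    """A thread takes part only if its cell lies inside the query Bbox and,
    for a polygon query, the binary raster mask of the polygon is 1 there."""
    x1, y1, x2, y2 = bbox
    inside = x1 <= pos[0] <= x2 and y1 <= pos[1] <= y2
    if inside and mask is not None:
        inside = mask[pos[1] - y1][pos[0] - x1] == 1
    return inside
```

For a Bbox query the mask is omitted; for a polygon query the same test additionally reads the 0/1 rasterized polygon at the cell's offset inside the bounding box.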
Fig. 10 is the step flow chart for extracting and filtering the index results in parallel in the present invention. As shown in Fig. 10, step 350 comprises the following steps:
Step 351. merges access principles, parallel query indexed results collection using thread.
The extraction of the grid cell at the corresponding position in video memory proceeds as follows. First, under the launch strategy of BN (bnx, bny) blocks and TN (32, 16) threads, the spatial position each thread processes is computed through the three-layer offset relationship. From that spatial position, the index tile it belongs to is determined, and the H-code given by the tile's Hilbert-curve ordering fixes the tile's offset in video memory. Since cells inside a tile are stored in row-scan order, the specific grid cell is then obtained from the row/column offset of the spatial position within the tile.
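The two-level addressing just described (Hilbert order between tiles, row-scan order inside a tile) can be sketched as follows. This uses the standard Hilbert xy-to-distance conversion; the function names and parameters are illustrative, not taken from the patent:

```python
def xy2d(n, x, y):
    """Hilbert-curve distance (H-code) of cell (x, y) on an n-by-n grid,
    n a power of two. Standard iterative conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def cell_to_linear(cell, tile_w, tiles_per_side):
    """Linear video-memory offset of a grid cell: Hilbert order between
    tiles, row-scan order inside each tile."""
    cx, cy = cell
    tx, ty = cx // tile_w, cy // tile_w      # which tile
    ix, iy = cx % tile_w, cy % tile_w        # offset inside the tile
    tile_offset = xy2d(tiles_per_side, tx, ty) * tile_w * tile_w
    return tile_offset + iy * tile_w + ix
```

For example, with 4-by-4-cell tiles on a 2-by-2 tile grid, cell (5, 1) lies in tile (1, 0), whose H-code is 3, so its linear offset is 3 * 16 + 1 * 4 + 1 = 53.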
A key factor affecting CUDA performance is the efficiency of global-memory access. Because extracting index cells with many threads requires a large volume of video-memory reads and writes, the method's global-memory reads are optimized here; this is also why four bytes are used when generating the grid index. When constructing the grid index structure, a three-byte storage scheme would be sufficient to hold the index data for a spatial dataset of moderate size, but it cannot guarantee that the addresses accessed by the threads of the same half warp align with a coalescable segment, so the data would have to be split into 16 serial transfers. As shown in Fig. 11, CUDA supports coalesced access at 32-bit word length. In the first case, the index word lengths (W1, W2, W3) cannot be aligned to a coalescable segment, so the threads access video memory inefficiently. In the second case, although one extra byte per index is spent on storage, the accesses exactly match the global-memory coalescing pattern: the indexes align one-to-one with the coalescing segments, so the data for the 16 threads of a half warp is transferred in a single transaction, optimizing access efficiency.
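The effect of record width on coalescing can be modeled roughly. The sketch below is a simplified host-side model (the function and parameters are assumptions, not CUDA code): it counts the transactions needed for the 16 threads of a half warp reading packed records, treating any layout not aligned to 32-bit words as fully serialized, which matches the 16-transfer worst case described above:

```python
def half_warp_transactions(record_bytes, segment_bytes=64):
    """Transactions for 16 threads of a half warp, each reading one record,
    records packed back to back from an aligned base address."""
    if record_bytes % 4 != 0:
        # 3-byte records drift off the 32-bit word boundaries that CUDA
        # coalescing requires, so in the worst case every thread issues
        # its own transfer: 16 serial transmissions.
        return 16
    starts = [t * record_bytes for t in range(16)]
    # Aligned records: count the distinct memory segments touched.
    return len({addr // segment_bytes for addr in starts})
```

With 4-byte indexes the half warp reads 64 consecutive bytes, i.e. a single 64-byte segment and a single transaction; with 3-byte indexes the model falls back to 16 serialized transfers.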
Step 352. Simplify and gather the index result set in parallel, using thread-parallel compaction and prefix sums.
The grid index cells extracted by multiple threads in parallel yield, after de-parsing, the index IDs at the corresponding positions. A vector polygon feature covers an area, so several adjacent cells of its grid index are very likely to carry the same ID. The multi-threaded parallel query is loosely coupled, but it cannot guarantee that the set of index IDs is unique; if the entire gathered query set were transferred from the device back to the host, transfer efficiency would suffer. The query results therefore need to be aggregated with a parallel algorithm, so that only the distinct feature indexes are transferred.
The concrete procedure is as follows. The extracted valid index IDs are sorted with CUDPP's parallel merge sort. Over the sorted index set, each thread handles one element: it computes the difference between the element and its predecessor and stores it as the flag value. At positions where flag is nonzero, the feature ID in the index set is a first occurrence and cannot be a repeat; conversely, an element whose flag is 0 (one element, or a run of them) carries a redundant index ID and can be ignored. Accordingly, using the flag values, the index set is compacted in parallel with CUDPP compact, moving the elements with nonzero flags to the front; the final ID result set is then obtained with a parallel prefix-sum method, and only this reduced set is copied back into host memory. Greatly reducing the transfer volume in this way improves parallel query efficiency, as shown in Fig. 12.
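The sort/flag/compact pipeline can be illustrated on the host. In this sketch plain Python stands in for the CUDPP sort and compact primitives, and the function name is invented for illustration:

```python
def unique_ids(ids):
    """Host-side model of the device pipeline: sort, flag elements that
    differ from their predecessor, compact the flagged elements.
    On the GPU, step 1 would be a CUDPP parallel sort and step 3 a
    CUDPP compact driven by the flag array."""
    ordered = sorted(ids)                                  # step 1: sort
    flags = [1 if i == 0 or ordered[i] != ordered[i - 1] else 0
             for i in range(len(ordered))]                 # step 2: one thread per element
    return [v for v, f in zip(ordered, flags) if f]        # step 3: stream compaction
```

For example, the extracted IDs [7, 3, 3, 9, 7, 3] compact to [3, 7, 9], which is all that needs to be copied back to the host.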
Step 360. The screened index is transferred back to the host, which outputs the vector features corresponding to the queried index.
Fig. 13 shows how the parallel spatial query method designed by the present invention differs from a general spatial query. The first case is the traditional two-step spatial query using a tree index: according to the MBR of query condition W, a spatial filter is first applied to obtain the candidate set R1 and R2; a refinement step then tests the spatial features exactly and finds that the query result is R1. The second case is the parallel grid query method: simply by extracting the grid index at the positions covered by query condition W in parallel, the query result set is determined. This method merges the filter and refinement steps of the original method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching.
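The contrast between the two query styles can be sketched as a toy host-side model (the feature records and helper names here are invented for illustration): the tree-style query needs an exact refinement pass over a candidate set, while the grid query reads final IDs directly from the index cells covered by W:

```python
def mbr_intersects(a, b):
    """Axis-aligned overlap test; boxes are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def two_step_query(features, w):
    """Tree-style query: the MBR filter produces candidates, then
    refinement tests each candidate's exact geometry against W."""
    candidates = [f for f in features if mbr_intersects(f["mbr"], w)]
    return sorted(f["id"] for f in candidates if f["exact"](w))

def grid_query(cell_index, w_cells):
    """Grid query: the index cells covered by W already carry final IDs,
    so there is no refinement pass and no candidate set to transfer."""
    return sorted({cell_index[c] for c in w_cells if c in cell_index})
```

In a case like Fig. 13, a feature R2 whose MBR overlaps W but whose geometry does not would survive the filter step and be rejected only in refinement, whereas the grid index never produces it at all.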
To illustrate the beneficial effects of the invention, the fast parallel spatial query method proposed here is compared experimentally with traditional spatial query methods. The experiments fall into two parts. The first part compares the GPGPU-based parallel spatial query method of the present invention with spatial query on a traditional CPU, covering query time and degree of spatial filtering. The second part analyzes the kernel performance of the method, so as to verify the correctness and efficiency of the invention.
(1) Comparison with the CPU-based method
The goal of this part of the experiments is to compare the present invention with a CPU spatial query method using an R* spatial index, measuring spatial retrieval time and degree of spatial filtering on the same benchmark datasets under the same query requests.
Table 2. Details of the test data

Data set | Characteristics | Type | Feature count | Size
---|---|---|---|---
D1 | Dense | Point layer | 1466808 | 50.3MB
D2 | Sparse | Point layer | 405 | 11.1KB
D3 | Dense | Line layer | 1631927 | 365.MB
D4 | Sparse | Line layer | 979 | 142KB
D5 | Dense | Polygon layer | 3233 | 62.2MB
D6 | Sparse | Polygon layer | 123 | 1.48MB
The test set covers 3 types of geographic features, and for each type two datasets with different characteristics are selected: one in which the features are sparsely distributed, i.e. the minimum bounding boxes of the features overlap little, and one in which the features are densely distributed. Details of the selected benchmark datasets are given in Table 2.
For the 6 datasets above, different spatial query conditions are applied for spatial filtering. To ensure fairness and consistency of the queries, the spatial query requests are generated at random; each layer is tested with 8 groups of experiments (each group containing 500 query requests) over different query extents, where the query extent is determined as a proportion of the whole layer, e.g. 1/120 of the original layer extent as the query condition. Fig. 14 shows two groups of query extents: (a) 500 query requests whose extent is 1/100 of layer D3; (b) 500 query requests whose extent is 1/50 of layer D3.
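Query windows fixed at a proportion of the layer extent can be generated as in the following sketch. This is a host-side illustration only; the particular sampling scheme (uniform placement, preserved aspect ratio) is an assumption, since the text states only that the requests are random and area-proportional:

```python
import random

def make_query_windows(layer_extent, fraction, count, seed=0):
    """Generate `count` random query Bboxes whose area is `fraction` of
    the layer extent, each lying fully inside the layer."""
    xmin, ymin, xmax, ymax = layer_extent
    lw, lh = xmax - xmin, ymax - ymin
    scale = fraction ** 0.5            # keep the layer's aspect ratio
    qw, qh = lw * scale, lh * scale
    rng = random.Random(seed)
    boxes = []
    for _ in range(count):
        x = xmin + rng.random() * (lw - qw)   # place the window at random
        y = ymin + rng.random() * (lh - qh)
        boxes.append((x, y, x + qw, y + qh))
    return boxes
```

A group of 500 requests at 1/120 of the layer extent, as used in the experiments, would be `make_query_windows(extent, 1/120, 500)`.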
The test results in Fig. 15 show that, for point-layer spatial data, the method of the present invention exhibits no clear advantage, for the following main reasons. First, regardless of how dense the point data is, the bounding boxes of point features barely overlap, so few overlapping paths arise and the R*-tree-based spatial query is already efficient. Second, the GPGPU-based parallel spatial query of the present invention involves device-side computation, so the performance cost of data transfer must be taken into account: for the dense dataset D1, although the computation time of the proposed method is lower than that of the R* method, the overall time is comparable to that of the CPU spatial query using the R* index.
For sparse data such as line layer D4 and polygon layer D6, the advantage of this method is likewise hard to realize. The sparsity of the data means the MBRs overlap little, which boosts the efficiency of the R* method, while the small data volume prevents the GPU's computational advantage from being fully expressed.
For massive dense line and polygon layers such as D3 and D5, however, this method holds an absolute advantage in spatial query time over the R*-tree-based query on the CPU. With a large data volume, the height of the tree index grows sharply, and because the data is dense, the MBRs of the internal index nodes overlap heavily, so the tree index handles this kind of spatial data inefficiently. As Fig. 15c shows, the query time using the R* index averages over 1600 ms per dataset group, whereas the method proposed here stays around 50 ms even including the data-transfer time; since the transfer time is now negligible compared with the CPU computation time, the proposed method achieves a speedup of 32x on data like D3. Likewise, Fig. 15e shows that although D5 has fewer polygon features than D3 has line features, its MBR overlap is also high: the R* index needs about 500 ms on average while the proposed method needs about 50 ms, a speedup of roughly 10x.
From the above comparative analysis of query efficiency on spatial data of different types and characteristics, we can summarize as follows: the spatial query method proposed here, based on a grid index and the GPGPU parallel architecture, is suited to processing massive dense line and polygon spatial data, and achieves an order-of-magnitude efficiency improvement over the tree-index method on the CPU. For the traditional CPU tree-index query, the larger and denser the data, the lower the query efficiency; the query efficiency of the proposed method is unaffected by data complexity and depends only on the query extent and the size of the queried ID set.
(2) Kernel performance evaluation
This part measures the share of time taken by each stage of the method's internal execution, in order to analyze the impact of host-device data transfer on the performance of the whole algorithm. The analysis tool used is the CUDA profiler, a visualization tool provided by NVIDIA in the CUDA toolkit. It contains timelines reflecting the CPU and GPU activity of an application, lets users analyze GPU programs in a more intuitive way, and also provides an automatic analysis engine that helps users locate performance bottlenecks quickly and find optimization strategies.
As shown in Fig. 16, the summary plot lets us see intuitively the percentage of total GPU time taken by kernel functions and by memory copies. The computation part of the proposed window query accounts for 76.21% of the total kernel time, while data copying accounts for only about 20%; we can therefore conclude that the data-transfer stage does not cause a performance bottleneck in the proposed method.
The beneficial effects of the present invention are:
1. With the parallel grid query method of the present invention, the query result set is determined simply by extracting the grid index at the positions covered by the query condition in parallel. The method merges the filter and refinement steps of the original spatial query method, avoiding both the frequent transfer of the candidate set between video memory and main memory and the large amount of computation required for exact matching;
2. The spatial query method based on a grid index and the GPGPU parallel architecture is suited to processing massive dense line and polygon spatial data, and achieves an order-of-magnitude efficiency improvement over the tree-index method on the CPU. For the traditional CPU tree-index query, the larger and denser the data, the lower the query efficiency; the query efficiency of the proposed method is unaffected by data complexity and depends only on the query extent and the size of the queried ID set, its retrieval time complexity being one-dimensional linear;
3. The advantage in retrieval efficiency of the CUDA-based parallel spatial query method proposed here provides a promising direction for improving the performance of spatial retrieval under the network map service model.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and variations without departing from the technical principles of the invention, and such improvements and variations shall also be regarded as falling within the scope of protection of the present invention.
Claims (5)
1. A parallel spatial query method based on CUDA, characterized in that the method comprises the following steps:
Step 100. Read the vector features and, according to feature type and index ID, build a parallel grid index suited to the GPU environment;
Step 200. Decompose the grid index into tiles according to the video-memory space, establish an index replacement policy according to spatial access habits, and replace tile indexes accordingly;
Step 300. Perform the parallel spatial query using the multi-threading mechanism of the GPU and the generated tile index.
2. The parallel spatial query method based on CUDA according to claim 1, characterized in that step 100 comprises the following steps:
Step 110. Input the vector feature information, reading the point coordinates of the vector features and the corresponding vector feature index information;
Step 120. Render the vector features onto the grid according to the vector feature information; the grid rendering is divided into three steps: first, extract the outline information of the vector feature; second, scan the outline and compute the mesh region to fill; third, fill the region span by span, or cell by cell, by horizontal scanning;
Finally, perform index parsing: the index information obtained from the input vector feature is mapped by the index translation engine into the corresponding grid cells, and the grid index is generated through the render cache together with the rasterized result of the vector feature.
3. The parallel spatial query method based on CUDA according to claim 2, characterized in that step 200 comprises the following steps:
Step 210. Split the grid index into tiles according to the video-memory size and the number of GPU threads, and assemble the tiles into video memory;
Step 220. When the grid index in video memory cannot satisfy the query, perform tile index replacement according to the LFU policy, that is, select the next tile to be replaced according to the index access frequency.
4. The parallel spatial query method based on CUDA according to claim 3, characterized in that step 300 comprises the following steps:
Step 310. Transfer the generated tile index from the host to the device;
First, for a given video-memory size, determine the extent of the index to be stored; assuming it contains m rows and n columns of tiles, the index set to be stored is {T(i,j) | 1 ≤ i ≤ m, 1 ≤ j ≤ n}, where i denotes the row of the index and j the column; the tiles are then stored relative to one another in Hilbert-curve order, and finally the interior of each tile is stored in row-scan order;
Step 320. Load the query request condition Bbox onto the device;
Step 330. Specify the thread-block and thread counts and launch the device-side parallel query kernel;
The thread count is specified in two dimensions, and the number of thread blocks is computed from the MBR extent of the query request;
Step 340. When the queried index is not in video memory, move part of the index in video memory back to the host, and transfer the corresponding index from the host to the device;
Step 350. Execute the parallel query kernel according to the thread launch strategy and extract the spatial index set;
Step 360. Transfer the screened index back to the host and output the vector features corresponding to the queried index.
5. The method for extracting a spatial index set in parallel according to claim 4, characterized in that step 350 comprises the following steps:
Step 351. Query the index result set in parallel, following the thread coalesced-access principle;
First, according to the thread launch strategy, the spatial position each thread processes is computed through the three-layer offset relationship; from that spatial position the index tile it belongs to is computed, and the offset of the index in video memory is then determined from the H-code given by the tile's Hilbert-curve ordering, where H-code denotes the Hilbert encoding;
Step 352. Compact and gather the index result set in parallel, using thread-parallel compaction and prefix sums;
The extracted valid index IDs are sorted with CUDPP's parallel merge sort; over the sorted index set each thread handles one element, computing the difference between the element and its predecessor and storing it as the flag value; according to the flag values the index set is compacted in parallel with CUDPP compact, moving the elements with nonzero flags to the front; finally the final ID result set is obtained with a parallel prefix-sum method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610741535.3A CN107784001A (en) | 2016-08-26 | 2016-08-26 | Parallel spatial querying method based on CUDA |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107784001A true CN107784001A (en) | 2018-03-09 |
Family
ID=61440705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610741535.3A Pending CN107784001A (en) | 2016-08-26 | 2016-08-26 | Parallel spatial querying method based on CUDA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107784001A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719154A (en) * | 2009-12-24 | 2010-06-02 | 中国科学院计算技术研究所 | Grid structure-based spatial index establishing method and grid structure-based spatial index establishing system |
CN103744999A (en) * | 2014-01-23 | 2014-04-23 | 中国人民解放军国防科学技术大学 | Spatial vector data online interactive mapping method based on hierarchical-divided storage structure |
CN105354291A (en) * | 2015-11-02 | 2016-02-24 | 武大吉奥信息技术有限公司 | Raster data index and query method |
Non-Patent Citations (1)
Title |
---|
ZHAO Yanwei et al.: "A GPGPU-Based Spatial Query Optimization Method for Command Systems", Journal of Command and Control (《指挥与控制学报》) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711323A (en) * | 2018-12-25 | 2019-05-03 | 武汉烽火众智数字技术有限责任公司 | A kind of live video stream analysis accelerated method, device and equipment |
CN109711323B (en) * | 2018-12-25 | 2021-06-15 | 武汉烽火众智数字技术有限责任公司 | Real-time video stream analysis acceleration method, device and equipment |
CN110297952A (en) * | 2019-06-05 | 2019-10-01 | 西南交通大学 | A kind of parallelization high-speed railway survey data search method based on grid index |
CN110297952B (en) * | 2019-06-05 | 2021-12-21 | 西南交通大学 | Grid index-based parallelization high-speed railway survey data retrieval method |
CN112307035A (en) * | 2020-11-26 | 2021-02-02 | 深圳云天励飞技术股份有限公司 | Characteristic value ID management method and device, electronic equipment and storage medium |
CN112307035B (en) * | 2020-11-26 | 2024-01-05 | 深圳云天励飞技术股份有限公司 | Method and device for managing characteristic value ID, electronic equipment and storage medium |
CN113032427A (en) * | 2021-04-12 | 2021-06-25 | 中国人民大学 | Vectorization query processing method for CPU and GPU platform |
CN113032427B (en) * | 2021-04-12 | 2023-12-08 | 中国人民大学 | Vectorization query processing method for CPU and GPU platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yoon et al. | Cache-oblivious mesh layouts | |
US11651194B2 (en) | Layout parasitics and device parameter prediction using graph neural networks | |
Sousbie et al. | ColDICE: A parallel Vlasov–Poisson solver using moving adaptive simplicial tessellation | |
CN107784001A (en) | Parallel spatial querying method based on CUDA | |
CN103761215B (en) | Matrix transpose optimization method based on graphic process unit | |
DE112022004435T5 (en) | Accelerating triangle visibility tests for real-time ray tracing | |
CN107766471A (en) | The organization and management method and device of a kind of multi-source data | |
CN104036537A (en) | Multiresolution Consistent Rasterization | |
Liao et al. | Cluster-based visual abstraction for multivariate scatterplots | |
CN110663064A (en) | Parallelized pipeline for vector graphics and image processing | |
Tao et al. | Kyrix-s: Authoring scalable scatterplot visualizations of big data | |
Wang et al. | Spatial query based virtual reality GIS analysis platform | |
Ma et al. | HiVision: Rapid visualization of large-scale spatial vector data | |
CN106484532B (en) | GPGPU parallel calculating method towards SPH fluid simulation | |
CN107257356B (en) | Social user data optimal placement method based on hypergraph segmentation | |
Doraiswamy et al. | Spade: Gpu-powered spatial database engine for commodity hardware | |
Dou et al. | An equal‐area triangulated partition method for parallel Xdraw viewshed analysis | |
Beilschmidt et al. | An efficient aggregation and overlap removal algorithm for circle maps | |
Zhou et al. | Data decomposition method for parallel polygon rasterization considering load balancing | |
CN111274335A (en) | Rapid implementation method for spatial superposition analysis | |
Zhang et al. | ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval | |
Kipf et al. | Adaptive geospatial joins for modern hardware | |
Zhou et al. | A Parallel Scheme for Large‐scale Polygon Rasterization on CUDA‐enabled GPUs | |
Zhu et al. | Parallel image texture feature extraction under hadoop cloud platform | |
Yenpure et al. | State‐of‐the‐Art Report on Optimizing Particle Advection Performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB03 | Change of inventor or designer information | Inventor after: Zhao Yanwei; Wang Xiaoguang; Yang Fan. Inventor before: Zhao Yanwei; Yang Xiongjun; Wang Xiaoguang; Yang Fan |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180309 |