CN105893605B - Distributed computing platform and query method for k-nearest-neighbor (kNN) queries on spatio-temporal data - Google Patents
- Publication number
- CN105893605B (application CN201610259255.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- index
- space
- time data
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a distributed computing platform and a query method for k-nearest-neighbor (kNN) queries on spatio-temporal data. The platform comprises: a global index data management module, which exchanges data with the data access distribution module, the spatio-temporal data index module and the query parallel processing module, and supports the distributed dynamic two-level index structure; a data access distribution module, which ingests continuously arriving spatio-temporal data and spatio-temporal queries in real time and, according to the distributed dynamic two-level index structure, distributes the data to the spatio-temporal data index module and the queries to the query parallel processing module; a spatio-temporal data index module, which builds an index over the spatio-temporal data in its query region, updates the location information of the data in real time, and sends the updated location information to the query parallel processing module in real time; and a query parallel processing module, which processes the received queries in parallel according to the updated location information and outputs the query results.
Description
Technical field
The present invention relates to spatio-temporal data query technology in the field of computer applications, and in particular to a distributed computing platform and a query method for kNN queries on spatio-temporal data.
Background technique
Spatio-temporal data (Spatial-Temporal Data) is data with both spatial and temporal dimensions. It is typically used to describe how the spatial information of an object changes over time. In recent years, with the large-scale adoption of mobile devices (such as mobile phones and GPS devices), wireless sensors and electronic monitoring equipment, and with the rapid development of the mobile Internet, many location-based services have been generating large volumes of spatio-temporal data. Spatio-temporal queries have become increasingly important in fields such as intelligent transportation, e-commerce and social networks, and have attracted wide attention from researchers; querying massive spatio-temporal data in a big-data setting has become an emerging research topic. The volume of spatio-temporal data in many fields is now growing explosively, and the traditional single-machine computing model, limited in both compute and storage capacity, can hardly cope with large-scale spatio-temporal data and the concurrent queries issued against it.
Researchers have done considerable work on spatio-temporal queries, but research on kNN queries over massive spatio-temporal data in a distributed environment is still in its infancy and faces significant challenges. Specifically:
(1) There is no distributed computing platform that supports large-scale concurrent kNN queries over massive spatio-temporal data.
(2) There is no distributed index structure that supports both frequent updates of massive spatio-temporal data and parallel kNN query algorithms, so distributed storage and maintenance of such data are poorly supported.
(3) Most existing spatio-temporal query methods are centralized methods designed for a single-machine environment. They are difficult to deploy directly on a distributed computing platform, and an effective distributed kNN query method is lacking.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a distributed computing platform and a query method for kNN queries on spatio-temporal data. The invention can store and maintain continuously changing massive spatio-temporal data in real time, and can also respond in real time to large-scale concurrent kNN queries.
To achieve the above purpose, the present invention adopts the following technical scheme:
A distributed computing platform for kNN queries on spatio-temporal data, comprising:
A data access distribution module, which ingests continuously arriving spatio-temporal data and spatio-temporal queries in real time and distributes both to the data cache module according to the distributed dynamic two-level index structure DTLI;
The DTLI comprises a first-level strip index and a second-level grid index built on the strip index. The first-level strip index is formed by partitioning the spatio-temporal data along the x-axis; the second-level grid index is formed by partitioning the spatio-temporal data of each strip index along the y-axis;
A data cache module, which caches the spatio-temporal data and queries sent by the data access distribution module;
A spatio-temporal data index module, which builds an index over the spatio-temporal data in each strip index region and updates the indexed data in real time; it also monitors in real time the data and queries arriving at the data cache module and fetches those it is responsible for;
A global index data management module, which maintains a copy of the boundary information of the DTLI strip indexes as the global index data, and exchanges index data with the data access distribution module, the spatio-temporal data index module and the query parallel processing module;
A query parallel processing module, which performs distributed processing of kNN queries on spatio-temporal data.
The data access distribution module consists of several data access operators distributed over different physical computing nodes; each data access operator is a logical computing unit.
The spatio-temporal data index module consists of several data index operators distributed over different physical computing nodes; each data index operator is a logical computing unit.
The query parallel processing module consists of several data query operators distributed over different physical computing nodes; each data query operator is a logical computing unit.
Operators exchange data by sending and receiving events.
An event is a <key, value> pair, and each operator specifies which events it receives by event name and key.
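The event mechanism above can be illustrated with a minimal in-process sketch. It assumes that an event carries a name, a key and a value, and that an operator subscribes by (name, key); the class names `Event` and `EventBus`, and the event names used in the usage note, are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List, Tuple


@dataclass(frozen=True)
class Event:
    name: str   # event name (hypothetical examples: "object", "query")
    key: str    # routing key, e.g. the identifier of a strip index
    value: Any  # payload


class EventBus:
    """Delivers each event to the handlers subscribed to its (name, key)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[
            Tuple[str, str], List[Callable[[Event], None]]
        ] = defaultdict(list)

    def subscribe(self, name: str, key: str,
                  handler: Callable[[Event], None]) -> None:
        # An operator declares which events it wants by name and key.
        self._handlers[(name, key)].append(handler)

    def publish(self, event: Event) -> int:
        # Returns the number of handlers that received the event.
        handlers = self._handlers.get((event.name, event.key), [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

For example, an index operator responsible for strip `s1` could call `bus.subscribe("object", "s1", handler)`; events published with a different key would not reach it.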
A distributed kNN query method for spatio-temporal data based on the distributed computing platform, comprising the following steps:
Step (1): construct the above distributed computing platform for kNN queries on massive spatio-temporal data;
Step (2): deploy the distributed dynamic two-level index structure DTLI on the distributed computing platform;
Step (3): based on the DTLI, process the continuously arriving massive spatio-temporal data;
Step (4): based on the DTLI, deploy the distributed kNN query algorithm for spatio-temporal data (the PSK algorithm) on the distributed computing platform, thereby parallelizing the PSK algorithm and processing kNN queries on massive spatio-temporal data in parallel.
The detailed process of step (2) comprises:
Step (2.1): each data access operator in the data access distribution module maintains a copy of the boundary information of the DTLI strip indexes. The global index data management module also maintains a copy of this boundary information, but does not record the location of any spatio-temporal data;
Step (2.2): each data index operator in the spatio-temporal data index module maintains one strip index, and stores and updates the spatio-temporal data within it. Each data index operator builds a second-level grid index on the strip index it maintains, which completes the deployment of the distributed dynamic two-level index structure on the distributed computing platform.
In step (2.2), if the strip index maintained by a data index operator splits or merges, that operator writes the boundary information of the changed strip index to the global index data management module in real time.
Each data access operator in the data access distribution module uses a "watch" operation to start a monitoring process that continuously watches the data of the global index data management module. As soon as the monitoring process finds that a strip index in the global index data management module has changed, the data access operator fetches the updated strip index in real time and overwrites the corresponding part of its local strip index.
The detailed process of step (3) comprises:
Step (3.1): the data access operators in the data access distribution module ingest the arriving spatio-temporal data in parallel and, according to the DTLI strip indexes, assign each data item to the corresponding data index operator.
Step (3.2): the data access operators send the spatio-temporal data to the data cache module. Each data index operator in the data index module continuously monitors the data arriving at the data cache module and fetches in real time the data it is responsible for.
In step (4), kNN queries on spatio-temporal data are processed in parallel using the PSK algorithm.
Here, spatio-temporal data refers to moving objects (such as people and vehicles) in a two-dimensional plane. The positions of these objects change continuously, and the objects frequently report their coordinates to the data center.
Beneficial effects of the present invention:
(1) The distributed computing platform for kNN queries on massive spatio-temporal data includes a global index data management module and a data cache module, which together support the proposed distributed dynamic two-level index structure. The platform meets the need of kNN queries for distributed access to global index data, avoids data mis-delivery between operators while kNN queries are being processed, and provides a general distributed computing platform for large-scale concurrent kNN queries over massive spatio-temporal data;
(2) The proposed distributed dynamic two-level index structure can store and maintain continuously changing massive spatio-temporal data in real time. The index structure also scales well: in a distributed environment, adding hardware resources alone increases its data processing capacity linearly. Finally, the index structure supports the query algorithm PSK well and substantially accelerates its convergence;
(3) The invention uses the PSK algorithm to process kNN queries on spatio-temporal data in real time. This reduces the communication cost between physical computing nodes incurred by kNN query processing in a distributed environment, allows real-time responses to large-scale concurrent kNN queries, and significantly improves query efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the distributed dynamic two-level index structure (DTLI) for massive spatio-temporal data;
Fig. 2 is a schematic diagram of a DTLI strip index split;
Fig. 3 is an architecture diagram of the distributed computing platform for querying massive spatio-temporal data;
Fig. 4 is an illustration of the PSK algorithm;
Fig. 5 is a flow chart of processing a spatio-temporal data item o_i based on the distributed dynamic two-level index structure;
Fig. 6 is an architecture diagram of the parallelized PSK algorithm.
Specific embodiments
In the present invention, three components are closely connected: the distributed computing platform for kNN queries on massive spatio-temporal data, the distributed dynamic two-level index structure, and the PSK algorithm that supports distributed kNN queries on massive spatio-temporal data. The relationship between the three is explained in detail below with reference to the drawings.
Fig. 3 shows the distributed computing platform of the invention for querying massive spatio-temporal data. Its structure is as follows.
The distributed computing platform for kNN queries on spatio-temporal data comprises:
A data access distribution module, which ingests continuously arriving spatio-temporal data and spatio-temporal queries in real time and, according to the distributed dynamic two-level index structure, distributes the data to the spatio-temporal data index module and the queries to the query parallel processing module;
A spatio-temporal data index module, which builds an index over the spatio-temporal data in the query region it is responsible for, and maintains and updates the index structure in real time;
A data cache module, which provides a buffer for the data exchanged between the data access distribution module and the spatio-temporal data index module, and resolves the data mis-delivery caused by asynchronous updates between "data access operators" and "data index operators";
A global index data management module, which updates and maintains the global index data in real time. It provides two data operations, "read data" and "write data", through which it exchanges index data with the spatio-temporal data index module, the query parallel processing module and the query computation module;
The query computation module processes kNN query requests on spatio-temporal data according to the query algorithm PSK.
Specifically, the data access distribution module consists of multiple data access operators (Entrance Actors) distributed over different physical computing nodes; each data access operator is a logical computing unit. The main task of the module is to ingest continuously arriving spatio-temporal data and user queries in real time and, according to the distributed dynamic two-level index structure, assign each data item and query to the corresponding data index operator (Index Actor).
Specifically, the spatio-temporal data index module consists of several data index operators (Index Actors) distributed over different physical computing nodes; each data index operator is a logical computing unit. Each Index Actor is responsible for a small part of the overall query region and has two functions: (1) it builds an index over the spatio-temporal data in its region and updates the index in real time; (2) it processes the kNN queries distributed by the Entrance Actors according to the kNN query algorithm. A user query is usually processed by multiple Index Actors in parallel, and each Index Actor produces a partial intermediate result for the query from the spatio-temporal data it maintains.
Specifically, the query parallel processing module consists of several data query operators distributed over different physical computing nodes; each data query operator is a logical computing unit.
Each kNN query is handled by exactly one data query operator (Search Actor), while a Search Actor may handle multiple kNN queries at once, depending on its load. Each Search Actor can instruct the corresponding Index Actors to process the queries it is responsible for. A Search Actor exchanges data with Index Actors by sending and receiving events.
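The division of labor between Index Actors and a Search Actor can be sketched as a partial-result merge. This is only an illustration under stated assumptions: threads stand in for actors on different physical nodes, each partition is a plain dictionary of object positions, and the function names `partial_knn` and `search` are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def partial_knn(objects: Dict[str, Point], q: Point,
                k: int) -> List[Tuple[float, str]]:
    """Index Actor side: the k nearest objects in one partition,
    returned as (distance, object id) pairs sorted by distance."""
    return heapq.nsmallest(
        k, ((math.dist(pos, q), oid) for oid, pos in objects.items())
    )


def search(partitions: List[Dict[str, Point]], q: Point, k: int) -> List[str]:
    """Search Actor side: collect partial results in parallel and merge them."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda part: partial_knn(part, q, k),
                                 partitions))
    merged = heapq.nsmallest(k, (item for part in partials for item in part))
    return [oid for _, oid in merged]
```

Because every partition returns at most k candidates, the merge step handles at most k times the number of partitions, regardless of how many objects each partition stores.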
In a distributed environment, the spatio-temporal data index structure is global index data: several computing units read from it and several computing units write to it. The global index data management module therefore plays an important role in the architecture. Its main task is to maintain one copy of the global index data (that is, the DTLI strip indexes) and to provide two data operations: put and get. Computing units on different physical nodes read and write the module's data through get and put, which keeps the global index data up to date across computing units.
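A minimal single-process sketch of such a store is given below. It assumes that the global index data is a mapping from a strip identifier to its (lower, upper) boundary, and it adds a `watch` callback to mimic the monitoring that data access operators perform on this module. The class name `GlobalIndexStore` and the method signatures are hypothetical; a real deployment would place this behind a coordination service rather than a local lock.

```python
import threading
from typing import Callable, Dict, List, Tuple

Bounds = Tuple[float, float]  # (lower bound, upper bound) of a strip on the x-axis


class GlobalIndexStore:
    """Holds strip boundaries only; it never stores object positions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._strips: Dict[str, Bounds] = {}
        self._watchers: List[Callable[[str, Bounds], None]] = []

    def put(self, strip_id: str, bounds: Bounds) -> None:
        # Called by an index operator after its strip splits or merges.
        with self._lock:
            self._strips[strip_id] = bounds
            watchers = list(self._watchers)
        for notify in watchers:  # notify outside the lock
            notify(strip_id, bounds)

    def get(self, strip_id: str) -> Bounds:
        with self._lock:
            return self._strips[strip_id]

    def watch(self, callback: Callable[[str, Bounds], None]) -> None:
        # Called by a data access operator that keeps a local copy.
        with self._lock:
            self._watchers.append(callback)
```

A data access operator could register `store.watch(lambda sid, b: local.__setitem__(sid, b))` so that its local boundary table is overwritten whenever an index operator calls `put`.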
Further, the distributed computing platform of the invention includes a data cache module, which caches the data that the data access distribution module sends to the spatio-temporal data index module. The data cache module is introduced mainly to resolve the data mis-delivery caused by asynchronous index updates between Entrance Actors and Index Actors. In practice, a strip index is always updated in the Index Actor before it is updated in the Entrance Actors, so the versions of the same strip index held by an Index Actor and by an Entrance Actor may differ. In that case, data and queries distributed by the Entrance Actor may reach the wrong Index Actor. For this reason, the invention introduces the data cache module: each Entrance Actor no longer sends spatio-temporal data and queries directly to an Index Actor but sends them to the data cache module, and each Index Actor continuously monitors the data and queries in the data cache module to obtain the ones it is responsible for.
The invention uses a global index data management module that provides two data operations, "read data" and "write data", for exchanging index data with the data access distribution module, the spatio-temporal data index module and the query parallel processing module, in support of the distributed dynamic two-level index structure. The result is a distributed computing platform for kNN queries on massive spatio-temporal data that supports large-scale kNN queries through parallel computation.
The spatio-temporal data index structure used by the invention is a distributed dynamic two-level index structure (Distributed Two-Levels Index, DTLI) designed for distributed kNN queries on massive spatio-temporal data, as shown in Fig. 1.
The DTLI contains two kinds of information: (1) the boundary information of the strip indexes and grid indexes; (2) the location information of the indexed moving objects. The boundary information is relatively small, whereas the location information of the moving objects is very large. Because the positions of moving objects change frequently, both kinds of information are updated frequently.
The index structure first partitions the global spatio-temporal data along the x-axis to build the first-level strip indexes; each strip is parallel to the y-axis. For any strip index s_i, the region it is responsible for is determined by its lower and upper bounds. Strip index s_i indexes the moving objects whose position coordinates fall within its region.
Each strip index has the following characteristics:
(1) The number of moving objects in each strip index always stays within a fixed range. If the number of moving objects in a strip index is m, then l < m < h, where l and h are the lower and upper limits on the number of moving objects in a single strip index.
(2) If the number of moving objects in a strip index exceeds h, the strip index is split by a line parallel to the y-axis into two smaller strip indexes, each containing fewer than h moving objects. Fig. 2 illustrates a strip index split;
(3) If the number of moving objects in a strip index falls below l, the strip index is merged with an adjacent strip index so that the merged strip index contains more than l moving objects.
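The split and merge rules above can be sketched as follows. The source does not say where a strip is cut, so the sketch assumes a cut at the median x-coordinate and half-open intervals [lb, ub); it also assumes distinct x-coordinates, since otherwise one half could end up empty. The names `Strip`, `split_strip` and `merge_strips` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass
class Strip:
    lb: float  # lower x bound (inclusive)
    ub: float  # upper x bound (exclusive)
    objects: Dict[str, Point] = field(default_factory=dict)


def split_strip(strip: Strip) -> Tuple[Strip, Strip]:
    """Split at the median x (an assumption; the cut point is not specified)."""
    xs = sorted(x for x, _ in strip.objects.values())
    cut = xs[len(xs) // 2]
    left = Strip(strip.lb, cut,
                 {o: p for o, p in strip.objects.items() if p[0] < cut})
    right = Strip(cut, strip.ub,
                  {o: p for o, p in strip.objects.items() if p[0] >= cut})
    return left, right


def merge_strips(left: Strip, right: Strip) -> Strip:
    """Merge two adjacent strips; `left` must end where `right` begins."""
    if left.ub != right.lb:
        raise ValueError("strips are not adjacent")
    return Strip(left.lb, right.ub, {**left.objects, **right.objects})
```

An index operator would call `split_strip` when `len(strip.objects)` exceeds h and `merge_strips` when it falls below l, then publish the new boundaries to the global index data management module.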
The second level of the DTLI is a grid index built on each strip index. The grid index partitions the moving objects of each strip index along the y-axis, so that each strip index is divided into multiple grids, and each grid indexes the moving objects within its region. In Fig. 1, strip index s_i is divided into multiple grid indexes (g_1, g_2, ..., g_n).
Grid indexes have characteristics similar to those of strip indexes:
(1) The number of moving objects in each grid index always stays within a fixed range;
(2) Within the strip index, each grid can split, or merge with an adjacent grid, to satisfy the first characteristic.
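A lookup through the two levels can be sketched with two binary searches: one over the strip boundaries on the x-axis, then one over the grid boundaries of the chosen strip on the y-axis. The sketch assumes contiguous strips, half-open intervals, and a list of sorted y cut points per strip; `StripIndex` and `locate` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StripIndex:
    lb: float                # lower x bound (inclusive)
    ub: float                # upper x bound (exclusive)
    grid_cuts: List[float]   # sorted y cut points; grid g covers [cuts[g], cuts[g+1])


def locate(strips: List[StripIndex], x: float, y: float) -> Tuple[int, int]:
    """Return (strip number, grid number) for position (x, y).

    `strips` must be sorted by lb and contiguous (each ub equals the next lb).
    """
    x_cuts = [s.lb for s in strips] + [strips[-1].ub]
    if not x_cuts[0] <= x < x_cuts[-1]:
        raise ValueError("x outside the indexed region")
    j = bisect_right(x_cuts, x) - 1          # first level: strip by x

    cuts = strips[j].grid_cuts
    if not cuts[0] <= y < cuts[-1]:
        raise ValueError("y outside the indexed region")
    g = bisect_right(cuts, y) - 1            # second level: grid by y
    return j, g
```

Only the first-level search needs boundary data that is shared across nodes; the second-level search uses cut points that are local to the operator maintaining the strip.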
The distributed computing platform of the invention deploys the distributed dynamic two-level index structure DTLI. With reference to Fig. 1 and Fig. 3, the deployment proceeds as follows:
Step (1): when the DTLI is deployed on the distributed computing platform, each Entrance Actor keeps a copy of the boundary information of the first-level DTLI strip indexes, but does not record the location of any moving object. The task of an Entrance Actor is to assign an Index Actor to each arriving moving object and send the object to the data cache module. Specifically, for a newly arrived moving object o_i(x_i, y_i), the Entrance Actor computes the corresponding strip index s_j{lb_j, ub_j} and sends the four-tuple <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> to the data cache module. Because the boundary data of the strip indexes is very small, the data maintained by each Entrance Actor stays lightweight, and continuously arriving moving objects can be distributed in real time, which prevents the Entrance Actors from becoming a bottleneck.
Step (2): each Index Actor maintains one DTLI strip index, and stores and updates the positions of the moving objects in that strip index. In addition, each Index Actor builds the second-level grid index on the strip index it maintains. Because the DTLI adjusts dynamically, a strip index may split or merge, and the corresponding Index Actor splits or merges accordingly.
Step (3): the global index data management module maintains a copy of the boundary information of the DTLI strip indexes. If an Index Actor splits or merges, it writes the boundary information of the changed strip index to the global index data management module in real time. Each Entrance Actor uses a "watch" operation to start a monitoring process that continuously watches the data of the global index data management module. As soon as the monitoring process finds that a strip index in the global index data management module has changed, the Entrance Actor calls the get method in real time to fetch the updated strip index and overwrite the corresponding part of its local strip index. When processing kNN queries, some Search Actors also need the global strip index information, and they likewise access the global index data management module to obtain the strip indexes. This scheme synchronizes the global index data between the Index Actors and the other processing units.
Step (4): the data cache module holds the data that Entrance Actors send to Index Actors. The data cache module allocates one queue to each Index Actor, and each Index Actor continuously listens for spatio-temporal data arriving in its queue. Let IA_j be the Index Actor that maintains strip index s_j, and let Q_j be its queue. For a spatio-temporal data item o_i newly arrived in queue Q_j, IA_j receives <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> and checks whether o_i belongs to the strip index it maintains. If it does, IA_j reads the object from the data cache module and deletes it there. If IA_j finds that o_i does not belong to its strip index, it sets s_j in the tuple <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> to Suspend, which marks the moving object as pending. The data cache module then broadcasts the pending moving object to the queues of all other Index Actors. When an Index Actor receives a moving object carrying the Suspend mark, it first checks whether the object belongs to the strip index it maintains. If it does, the Index Actor processes the object; otherwise it discards the object.
The above is the complete deployment scheme of the DTLI index structure on the distributed computing platform. It describes a scalable distributed dynamic index structure that supports real-time storage and maintenance of massive spatio-temporal data. Fig. 5 shows the flow chart of how this distributed index architecture processes a moving object.
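The queue-and-Suspend protocol of step (4) can be sketched in a single process as follows. The sketch assumes one in-memory queue per Index Actor, half-open strip intervals, and string strip identifiers; `DataCache`, `IndexActor` and their methods are hypothetical names, and polling stands in for the continuous listening described above.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

SUSPEND = "Suspend"
Point = Tuple[float, float]
# <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> as sent by an Entrance Actor
Message = Tuple[str, Point, str, Tuple[float, float]]


class DataCache:
    def __init__(self, strip_ids: Iterable[str]) -> None:
        self.queues: Dict[str, Deque[Message]] = {s: deque() for s in strip_ids}

    def send(self, msg: Message) -> None:
        # Entrance Actor side: enqueue for the strip named in the tuple.
        self.queues[msg[2]].append(msg)

    def broadcast_suspended(self, msg: Message, sender: str) -> None:
        # Re-send a misrouted object, marked Suspend, to all other queues.
        obj_id, pos, _, bounds = msg
        for strip_id, queue in self.queues.items():
            if strip_id != sender:
                queue.append((obj_id, pos, SUSPEND, bounds))


class IndexActor:
    def __init__(self, strip_id: str, lb: float, ub: float,
                 cache: DataCache) -> None:
        self.strip_id, self.lb, self.ub, self.cache = strip_id, lb, ub, cache
        self.objects: Dict[str, Point] = {}

    def poll(self) -> None:
        queue = self.cache.queues[self.strip_id]
        while queue:
            msg = queue.popleft()          # read and delete from the cache
            obj_id, pos, tag, _ = msg
            if self.lb <= pos[0] < self.ub:
                self.objects[obj_id] = pos  # object belongs to this strip
            elif tag != SUSPEND:
                self.cache.broadcast_suspended(msg, self.strip_id)
            # a Suspend-marked object that is not ours is simply dropped
```

If an Entrance Actor with stale boundaries sends an object at x = 12 to the actor for [0, 10), that actor rebroadcasts it, the actor for [10, 20) stores it, and every other actor discards its copy.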
The specific steps of the distributed k-nearest-neighbor (kNN) query algorithm for massive spatio-temporal data (the PSK algorithm) of the present invention are as follows:
Input: the kNN query qj(xj, yj); the location information of all current moving objects;
Output: the k moving objects nearest to the query point qj;
Step (1): for a received kNN query qj(xj, yj), first use the strip indexes of DTLI to find the strip index containing qj
and the (k-1) strip indexes nearest to qj, which together form the candidate strip index set V. For any strip index si,
its distance to query qj is defined as: [formula not reproduced in this text]
Step (2): from each candidate strip index in set V, select the h moving objects nearest to the query point qj,
where h = min{k, θ}.
Step (3): from the h*k moving objects, select the k objects nearest to query qj and sort these k objects by
their distance to qj. Then compute the circle Ck centered at qj whose radius is the distance between qj and the k-th object.
Circle Ck contains k neighbors of qj.
Step (4): compute the set U of strip indexes that intersect circle Ck, and let W = U - V. The strip indexes in set W
are those that may contain k nearest neighbors of query qj but do not belong to set V.
Step (5): each strip index in set W searches its own region for the k objects that lie inside circle Ck and are nearest
to qj. If fewer than k moving objects qualify, all qualifying objects are selected.
Step (6): compare the objects obtained from the strip indexes in set W with the k objects that determined circle Ck,
and take the k moving objects nearest to qj as the k nearest neighbors of qj.
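The six steps above can be sketched in a single process as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the strip-to-query distance used in step 1 (zero when xj lies inside the strip, otherwise the gap to the nearer boundary) is an assumption, strips are modelled as dictionaries with hypothetical keys `lb`, `ub`, and `objects`, the second-level grid index is replaced by a linear scan, and the radius is treated as infinite when the candidate strips hold fewer than k objects.

```python
import math


def strip_distance(strip, q):
    """Assumed distance: 0 if q's x lies in the strip, else gap to nearest boundary."""
    lb, ub = strip["lb"], strip["ub"]
    if lb <= q[0] <= ub:
        return 0.0
    return min(abs(q[0] - lb), abs(q[0] - ub))


def nearest(objects, q, n):
    """Return the n objects closest to q, sorted by distance (linear scan)."""
    return sorted(objects, key=lambda o: math.dist(o[1], q))[:n]


def psk(strips, q, k, theta):
    """Sequential sketch of PSK; objects are (id, (x, y)) pairs."""
    # Step 1: candidate set V = the k strips closest to q (including the one containing q).
    order = sorted(range(len(strips)), key=lambda i: strip_distance(strips[i], q))
    V = order[:k]
    # Step 2: take the h nearest objects from each candidate strip.
    h = min(k, theta)
    candidates = []
    for i in V:
        candidates += nearest(strips[i]["objects"], q, h)
    # Step 3: the k best candidates define circle Ck around q.
    best = nearest(candidates, q, k)
    # Assumption: with fewer than k candidates the circle is unbounded.
    radius = math.dist(best[-1][1], q) if len(best) == k else math.inf
    # Step 4: W = strips whose x-range intersects Ck but that are not in V.
    W = [i for i, s in enumerate(strips)
         if i not in V and s["lb"] <= q[0] + radius and s["ub"] >= q[0] - radius]
    # Step 5: each strip in W reports up to k objects inside Ck.
    extra = []
    for i in W:
        inside = [o for o in strips[i]["objects"] if math.dist(o[1], q) <= radius]
        extra += nearest(inside, q, k)
    # Step 6: merge and keep the k nearest overall.
    return nearest(best + extra, q, k)
```

With `strips` built as a list of such dictionaries ordered along the x-axis, `psk(strips, (xj, yj), k, theta)` returns the k nearest `(id, (x, y))` pairs. Only strips in V and W are ever scanned, which is the point of the two-round design.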
PSK algorithm example: in Fig. 4(a), given a query qj (3-NN), the PSK algorithm first determines the candidate strip index
set V = {si, si+1, si+2}. In Fig. 4(b), the PSK algorithm determines circle Ck from the positions of the moving objects (o1, o3, o6)
in the candidate strip indexes. By step 4 of the PSK algorithm, the strip index set W contains strip index si+3. The
PSK algorithm then uses the grid index of si+3 to find the moving object o7 in that strip's region that is covered by circle Ck. Finally, the PSK algorithm
finds the 3 nearest neighbors of qj, {o1, o6, o7}, in the moving object set {o1, o3, o6, o7}.
After the PSK algorithm is deployed on the distributed computing platform of the invention, the platform as a whole processes
spatio-temporal data kNN queries in parallel as follows:
Step (1): a "data access operator" first computes the candidate strip index set {s1, s2, ..., sk}
according to step 1 of the PSK query algorithm, then sends qj together with the identifiers of its candidate strip indexes (<qj, se>, 1≤e≤k) to the data cache
module.
Step (2): when the "data index operator" responsible for candidate strip index se receives query qj, it first checks whether
the boundary of se in its current local strip index is consistent with the boundary of se that it received. If they are consistent, it proceeds to the next step; otherwise,
<qj, se> is sent to all other "data index operators", and each "data index operator" compares the boundary of strip index se
to decide whether it should process query qj.
Step (3): each "data index operator" responsible for processing query qj uses the grid index of DTLI to quickly obtain the h moving objects
nearest to query qj; for how h is determined, see step 2 of the PSK algorithm.
Step (4): each "data index operator" responsible for processing query qj sends the h moving objects nearest to qj
to the same "data query operator" SAj. Following step 3 of the PSK algorithm, SAj computes circle Ck from the moving objects
sent by the "data index operators".
Step (5): SAj obtains the boundary information of the strip indexes of the current DTLI from the global index data management module,
then computes all strip indexes that intersect circle Ck, which yields the strip index set W of step 4 of the PSK algorithm.
Step (6): SAj sends query qj and circle Ck to the "data index operators" corresponding to the strip indexes in set W.
Each "data index operator" finds, within its own region, the k moving objects that are covered by circle Ck and nearest to query qj, and
sends these objects to SAj. If fewer than k moving objects qualify, all qualifying moving objects
are sent to SAj.
Step (7): SAj compares the newly received moving objects with the k objects that determined circle Ck to obtain
the k nearest neighbors of query qj.
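The boundary check in step (2) protects against a "data access operator" routing with stale strip boundaries, for example after a strip has been split. A minimal sketch of that check follows. The message layout, the class and function names, and the overlap test used in the fallback are assumptions for illustration; the text only states that each operator compares the boundary of se to decide whether to process qj, and here the original target is also allowed to keep the query if its strip still overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    lb: float
    ub: float


@dataclass
class QueryMessage:
    query: tuple        # (xj, yj)
    strip_id: str       # candidate strip se chosen by the data access operator
    boundary: Boundary  # boundary of se as the sender knew it (assumed field)


class DataIndexOperator:
    def __init__(self, strip_id, boundary):
        self.strip_id = strip_id
        self.boundary = boundary  # current local boundary of the maintained strip

    def accepts(self, msg):
        """True if the message targets this strip and the boundaries agree."""
        return msg.strip_id == self.strip_id and msg.boundary == self.boundary

    def covers(self, msg):
        """Assumed fallback rule: the local strip overlaps the sender's view of se."""
        return self.boundary.lb < msg.boundary.ub and msg.boundary.lb < self.boundary.ub


def dispatch(msg, operators):
    """Return the ids of the operators that end up processing the query."""
    target = operators.get(msg.strip_id)
    if target is not None and target.accepts(msg):
        return [target.strip_id]
    # Boundaries are inconsistent: let every operator decide for itself.
    return [op.strip_id for op in operators.values() if op.covers(msg)]
```

For instance, if strip s2 originally spanned [10, 20) and was later split into s2 = [10, 15) and s3 = [15, 20), a message that still carries the old boundary of s2 is picked up by both s2 and s3, so no part of the intended region is skipped.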
The distributed solution for kNN queries over massive spatio-temporal data proposed by the invention reduces the communication
cost between physical computing nodes that is needed to process spatio-temporal data kNN queries in a distributed environment, responds in real time to large-scale concurrent
kNN queries, and significantly improves query efficiency.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of
protection of the invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the invention
without creative effort still fall within the scope of protection of the invention.
Claims (10)
1. A distributed computing platform for spatio-temporal data kNN queries, characterized by comprising:
a data access distribution module, which accesses in real time the continuously arriving spatio-temporal data and spatio-temporal data queries and, according to the distributed
dynamic two-level index structure, distributes the spatio-temporal data and the spatio-temporal data queries to the data cache module;
the distributed dynamic two-level index structure comprises a first-level strip index and a second-level grid index built on the strip index;
the first-level strip index is formed by partitioning the spatio-temporal data along the x-axis; the second-level grid index is formed by partitioning
the spatio-temporal data of each strip index along the y-axis;
a data cache module, which caches the spatio-temporal data and spatio-temporal data queries sent by the data access distribution module;
a spatio-temporal data index module, which builds an index over the spatio-temporal data in each strip index region and updates the location information
of the spatio-temporal data in real time; the spatio-temporal data index module also monitors in real time the spatio-temporal data and
spatio-temporal data queries arriving at the data cache module, and obtains from it the data and queries that it should process;
a global index data management module, which maintains a copy of the boundary information of the strip indexes of the distributed dynamic two-level index structure
as global index data, and exchanges index data with the data access distribution module, the spatio-temporal data index module, and the query parallel
processing module;
a query parallel processing module, which performs distributed processing of spatio-temporal data kNN query requests.
2. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that
the data access distribution module consists of several data access operators distributed over different physical computing nodes, each
data access operator being a logical computing unit.
3. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that
the spatio-temporal data index module consists of several data index operators distributed over different physical computing nodes, each
data index operator being a logical computing unit.
4. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that
the query parallel processing module consists of several data query operators distributed over different physical computing nodes, each
data query operator being a logical computing unit.
5. The distributed computing platform for spatio-temporal data kNN queries according to claim 4, characterized in that
the operators exchange data by sending and receiving events.
6. The distributed computing platform for spatio-temporal data kNN queries according to claim 5, characterized in that
each event is a <key, value> data pair, and each operator specifies the events it receives according to the event name and
key value.
7. A distributed kNN query method for spatio-temporal data based on the distributed computing platform according to any one of claims 1-6,
characterized by comprising the following steps:
Step (1): build the distributed computing platform for spatio-temporal data kNN queries;
Step (2): deploy the distributed dynamic two-level index structure DTLI on the distributed computing platform;
Step (3): based on the distributed dynamic two-level index structure DTLI, process the continuously arriving massive spatio-temporal data;
Step (4): based on the distributed index structure, deploy the distributed spatio-temporal data kNN query algorithm, i.e. the PSK algorithm,
on the distributed computing platform to parallelize the PSK algorithm, and thereby process kNN queries over massive spatio-temporal data
in parallel.
8. The query method according to claim 7, characterized in that step (2) specifically comprises:
Step (2.1): each data access operator in the data access distribution module maintains a copy of the boundary information of the strip indexes
of DTLI; the global index data management module also maintains a copy of the boundary information of the strip indexes of DTLI, but does not record
the location information of any spatio-temporal data;
Step (2.2): each data index operator in the spatio-temporal data index module is responsible for maintaining one strip index, and stores and updates
the positions of the spatio-temporal data in that strip index; each data index operator builds a second-level grid index on the strip index
that it maintains, so that the distributed dynamic two-level index structure is deployed on the distributed computing
platform.
9. The query method according to claim 8, characterized in that in step (2.2), if the strip index maintained by a data index operator
in the spatio-temporal data index module is split or merged, that data index operator writes the boundary information
of the changed strip index to the global index data management module in real time.
10. The query method according to claim 7, characterized in that step (3) specifically comprises:
Step (3.1): each data access operator in the data access distribution module accesses the arriving spatio-temporal data in parallel
and assigns the corresponding data index operator to the spatio-temporal data according to the strip indexes of DTLI;
Step (3.2): the data access operators in the data access distribution module send the spatio-temporal data to the data cache module;
each data index operator in the spatio-temporal data index module continuously monitors the spatio-temporal data arriving at the data cache module and obtains from
the data cache module, in real time, the spatio-temporal data that it should process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610259255.9A CN105893605B (en) | 2016-04-25 | 2016-04-25 | Distributed Computing Platform and querying method towards space-time data k NN Query |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610259255.9A CN105893605B (en) | 2016-04-25 | 2016-04-25 | Distributed Computing Platform and querying method towards space-time data k NN Query |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105893605A CN105893605A (en) | 2016-08-24 |
CN105893605B true CN105893605B (en) | 2019-02-22 |
Family
ID=56704558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610259255.9A Expired - Fee Related CN105893605B (en) | 2016-04-25 | 2016-04-25 | Distributed Computing Platform and querying method towards space-time data k NN Query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105893605B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959352A (en) * | 2018-04-27 | 2018-12-07 | 北京天机数测数据科技有限公司 | Time-space data analysis platform and processing method based on time and Spatial Data Model |
US11068491B2 (en) | 2018-11-28 | 2021-07-20 | The Toronto-Dominion Bank | Data storage using a bi-temporal index |
CN112463814A (en) * | 2019-09-06 | 2021-03-09 | 阿里巴巴集团控股有限公司 | Data query method and device |
CN110990665B (en) * | 2019-12-11 | 2023-08-25 | 北京明略软件系统有限公司 | Data processing method, device, system, electronic equipment and storage medium |
CN111934958B (en) * | 2020-07-29 | 2022-03-29 | 深圳市高德信通信股份有限公司 | IDC resource scheduling service management platform |
CN112699173A (en) * | 2021-01-08 | 2021-04-23 | 哈尔滨航天恒星数据系统科技有限公司 | Spark-based distributed spatio-temporal object proximity query method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101827004A (en) * | 2010-05-12 | 2010-09-08 | 中国人民解放军国防科学技术大学 | Large-scale network environment-oriented distribution-type K neighbor node searching method |
CN102693293A (en) * | 2012-05-15 | 2012-09-26 | 清华大学 | Range query method and system for multivariable spatio-temporal data |
CN105138607A (en) * | 2015-08-03 | 2015-12-09 | 山东省科学院情报研究所 | Hybrid granularity distributional memory grid index-based KNN query method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324642B (en) * | 2012-03-23 | 2016-12-14 | 日电(中国)有限公司 | System and method and the data query method of index is set up for data |
Also Published As
Publication number | Publication date |
---|---|
CN105893605A (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105893605B (en) | Distributed Computing Platform and querying method towards space-time data k NN Query | |
CN106528773B (en) | Map computing system and method based on Spark platform supporting spatial data management | |
CN110147377A (en) | General polling algorithm based on secondary index under extensive spatial data environment | |
CN108920552A (en) | A kind of distributed index method towards multi-source high amount of traffic | |
CN110287391A (en) | Multi-level trajectory data storage method, storage medium and terminal based on Hadoop | |
CN106599190A (en) | Dynamic Skyline query method based on cloud computing | |
CN108460072A (en) | With electricity consumption data retrieval method and system | |
CN112163827A (en) | Satellite remote measurement intelligent service system | |
CN106599189A (en) | Dynamic Skyline inquiry device based on cloud computing | |
Jošilo et al. | Distributed algorithms for content placement in hierarchical cache networks | |
CN101477561B (en) | Large-scale space vector data management method based on content access network | |
CN105306547A (en) | Data placing and node scheduling method for increasing energy efficiency of cloud computing system | |
Deng et al. | Spatial-keyword skyline publish/subscribe query processing over distributed sliding window streaming data | |
CN115470510A (en) | Construction method and device of self-adaptive Cartesian grid data structure | |
CN105447132A (en) | Four-layer geographic data storage system oriented to Internet of Things application | |
CN108614889A (en) | Mobile object Continuous k-nearest Neighbor based on mixed Gauss model and system | |
CN109446294B (en) | Parallel mutual subspace Skyline query method | |
Rslan et al. | Spatial R-tree index based on grid division for query processing | |
CN115018322A (en) | Intelligent crowdsourcing task allocation method and system | |
Chen et al. | Efficient historical query in HBase for spatio-temporal decision support | |
Ding et al. | RDB-KV: A cloud database framework for managing massive heterogeneous sensor stream data | |
Aly et al. | A demonstration of aqwa: Adaptive query-workload-aware partitioning of big spatial data | |
CN114090275A (en) | Data processing method and device and electronic equipment | |
Chatzimilioudis et al. | Crowdsourcing emergency data in non-operational cellular networks | |
Dong et al. | IoT search method for entity based on advanced density clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190222 |