CN105893605B - Distributed computing platform and query method for k-nearest-neighbor (kNN) queries on spatio-temporal data - Google Patents


Info

Publication number
CN105893605B
Authority
CN
China
Prior art keywords
data
index
space
time data
module
Prior art date
Legal status
Expired - Fee Related
Application number
CN201610259255.9A
Other languages
Chinese (zh)
Other versions
CN105893605A (en)
Inventor
于自强
王栋
韩士元
陈月辉
马坤
Current Assignee
University of Jinan
Original Assignee
University of Jinan
Priority date
Filing date
Publication date
Application filed by University of Jinan
Priority to CN201610259255.9A
Publication of CN105893605A
Application granted
Publication of CN105893605B
Legal status: Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G06F16/2264 Multidimensional index structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed computing platform and a query method for k-nearest-neighbor (kNN) queries on spatio-temporal data. The platform includes a global index data management module. This module exchanges data with a data access distribution module, a spatio-temporal data index module, and a query parallel processing module, and it supports a distributed dynamic two-level index structure. The data access distribution module receives continuously arriving spatio-temporal data and spatio-temporal queries in real time. Using the distributed dynamic two-level index structure, it sends the data to the spatio-temporal data index module and the queries to the query parallel processing module. The spatio-temporal data index module builds an index over the spatio-temporal data in its query region and updates the location information of that data in real time. It also sends the updated location information to the query parallel processing module in real time. The query parallel processing module uses the updated location information to process the received spatio-temporal queries in parallel, and it outputs the query results.

Description

Distributed computing platform and query method for kNN queries on spatio-temporal data
Technical field
The present invention relates to spatio-temporal data query technology and belongs to the field of computer applications. More particularly, it relates to a distributed computing platform and a query method for kNN queries on spatio-temporal data.
Background art
Spatio-temporal data (Spatial-Temporal Data) is data with both spatial and temporal dimensions. It is commonly used to describe how the spatial information of an object changes over time. Several developments have driven the volume of this data up in recent years: the rapid spread of mobile devices (such as mobile phones and GPS devices), the large-scale deployment of wireless sensors and electronic monitoring equipment, and the growth of the mobile Internet. As a result, location-based service applications in everyday life now generate large volumes of spatio-temporal data. Spatio-temporal queries have also become increasingly important in fields such as intelligent transportation, e-commerce, and social networks, and they have attracted wide attention from researchers in China and abroad. Querying massive spatio-temporal data in the big-data context, in particular, has become an emerging research hotspot. In many fields the volume of spatio-temporal data is growing explosively. The traditional single-machine computing model is limited in computing and storage capacity, so it struggles to handle concurrent queries over large-scale spatio-temporal data.
Researchers in China and abroad have done considerable work on spatio-temporal queries. However, research on kNN queries over massive spatio-temporal data in distributed environments is still in its infancy and faces major challenges, specifically:
(1) There is no distributed computing platform that supports large-scale concurrent kNN queries over massive spatio-temporal data.
(2) There is no distributed index structure that supports both frequent updates of massive spatio-temporal data and parallel kNN query algorithms. As a result, support for distributed storage and maintenance of massive spatio-temporal data is poor.
(3) Most existing spatio-temporal query methods are centralized methods designed for a single-machine environment. They are difficult to deploy directly on a distributed computing platform, and effective distributed kNN query algorithms are lacking.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a distributed computing platform and a query method for kNN queries on spatio-temporal data. The invention can store and maintain continuously changing massive spatio-temporal data in real time, and it can also respond in real time to large-scale concurrent kNN queries.
To achieve the above objectives, the present invention adopts the following technical solution:
A distributed computing platform for kNN queries on spatio-temporal data, comprising:
A data access distribution module, which receives continuously arriving spatio-temporal data and spatio-temporal queries in real time and distributes both to the data cache module according to the distributed dynamic two-level index structure DTLI;
The distributed dynamic two-level index structure DTLI comprises a first-level strip index and a second-level grid index built on the strip index. The first-level strip index is formed by partitioning the spatio-temporal data along the x-axis. The second-level grid index is formed by partitioning the spatio-temporal data of each strip index along the y-axis;
A data cache module, which caches the spatio-temporal data and spatio-temporal queries sent by the data access distribution module;
A spatio-temporal data index module, which builds an index over the spatio-temporal data in each strip index region and updates the index data in real time. The spatio-temporal data index module also continuously monitors the spatio-temporal data and queries arriving at the data cache module, and it retrieves the data and queries it should process;
A global index data management module, which maintains one copy of the strip-index boundary information of the distributed dynamic two-level index structure as global index data. It exchanges index data with the data access distribution module, the spatio-temporal data index module, and the query parallel processing module;
A query parallel processing module, which performs distributed processing of spatio-temporal kNN queries.
The data access distribution module consists of several data access operators distributed on different physical computing nodes; each data access operator is a logical computing unit.
The spatio-temporal data index module consists of several data index operators distributed on different physical computing nodes; each data index operator is a logical computing unit.
The query parallel processing module consists of several data query operators distributed on different physical computing nodes; each data query operator is a logical computing unit.
Operators exchange data by sending and receiving events.
An event is a <key, value> data pair. Each operator specifies the events it receives by event name and key.
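The event mechanism above can be sketched as a small publish/subscribe router. This is a minimal in-process sketch, assuming that an operator subscribes to a (name, key) pair and receives every event matching both. The `Event` and `EventBus` names are hypothetical and stand in for whatever messaging layer the platform actually uses.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    name: str       # event name, e.g. "object" or "query" (illustrative)
    key: Hashable   # routing key, e.g. a strip-index id (illustrative)
    value: Any      # payload


class EventBus:
    """Hypothetical router: delivers each event to operators subscribed to its (name, key)."""

    def __init__(self) -> None:
        self._subs: dict[tuple[str, Hashable], list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, name: str, key: Hashable, handler: Callable[[Event], None]) -> None:
        self._subs[(name, key)].append(handler)

    def send(self, event: Event) -> int:
        """Deliver the event and return how many operators received it."""
        handlers = self._subs.get((event.name, event.key), [])  # .get avoids creating empty entries
        for handle in handlers:
            handle(event)
        return len(handlers)
```

Routing on both the name and the key lets one operator receive, for example, only the "object" events for the strip it owns.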
A distributed kNN query method for spatio-temporal data based on the distributed computing platform, comprising the following steps:
Step (1): construct the distributed computing platform for kNN queries on massive spatio-temporal data described above;
Step (2): deploy the distributed dynamic two-level index structure DTLI on the distributed computing platform;
Step (3): process the continuously arriving massive spatio-temporal data based on DTLI;
Step (4): based on the DTLI index structure, deploy the distributed spatio-temporal kNN query algorithm (the PSK algorithm) on the distributed computing platform. This parallelizes PSK and thereby allows kNN queries over massive spatio-temporal data to be processed in parallel.
The detailed procedure of step (2) comprises:
Step (2.1): each data access operator in the data access distribution module maintains one copy of the boundary information of the DTLI strip indexes. The global index data management module also maintains one copy of the strip-index boundary information, but it records no location information for any spatio-temporal data;
Step (2.2): each data index operator in the spatio-temporal data index module maintains one strip index and stores and updates the spatio-temporal data in that strip. Each data index operator builds the second-level grid index on the strip index it maintains. This is how the distributed dynamic two-level index structure is deployed on the distributed computing platform.
In step (2.2), a strip index maintained by a data index operator may split or merge. When it does, that operator writes the boundary information of the changed strip index to the global index data management module in real time.
Each data access operator in the data access distribution module starts a monitor process through a "watch" operation. The monitor process continuously monitors the data of the global index data management module. As soon as it detects that the strip index in the global index data management module has changed, the data access operator fetches the updated strip index in real time and overwrites the corresponding part of its local strip index.
The detailed procedure of step (3) comprises:
Step (3.1): each data access operator in the data access distribution module concurrently receives arriving spatio-temporal data and assigns each item to the corresponding data index operator according to the DTLI strip index.
Step (3.2): the data access operators send the spatio-temporal data to the data cache module. Each data index operator in the data index module continuously monitors the spatio-temporal data arriving in the data cache module and retrieves, in real time, the data it should process.
In step (4), the PSK algorithm is used to process spatio-temporal kNN queries in parallel.
Here, spatio-temporal data refers to moving objects (such as people and vehicles) in a two-dimensional plane. The positions of these objects change continuously, and the objects frequently report their coordinates to a data center.
Beneficial effects of the present invention:
(1) The distributed computing platform for kNN queries on massive spatio-temporal data includes a global index data management module and a data cache module, which together support the proposed distributed dynamic two-level index structure well. The platform meets the distributed access requirements that spatio-temporal kNN queries place on global index data. It also avoids the data inconsistency that can arise when different operators process spatio-temporal kNN queries, and it provides a general distributed computing platform for large-scale concurrent kNN queries over massive spatio-temporal data;
(2) The proposed distributed dynamic two-level index structure can store and maintain continuously changing massive spatio-temporal data in real time. The index structure also scales well: in a distributed environment, its capacity for processing spatio-temporal data grows linearly simply by adding hardware resources. Finally, the index structure supports the PSK query algorithm well and substantially speeds up its convergence;
(3) The present invention uses the PSK algorithm to process kNN queries on spatio-temporal data in real time. This reduces the communication cost between physical computing nodes when spatio-temporal kNN queries are processed in a distributed environment. The platform can respond in real time to large-scale concurrent kNN queries, with significantly improved query efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of the distributed dynamic two-level index structure (DTLI) for massive spatio-temporal data;
Fig. 2 is a schematic diagram of a DTLI strip index split;
Fig. 3 is an architecture diagram of the distributed computing platform for massive spatio-temporal data queries;
Fig. 4 is a demonstration diagram of the PSK algorithm;
Fig. 5 is a flowchart of processing spatio-temporal data O_i based on the distributed dynamic two-level index;
Fig. 6 is an architecture diagram of the parallelized PSK algorithm.
Detailed description of the embodiments
The present invention has three closely connected parts: the distributed computing platform for kNN queries on massive spatio-temporal data, the distributed dynamic two-level index structure, and the PSK algorithm for distributed kNN queries on massive spatio-temporal data. The relationship among the three is described in detail below with reference to the drawings.
Fig. 3 shows the distributed computing platform of the invention for massive spatio-temporal data queries. Its structure is as follows:
The distributed computing platform for spatio-temporal kNN queries of the invention comprises:
A data access distribution module, which receives continuously arriving spatio-temporal data and spatio-temporal queries in real time. Using the distributed dynamic two-level index structure, it sends the data to the spatio-temporal data index module and the queries to the query parallel processing module;
A spatio-temporal data index module, which builds an index over the spatio-temporal data in the query region it is responsible for, and maintains and updates the spatio-temporal index structure in real time;
A data cache module, which provides a data buffer for the data exchange between the data access distribution module and the spatio-temporal data index module. It resolves the data inconsistency caused by asynchronous data updates between the "data access operators" and the "data index operators";
A global index data management module, responsible for updating and maintaining the global index data in real time. It provides two data operations, "read data" and "write data", through which it exchanges index data with the spatio-temporal data index module, the query parallel processing module, and the query computing module;
A query computing module, which processes spatio-temporal kNN query requests with the PSK query algorithm;
Specifically, the data access distribution module consists of multiple data access operators (Entrance Actors) distributed on different physical computing nodes; each data access operator is a logical computing unit. The module's main task is to receive continuously arriving spatio-temporal data and user queries in real time. It then assigns each data item and query to the corresponding data index operator (Index Actor) according to the distributed dynamic two-level index structure.
Specifically, the spatio-temporal data index module consists of several data index operators (Index Actors) distributed on different physical computing nodes; each data index operator is a logical computing unit. Each Index Actor is responsible for a small part of the whole query region and has two functions: (1) it builds an index over the spatio-temporal data in that part of the region and updates the index in real time; (2) it processes the spatio-temporal kNN queries distributed by the Entrance Actors, using the spatio-temporal kNN query algorithm. Each user query is usually processed in parallel by multiple Index Actors. Each of these Index Actors produces a partial intermediate result for the query from the spatio-temporal data it maintains.
Specifically, the query parallel processing module consists of several data query operators distributed on different physical computing nodes; each data query operator is a logical computing unit.
Each spatio-temporal kNN query is handled by exactly one data query operator (Search Actor). One Search Actor can handle multiple spatio-temporal kNN queries at the same time, depending on its load. Each Search Actor can instruct the relevant Index Actors to process the queries it is responsible for. Search Actors exchange data with Index Actors by sending and receiving events.
In a distributed environment, the spatio-temporal index structure is global index data: several computing units read from it, and several computing units write to it. The global index data management module therefore plays an important role in the architecture. Its main task is to maintain one copy of the global index data (the DTLI strip index) and to provide two data operations, put and get. Computing units on different physical nodes read and write the module's data through get and put. This ensures that the global index data is updated promptly across computing units.
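A minimal sketch of the put/get interface, assuming a single in-memory dictionary guarded by a lock. `GlobalIndexStore` and its per-key version counter are hypothetical additions for illustration and are not part of the source. A real deployment would use a distributed coordination service instead.

```python
import threading
from typing import Any, Optional


class GlobalIndexStore:
    """Hypothetical stand-in for the global index data management module."""

    def __init__(self) -> None:
        self._lock = threading.Lock()           # readers and writers may sit on different threads
        self._data: dict[str, Any] = {}
        self._version: dict[str, int] = {}      # bumped on every put (illustrative addition)

    def put(self, key: str, value: Any) -> int:
        """Write a value (e.g. the strip boundaries) and return its new version."""
        with self._lock:
            self._data[key] = value
            self._version[key] = self._version.get(key, 0) + 1
            return self._version[key]

    def get(self, key: str) -> tuple[Optional[Any], int]:
        """Read a value together with its version; (None, 0) if it was never written."""
        with self._lock:
            return self._data.get(key), self._version.get(key, 0)
```

The version number lets a reader tell cheaply whether its cached copy of the strip boundaries is stale.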
Further, the distributed computing platform of the invention also includes a data cache module. It caches the data sent by the data access distribution module to the spatio-temporal data index module. The main purpose of the data cache module is to resolve the data inconsistency caused by asynchronous updates of index data between Entrance Actors and Index Actors. In practice, a strip index is always updated in an Index Actor before it is updated in the Entrance Actors. The versions of the same strip index held by an Index Actor and an Entrance Actor may therefore temporarily differ, and in that case the data and queries that an Entrance Actor distributes to Index Actors may be misrouted. The invention introduces the data cache module to address this. Each Entrance Actor no longer sends spatio-temporal data and queries directly to Index Actors; instead, it sends them to the data cache module. Each Index Actor continuously monitors the data and queries in the data cache module and obtains the ones it should process.
The present invention uses the global index data management module, which provides the "read data" and "write data" operations, to exchange index data with the data access distribution module, the spatio-temporal data index module, and the query parallel processing module. This supports the distributed dynamic two-level index structure. The result is a distributed computing platform for kNN queries on massive spatio-temporal data that supports large-scale kNN queries through parallel computation.
The spatio-temporal index structure used in the present invention is a distributed dynamic two-level index structure (Distributed Two-Levels Index, DTLI) designed for distributed kNN queries on massive spatio-temporal data, as shown in Fig. 1.
The distributed dynamic two-level index structure (DTLI) of the invention contains two kinds of information: (1) the boundary information of the strip indexes and grid indexes; (2) the location information of the indexed moving objects. The boundary information is relatively small, whereas the location information of the moving objects is large. Moving objects change position frequently, so both kinds of information are updated frequently.
The index structure first partitions the global spatio-temporal data along the x-axis to build the first-level strip index; each strip is parallel to the y-axis. Each strip index s_i covers the region determined by its lower bound and upper bound. It indexes the moving objects whose position coordinates fall within that region.
Each strip index has the following properties:
(1) The number of moving objects in each strip index always stays within a certain range. If a strip index contains m moving objects, then l < m < h, where l and h are the lower and upper limits on the number of moving objects in a single strip index.
(2) If a strip index contains more than h moving objects, it splits along the y-axis direction (that is, by a boundary parallel to the y-axis) into two smaller strip indexes, each containing fewer than h moving objects. Fig. 2 illustrates a strip index split;
(3) If a strip index contains fewer than l moving objects, it merges with an adjacent strip index so that the merged strip index contains more than l moving objects.
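The split and merge rules above can be sketched as one maintenance pass over the strips. This sketch makes several assumptions the source leaves open: the cut point of a split is the median x-coordinate of the strip's objects, and a sparse strip merges with its right neighbour (or its left one if it is the last strip). A single split halves a strip, so a strip holding more than roughly 2h objects would need further passes. `Strip`, `split_strip`, and `rebalance` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Strip:
    lb: float                                   # lower x-bound (inclusive)
    ub: float                                   # upper x-bound (exclusive)
    objects: dict[str, tuple[float, float]] = field(default_factory=dict)


def split_strip(s: Strip) -> list[Strip]:
    """Split at the median x of the strip's objects (the median cut is an assumption)."""
    xs = sorted(x for x, _ in s.objects.values())
    cut = xs[len(xs) // 2]
    left = Strip(s.lb, cut, {o: p for o, p in s.objects.items() if p[0] < cut})
    right = Strip(cut, s.ub, {o: p for o, p in s.objects.items() if p[0] >= cut})
    if not left.objects:                        # all objects share the cut x: no useful split
        return [s]
    return [left, right]


def rebalance(strips: list[Strip], low: int, high: int) -> list[Strip]:
    """One pass: split strips with > high objects, then merge strips with < low objects."""
    out: list[Strip] = []
    for s in strips:
        out.extend(split_strip(s) if len(s.objects) > high else [s])
    i = 0
    while i < len(out) and len(out) > 1:
        if len(out[i].objects) < low:
            j = i + 1 if i + 1 < len(out) else i - 1   # right neighbour, or left for the last strip
            a, b = min(i, j), max(i, j)
            merged = Strip(out[a].lb, out[b].ub, {**out[a].objects, **out[b].objects})
            out[a:b + 1] = [merged]
            i = a                                      # re-check the merged strip
        else:
            i += 1
    return out
```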
The second level of DTLI is a grid index built on the strip index. The grid index partitions the moving objects of each strip index along the y-axis, which divides each strip into multiple grid cells. Each cell indexes the moving objects in its own region. In Fig. 1, strip index s_i is divided into multiple grid cells (g_1, g_2, ..., g_n).
The grid index has properties similar to those of the strip index:
(1) the number of moving objects in each grid cell always stays within a certain range;
(2) each grid cell can split, or merge with an adjacent cell within the range of its strip index, to satisfy property (1).
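The following sketch shows how the second-level grid can speed up the "h nearest objects inside one strip" lookup that PSK relies on. It assumes fixed-height cells along y rather than the dynamic split/merge cells described above, and `StripGrid` is a hypothetical name. Every object in a strip lies somewhere in that strip's x-range, so the vertical gap between the query and a cell is a lower bound on the distance to any object in that cell. Cells are therefore visited in order of that gap, and the scan stops once the gap exceeds the current h-th best distance.

```python
import heapq
import math
from collections import defaultdict


class StripGrid:
    """Hypothetical second-level grid for one strip: fixed-height cells along y (assumed)."""

    def __init__(self, cell_h: float) -> None:
        self.cell_h = cell_h
        self.cells: dict[int, dict[str, tuple[float, float]]] = defaultdict(dict)

    def insert(self, oid: str, x: float, y: float) -> None:
        self.cells[math.floor(y / self.cell_h)][oid] = (x, y)

    def _y_gap(self, cell: int, qy: float) -> float:
        # vertical distance from qy to the cell: a lower bound on any object's distance
        lo, hi = cell * self.cell_h, (cell + 1) * self.cell_h
        return max(lo - qy, 0.0, qy - hi)

    def nearest(self, qx: float, qy: float, h: int) -> list[tuple[float, str]]:
        """Up to h (distance, id) pairs nearest to (qx, qy), in ascending order."""
        if h <= 0:
            return []
        best: list[tuple[float, str]] = []      # max-heap of (-distance, id)
        for cell in sorted(self.cells, key=lambda c: self._y_gap(c, qy)):
            if len(best) == h and self._y_gap(cell, qy) >= -best[0][0]:
                break                           # no remaining cell can hold a closer object
            for oid, (x, y) in self.cells[cell].items():
                d = math.hypot(x - qx, y - qy)
                if len(best) < h:
                    heapq.heappush(best, (-d, oid))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, oid))
        return sorted((-nd, oid) for nd, oid in best)
```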
The distributed computing platform of the invention deploys the distributed dynamic two-level index structure DTLI. With reference to Fig. 1 and Fig. 3, the deployment proceeds as follows:
Step (1): when DTLI is deployed on the distributed computing platform of the invention, each Entrance Actor keeps a copy of the boundary information of the DTLI first-level strip index. It records no location information for any moving object. The task of an Entrance Actor is to assign an Index Actor to each arriving moving object and send the object to the data cache module. Specifically, for a newly arriving moving object o_i(x_i, y_i), the Entrance Actor computes its strip index s_j{lb_j, ub_j}. It then sends the object to the data cache module as the four-tuple <o_i, (x_i, y_i), s_j, {lb_j, ub_j}>. Because the strip-index boundary data is very small, the data each Entrance Actor maintains stays lightweight. Continuously arriving moving objects can therefore be distributed in real time, which keeps the Entrance Actors from becoming a bottleneck.
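Routing in step (1) amounts to locating x_i among the sorted strip boundaries. This is a minimal sketch, assuming the strips are contiguous and sorted by lower bound. The source does not say how strips are identified, so here each strip is identified by its position in the list. `Routed` and `route` are hypothetical names.

```python
import bisect
from typing import NamedTuple


class Routed(NamedTuple):
    """The four-tuple <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> sent to the data cache module."""
    oid: str
    pos: tuple[float, float]
    strip_id: int
    bounds: tuple[float, float]


def route(oid: str, x: float, y: float, lbs: list[float], ubs: list[float]) -> Routed:
    """Find strip j with lbs[j] <= x < ubs[j]; lbs must be sorted and the strips contiguous."""
    j = bisect.bisect_right(lbs, x) - 1
    j = min(max(j, 0), len(lbs) - 1)   # clamp points outside the covered x-range to the edge strips
    return Routed(oid, (x, y), j, (lbs[j], ubs[j]))
```

Binary search keeps the per-object cost at O(log n) in the number of strips, in line with the lightweight role the text gives to Entrance Actors.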
Step (2): each Index Actor maintains one DTLI strip index and stores and updates the positions of the moving objects in that strip. In addition, each Index Actor builds the second-level grid index on the strip index it maintains. Because DTLI adjusts dynamically, its strip indexes can split or merge, and the corresponding Index Actors split or merge accordingly.
Step (3): the global index data management module of the invention maintains one copy of the boundary information of the DTLI strip indexes. When an Index Actor splits or merges, it writes the boundary information of the changed strip index to the global index data management module in real time. Each Entrance Actor starts a monitor process through a "watch" operation, and this process continuously monitors the data of the global index data management module. As soon as the monitor process detects a change to the strip index in the global index data management module, the Entrance Actor calls the get method in real time. It fetches the updated strip index and overwrites the corresponding part of its local strip index. When processing kNN queries, some Search Actors also need the global strip index information, so they likewise read the strip index from the global index data management module. This scheme keeps the global index data synchronized between the Index Actors and the other processing units.
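The watch-then-get pattern of step (3) can be sketched with synchronous callbacks in place of a background monitor process. This simplification is an assumption, and so is one other detail: the sketch overwrites the Entrance Actor's whole local copy rather than only the changed part. `WatchableIndexStore` and `EntranceActor` are hypothetical names.

```python
from collections.abc import Callable

Bounds = list[tuple[float, float]]   # strip boundaries as (lb, ub) pairs


class WatchableIndexStore:
    """Hypothetical global index store that notifies its watchers after every put."""

    def __init__(self, boundaries: Bounds) -> None:
        self._boundaries = list(boundaries)
        self._watchers: list[Callable[[], None]] = []

    def get(self) -> Bounds:
        return list(self._boundaries)        # hand out a copy, never the shared list

    def put(self, boundaries: Bounds) -> None:
        self._boundaries = list(boundaries)
        for notify in self._watchers:        # synchronous stand-in for the monitor process
            notify()

    def watch(self, callback: Callable[[], None]) -> None:
        self._watchers.append(callback)


class EntranceActor:
    def __init__(self, store: WatchableIndexStore) -> None:
        self.store = store
        self.local_strips = store.get()      # strip boundaries only; no object positions
        store.watch(self.on_change)

    def on_change(self) -> None:
        # the global strip index changed: call get and overwrite the local copy
        self.local_strips = self.store.get()
```

In the usage below, an Index Actor that has just split its strip writes the new boundaries with `put`, and every Entrance Actor picks them up.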
Step (4): the data cache module of the invention holds the data that Entrance Actors send to Index Actors. The data cache module allocates one queue to each Index Actor, and each Index Actor continuously monitors the spatio-temporal data arriving in its queue. Let IA_j be the Index Actor responsible for strip index s_j, and let Q_j be its queue. Suppose spatio-temporal data o_i newly arrives in Q_j. After IA_j receives <o_i, (x_i, y_i), s_j, {lb_j, ub_j}>, it checks whether o_i belongs to the strip index it maintains. If it does, IA_j reads the object from the data cache module and deletes it from the data cache module. If IA_j finds that o_i does not belong to its strip index, it sets s_j in the tuple <o_i, (x_i, y_i), s_j, {lb_j, ub_j}> to Suspend, which marks the moving object as pending. The data cache module then broadcasts the pending moving object to the queues of all other Index Actors. When an Index Actor receives a moving object marked Suspend, it first checks whether the object belongs to the strip index it maintains. If it does, the Index Actor processes the object; otherwise, it discards the object.
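The per-actor queues and the Suspend rebroadcast of step (4) can be sketched in a single process as follows. This sketch assumes that reading an item from a queue removes it whether or not the actor owns the object. Only `Suspend` is the source's marker; `DataCache`, `IndexActor`, and `poll` are hypothetical names.

```python
from collections import deque
from collections.abc import Iterable

SUSPEND = "Suspend"   # marker the source uses for pending (misrouted) objects
# tuple layout follows the source: <o_i, (x_i, y_i), s_j, {lb_j, ub_j}>
Tuple4 = tuple[str, tuple[float, float], object, tuple[float, float]]


class DataCache:
    """Hypothetical in-memory data cache: one FIFO queue per Index Actor."""

    def __init__(self, actor_ids: Iterable[str]) -> None:
        self.queues: dict[str, deque] = {a: deque() for a in actor_ids}

    def send(self, actor_id: str, item: Tuple4) -> None:
        self.queues[actor_id].append(item)

    def broadcast(self, item: Tuple4, exclude: str) -> None:
        for a, q in self.queues.items():
            if a != exclude:
                q.append(item)


class IndexActor:
    def __init__(self, actor_id: str, lb: float, ub: float, cache: DataCache) -> None:
        self.id, self.lb, self.ub, self.cache = actor_id, lb, ub, cache
        self.objects: dict[str, tuple[float, float]] = {}

    def owns(self, x: float) -> bool:
        return self.lb <= x < self.ub

    def poll(self) -> None:
        """Drain this actor's queue once."""
        q = self.cache.queues[self.id]
        while q:
            oid, (x, y), sj, bounds = q.popleft()   # reading removes the item from the cache
            if self.owns(x):
                self.objects[oid] = (x, y)          # store or update the object's position
            elif sj != SUSPEND:
                # misrouted: mark as pending and broadcast it to every other Index Actor
                self.cache.broadcast((oid, (x, y), SUSPEND, bounds), exclude=self.id)
            # otherwise: a Suspend-tagged object this actor does not own is discarded
```

In the usage below, strip 0 has already shrunk to [0, 3), but the Entrance Actor still routes with the stale boundary [0, 5). The object is still delivered to the correct Index Actor.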
The above is the complete deployment scheme of the DTLI index structure on the distributed computing platform. It describes a scalable distributed dynamic index structure that supports real-time storage and maintenance of massive spatio-temporal data. Fig. 5 shows the flowchart of how this distributed index architecture processes a moving object.
The distributed kNN query algorithm for massive spatio-temporal data of the invention (the PSK algorithm) proceeds as follows:
Input: a kNN query q_j(x_j, y_j); the current location information of all moving objects;
Output: the k moving objects nearest to query point q_j;
Step (1): for a received kNN query q_j(x_j, y_j), first use the DTLI strip index to find the strip index that contains q_j and the (k-1) strip indexes nearest to q_j. Together these form the candidate strip index set V. For any strip index s_i, its distance to query q_j is defined as
Step (2): from each candidate strip index in V, select the h moving objects nearest to query point q_j, where h = min{k, θ}.
Step (3): from these h*k moving objects, select the k objects nearest to q_j and sort them by distance to q_j. Then compute the circle C_k centered at q_j, whose radius is the distance from q_j to the k-th object. Circle C_k contains the k nearest neighbors of q_j.
Step (4): compute the set U of strip indexes that intersect circle C_k, and let W = U - V. The strip indexes in W are those that may contain k nearest neighbors of q_j but do not belong to V.
Step (5): each strip index in W searches its own region for the k objects that lie inside circle C_k and are nearest to q_j. If fewer than k moving objects qualify, all qualifying objects are selected.
Step (6): compare the objects obtained from the strip indexes in W with the k objects that determined circle C_k. The k moving objects nearest to q_j among them are the k nearest neighbors of q_j.
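PSK steps (1) to (6) can be condensed into a single-process sketch to make the pruning logic concrete. The strip-to-query distance formula is missing from the text above, so this sketch assumes it is the horizontal gap between q_j and the strip's [lb, ub] range, and 0 when q_j lies inside the strip. θ is also not defined here and is simply a parameter. With θ ≥ k (so h = k), each step of this sketch is exact. In that case the result equals a brute-force kNN, which the test checks.

```python
import heapq
import math

Point = tuple[float, float]
StripData = tuple[float, float, dict[str, Point]]   # (lb, ub, {object id: (x, y)})


def strip_dist(q: Point, lb: float, ub: float) -> float:
    """ASSUMED strip-to-query distance: x-gap between q and [lb, ub], 0 if q lies inside."""
    return max(lb - q[0], 0.0, q[0] - ub)


def psk(q: Point, k: int, strips: list[StripData], theta: int) -> list[tuple[float, str]]:
    """Single-process sketch of PSK steps (1)-(6); returns up to k (distance, id) pairs."""
    if k <= 0:
        return []

    def dist(p: Point) -> float:
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Step (1): V = the strip containing q plus the k-1 strips nearest to it
    order = sorted(range(len(strips)), key=lambda i: strip_dist(q, strips[i][0], strips[i][1]))
    V = set(order[:k])
    # Step (2): h nearest objects from each candidate strip, h = min(k, theta)
    h = min(k, theta)
    cand: list[tuple[float, str]] = []
    for i in V:
        cand += heapq.nsmallest(h, [(dist(p), oid) for oid, p in strips[i][2].items()])
    # Step (3): k nearest candidates; radius of circle C_k = distance to the k-th one
    top = heapq.nsmallest(k, cand)
    r = top[-1][0] if len(top) == k else math.inf
    # Step (4): W = strips that intersect C_k and are not already in V
    W = [i for i in range(len(strips))
         if i not in V and strip_dist(q, strips[i][0], strips[i][1]) <= r]
    # Step (5): each strip in W contributes its k nearest objects inside C_k
    for i in W:
        inside = [(dist(p), oid) for oid, p in strips[i][2].items() if dist(p) <= r]
        top += heapq.nsmallest(k, inside)
    # Step (6): merge and keep the k nearest overall
    return heapq.nsmallest(k, top)
```

Step (4) is where PSK saves work: strips whose x-gap to q_j exceeds the radius of C_k cannot hold a closer object, so they are never searched.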
PSK algorithm example: in Fig. 4(a), given a query q_j (3-NN), the PSK algorithm first determines the candidate strip index set V = {s_i, s_{i+1}, s_{i+2}}. In Fig. 4(b), the PSK algorithm then determines circle C_k from the positions of the mobile objects (o_1, o_3, o_6) in the candidate strip indexes. By step (4) of the PSK algorithm, the strip index set W contains strip index s_{i+3}. Next, using the grid index of s_{i+3}, the PSK algorithm finds the mobile object o_7, which lies in that strip's region and is covered by circle C_k. Finally, from the mobile object set {o_1, o_3, o_6, o_7}, the PSK algorithm finds the 3 nearest neighbors of q_j: {o_1, o_6, o_7}.
Once the PSK algorithm is deployed on the distributed computing platform of the present invention, the whole platform processes spatio-temporal kNN queries in parallel as follows:
Step (1): " data access operator " is calculated candidate strip according to the step 1 of search algorithm PSK first and indexes Gather { s1, s2..., sk, then by qjAnd its mark (< q of candidate strip indexj, se>, 1≤e≤k) it is sent to data buffer storage Module.
Step (2): after the "data index operator" responsible for candidate strip index s_e receives query q_j, it first checks whether the boundary of s_e in its current local strip index matches the boundary of s_e it received. If they match, it proceeds to the next step; otherwise, it forwards <q_j, s_e> to all other "data index operators", and each of them decides whether it should process query q_j by comparing the boundary of strip index s_e.
Step (3): each "data index operator" responsible for processing query q_j uses the grid index of DTLI to quickly obtain the h mobile objects nearest to q_j; h is determined as in step (2) of the PSK algorithm.
Step (4): each "data index operator" processing query q_j sends its h mobile objects nearest to q_j to the same "data query operator" SA_j. SA_j then computes circle C_k according to step (3) of the PSK algorithm, based on the mobile objects received from the "data index operators".
Step (5): SA_j obtains the current strip index boundary information of DTLI from the global index data management module, then computes all strip indexes that intersect circle C_k, thereby obtaining the strip index set W of step (4) of the PSK algorithm.
Step (6): SA_j sends query q_j and circle C_k to the "data index operators" corresponding to the strip indexes in set W. Each of these operators finds, within its own region, the k mobile objects covered by circle C_k that are nearest to q_j, and sends them to SA_j. If fewer than k mobile objects qualify, all qualifying mobile objects are sent to SA_j.
Step (7): SA_j compares the newly received mobile objects with the k objects that determined circle C_k, obtaining the k nearest neighbors of query q_j.
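Step (2) is the part of this flow not already covered by the PSK steps: it handles a strip boundary that changed after the data access operator addressed the query. A minimal sketch follows. The operator class and the owner map are assumptions. The overlap rule used to decide who takes a forwarded query is also an assumption, since the text only says that each operator compares the boundary of s_e.

```python
from dataclasses import dataclass


@dataclass
class IndexOperator:
    """Hypothetical data index operator whose current strip is [x_lo, x_hi)."""
    name: str
    x_lo: float
    x_hi: float

    def overlaps(self, lo, hi):
        return self.x_lo < hi and lo < self.x_hi


def operators_for(operators, owner, strip_id, sent_bounds):
    """Step (2): which operators process <q_j, s_e>?

    owner: strip id -> operator the data access operator addressed.
    sent_bounds: the (x_lo, x_hi) of s_e as known to the sender.
    """
    op = owner[strip_id]
    if (op.x_lo, op.x_hi) == sent_bounds:
        return [op]                      # boundaries agree: go on to step (3)
    # Stale boundary (e.g. s_e was split): broadcast <q_j, s_e>. ASSUMED rule:
    # an operator takes the query if its current strip overlaps the sent range.
    return [o for o in operators if o.overlaps(*sent_bounds)]


# Strip 1 = [10, 20) has been split into B = [10, 15) and C = [15, 20),
# but the data access operator still believes s_1 covers [10, 20).
A = IndexOperator("A", 0, 10)
B = IndexOperator("B", 10, 15)
C = IndexOperator("C", 15, 20)
ops = [A, B, C]
owner = {0: A, 1: B}
```

Here a query addressed to s_1 with the stale range [10, 20) ends up at both B and C, while a query addressed to s_0 with the current range [0, 10) stays with A.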
The distributed solution for massive spatio-temporal kNN queries proposed by the present invention reduces the communication cost between physical computing nodes needed to process spatio-temporal kNN queries in a distributed environment. It can respond in real time to large numbers of concurrent kNN queries and significantly improves query efficiency.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims (10)

1. A distributed computing platform for spatio-temporal data kNN queries, characterized by comprising:
a data access distribution module, configured to receive the spatio-temporal data and spatio-temporal data queries that arrive continuously in real time, and to distribute them to the data cache module according to a distributed dynamic two-level index structure;
the distributed dynamic two-level index structure comprises a first-level strip index and a second-level grid index built on the strip index; the first-level strip index is formed by partitioning the spatio-temporal data along the x-axis, and the second-level grid index is formed by partitioning the spatio-temporal data of each strip index along the y-axis;
a data cache module, configured to cache the spatio-temporal data and spatio-temporal data queries sent by the data access distribution module;
a spatio-temporal data index module, configured to build an index over the spatio-temporal data in each strip index region and to update the location information of the spatio-temporal data in real time; the spatio-temporal data index module also monitors in real time the spatio-temporal data and spatio-temporal data queries arriving at the data cache module, and obtains from them the spatio-temporal data and queries that it should process;
a global index data management module, configured to maintain the boundary information of the strip indexes of the distributed dynamic two-level index structure as global index data, and to exchange index data with the data access distribution module, the spatio-temporal data index module and the query parallel processing module;
a query parallel processing module, configured to process spatio-temporal data kNN query requests in a distributed manner.
2. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that the data access distribution module consists of several data access operators distributed over different physical computing nodes, each data access operator being a logical computing unit.
3. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that the spatio-temporal data index module consists of several data index operators distributed over different physical computing nodes, each data index operator being a logical computing unit.
4. The distributed computing platform for spatio-temporal data kNN queries according to claim 1, characterized in that the query parallel processing module consists of several data query operators distributed over different physical computing nodes, each data query operator being a logical computing unit.
5. The distributed computing platform for spatio-temporal data kNN queries according to claim 4, characterized in that the operators exchange data by sending and receiving events.
6. The distributed computing platform for spatio-temporal data kNN queries according to claim 5, characterized in that each event is a <key, value> data pair, and each operator specifies the events it receives according to the event name and the key value.
7. A distributed kNN query method for spatio-temporal data based on the distributed computing platform according to any one of claims 1 to 6, characterized by comprising the following steps:
Step (1): build the distributed computing platform for spatio-temporal data kNN queries;
Step (2): deploy the distributed dynamic two-level index structure DTLI on the distributed computing platform;
Step (3): process the continuously arriving massive spatio-temporal data based on the distributed dynamic two-level index structure DTLI;
Step (4): based on the distributed index structure, deploy the distributed spatio-temporal data kNN query algorithm, i.e. the PSK algorithm, on the distributed computing platform, so as to parallelize the PSK algorithm and thereby process massive spatio-temporal data kNN queries in parallel.
8. The query method according to claim 7, characterized in that step (2) specifically comprises:
Step (2.1): each data access operator in the data access distribution module maintains a copy of the strip index boundary information of DTLI; the global index data management module also maintains a copy of the strip index boundary information of DTLI, but does not record the location of any spatio-temporal data;
Step (2.2): each data index operator in the spatio-temporal data index module is responsible for maintaining one strip index, and stores and updates the positions of the spatio-temporal data in that strip index; each data index operator builds a second-level grid index on the strip index it maintains, thereby deploying the distributed dynamic two-level index structure on the distributed computing platform.
9. The query method according to claim 8, characterized in that, in step (2.2), if the strip index maintained by a data index operator in the spatio-temporal data index module is split or merged, that data index operator writes the boundary information of the changed strip index to the global index data management module in real time.
10. The query method according to claim 7, characterized in that step (3) specifically comprises:
Step (3.1): each data access operator in the data access distribution module concurrently receives the arriving spatio-temporal data and assigns each spatio-temporal data item to the corresponding data index operator according to the strip indexes of DTLI;
Step (3.2): the data access operators in the data access distribution module send the spatio-temporal data to the data cache module; each data index operator in the spatio-temporal data index module continuously monitors the spatio-temporal data arriving at the data cache module and obtains, in real time, the spatio-temporal data that it should process.
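Claims 5, 6 and 8 to 10 describe operators that communicate through keyed <key, value> events, with data routed to strips by their boundaries and boundary changes written to the global index data. The in-process sketch below illustrates that pattern. `EventBus`, `StripBoundaries`, the event name `"location"` and the use of a strip's left edge as the event key are all hypothetical choices, not the API of any particular stream-processing system.

```python
import bisect
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for event passing between operators.

    Each event is a <key, value> pair published under an event name, and each
    subscriber names the event and key it accepts (claims 5 and 6).
    """

    def __init__(self):
        self.handlers = defaultdict(list)  # (event name, key) -> callbacks

    def subscribe(self, name, key, callback):
        self.handlers[(name, key)].append(callback)

    def publish(self, name, key, value):
        for callback in self.handlers[(name, key)]:
            callback(value)


class StripBoundaries:
    """Copy of the strip boundaries; holds no object positions (claim 8)."""

    def __init__(self, left_edges):
        self.left_edges = sorted(left_edges)  # sorted() makes an independent copy

    def strip_key(self, x):
        """Left edge of the strip containing x, used here as the event key."""
        i = bisect.bisect_right(self.left_edges, x) - 1
        return self.left_edges[max(i, 0)]

    def split(self, at_x):
        """Record that a strip was split at at_x (claim 9)."""
        bisect.insort(self.left_edges, at_x)


# Three data index operators, each subscribed under its strip's left edge.
bus = EventBus()
received = defaultdict(list)
for edge in (0, 10, 20):
    bus.subscribe("location", edge, lambda v, e=edge: received[e].append(v))

global_index = StripBoundaries([0, 10, 20])
view = StripBoundaries(global_index.left_edges)  # a data access operator's copy
for oid, x, y in [("o1", 3, 1), ("o2", 14, 2), ("o3", 25, 3)]:
    bus.publish("location", view.strip_key(x), (oid, x, y))  # step (3.1)

# The operator for [10, 20) splits it at x = 15 and writes the change to the
# global index data; a new operator subscribes for the strip [15, 20).
global_index.split(15)
bus.subscribe("location", 15, lambda v: received[15].append(v))
view = StripBoundaries(global_index.left_edges)  # refreshed copy
bus.publish("location", view.strip_key(17), ("o4", 17, 4))
```

Keying events by a strip's left edge rather than by its position in the list keeps the keys of untouched strips stable when another strip is split.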
CN201610259255.9A 2016-04-25 2016-04-25 Distributed Computing Platform and querying method towards space-time data k NN Query Expired - Fee Related CN105893605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610259255.9A CN105893605B (en) 2016-04-25 2016-04-25 Distributed Computing Platform and querying method towards space-time data k NN Query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610259255.9A CN105893605B (en) 2016-04-25 2016-04-25 Distributed Computing Platform and querying method towards space-time data k NN Query

Publications (2)

Publication Number Publication Date
CN105893605A CN105893605A (en) 2016-08-24
CN105893605B CN105893605B (en) 2019-02-22

Family

ID=56704558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610259255.9A Expired - Fee Related CN105893605B (en) 2016-04-25 2016-04-25 Distributed Computing Platform and querying method towards space-time data k NN Query

Country Status (1)

Country Link
CN (1) CN105893605B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959352A (en) * 2018-04-27 2018-12-07 北京天机数测数据科技有限公司 Time-space data analysis platform and processing method based on time and Spatial Data Model
US11068491B2 (en) 2018-11-28 2021-07-20 The Toronto-Dominion Bank Data storage using a bi-temporal index
CN112463814A (en) * 2019-09-06 2021-03-09 阿里巴巴集团控股有限公司 Data query method and device
CN110990665B (en) * 2019-12-11 2023-08-25 北京明略软件系统有限公司 Data processing method, device, system, electronic equipment and storage medium
CN111934958B (en) * 2020-07-29 2022-03-29 深圳市高德信通信股份有限公司 IDC resource scheduling service management platform
CN112699173A (en) * 2021-01-08 2021-04-23 哈尔滨航天恒星数据系统科技有限公司 Spark-based distributed spatio-temporal object proximity query method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101827004A (en) * 2010-05-12 2010-09-08 中国人民解放军国防科学技术大学 Large-scale network environment-oriented distribution-type K neighbor node searching method
CN102693293A (en) * 2012-05-15 2012-09-26 清华大学 Range query method and system for multivariable spatio-temporal data
CN105138607A (en) * 2015-08-03 2015-12-09 山东省科学院情报研究所 Hybrid granularity distributional memory grid index-based KNN query method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN103324642B (en) * 2012-03-23 2016-12-14 日电(中国)有限公司 System and method and the data query method of index is set up for data


Also Published As

Publication number Publication date
CN105893605A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN105893605B (en) Distributed Computing Platform and querying method towards space-time data k NN Query
CN106528773B (en) Map computing system and method based on Spark platform supporting spatial data management
CN110147377A (en) General polling algorithm based on secondary index under extensive spatial data environment
CN108920552A (en) A kind of distributed index method towards multi-source high amount of traffic
CN110287391A (en) Multi-level trajectory data storage method, storage medium and terminal based on Hadoop
CN106599190A (en) Dynamic Skyline query method based on cloud computing
CN108460072A (en) With electricity consumption data retrieval method and system
CN112163827A (en) Satellite remote measurement intelligent service system
CN106599189A (en) Dynamic Skyline inquiry device based on cloud computing
Jošilo et al. Distributed algorithms for content placement in hierarchical cache networks
CN101477561B (en) Large-scale space vector data management method based on content access network
CN105306547A (en) Data placing and node scheduling method for increasing energy efficiency of cloud computing system
Deng et al. Spatial-keyword skyline publish/subscribe query processing over distributed sliding window streaming data
CN115470510A (en) Construction method and device of self-adaptive Cartesian grid data structure
CN105447132A (en) Four-layer geographic data storage system oriented to Internet of Things application
CN108614889A (en) Mobile object Continuous k-nearest Neighbor based on mixed Gauss model and system
CN109446294B (en) Parallel mutual subspace Skyline query method
Rslan et al. Spatial R-tree index based on grid division for query processing
CN115018322A (en) Intelligent crowdsourcing task allocation method and system
Chen et al. Efficient historical query in HBase for spatio-temporal decision support
Ding et al. RDB-KV: A cloud database framework for managing massive heterogeneous sensor stream data
Aly et al. A demonstration of aqwa: Adaptive query-workload-aware partitioning of big spatial data
CN114090275A (en) Data processing method and device and electronic equipment
Chatzimilioudis et al. Crowdsourcing emergency data in non-operational cellular networks
Dong et al. IoT search method for entity based on advanced density clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190222
