CN107220285B - Space-time index construction method for massive trajectory point data - Google Patents


Info

Publication number
CN107220285B
Authority
CN
China
Prior art keywords
point data
index
space
track point
time
Prior art date
Legal status
Active
Application number
CN201710270989.1A
Other languages
Chinese (zh)
Other versions
CN107220285A (en)
Inventor
陈昭
王磊
刁博宇
徐勇军
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201710270989.1A
Publication of CN107220285A
Application granted
Publication of CN107220285B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a parallel spatio-temporal index construction method for massive track point data. Using track point data files as index units reduces the storage consumption of the index and gives the index structure high extensibility. In addition, the data files are partitioned with a Hilbert curve; compared with other mappings from multiple dimensions to one dimension, the Hilbert curve partitions better thanks to its good space-filling properties, which reduces the likelihood of data skew.

Description

Space-time index construction method for massive trajectory point data
Technical Field
The invention relates to the field of information retrieval, in particular to a space-time index construction method for massive trajectory point data.
Background
With the development of science and technology, the world has entered the era of big data. As data volumes grow rapidly, big data needs to be globally expressive, and spatio-temporal big data has become one of the most important kinds of big data because it captures the associations among time, space, and objects. However, the relatively complex relationships within spatio-temporal big data and its dynamic evolution also make querying difficult. Track point data is a kind of spatio-temporal big data: it is the information obtained by sampling the motion of a moving object in a spatio-temporal environment. In recent years, with the rapid development of satellites, wireless networks, and positioning devices, the volume of track point data from large numbers of moving objects has grown quickly, and index construction and query optimization for track point data have become active research topics.
Hadoop is currently a popular distributed computing framework. It suits many kinds of large-scale data processing and is widely used, and several spatio-temporal index methods have been proposed on top of it and its ecosystem, such as an HBase-based quadtree (Q-tree) spatio-temporal index and an HBase-based hybrid grid/R-tree spatio-temporal index. Most existing spatio-temporal index construction methods use individual data records as index units; this leads to high storage consumption and low index construction efficiency, and cannot keep up with the rapid growth of the various types of spatio-temporal big data.
Disclosure of Invention
The invention aims to provide a spatio-temporal index construction method for massive track point data that overcomes these shortcomings of the prior art: it constructs spatio-temporal indexes of track point data in parallel, and more efficiently, in a distributed environment, and it uses data files as index units so that the index structure is flexibly extensible.
The technical scheme adopted by the invention is as follows: a space-time index construction method for massive trajectory point data comprises the following steps:
step 1), storing track point data in track point data files;
step 2), constructing an index tree with the track point data files of step 1) as index units.
Preferably, the track point data in step 1) includes at least time information and two-dimensional position information.
Preferably, the step 2) further comprises:
step 21), partitioning the track point data files among at least one computing unit;
step 22), constructing, by the computing unit, a spatio-temporal index based on a spatial index structure.
Preferably, when there are multiple parallel computing units, the track point data files are partitioned in an ordered manner in step 21).
Preferably, the ordered partitioning of step 21) is implemented with a space-filling curve.
Preferably, the space-filling curve is a Hilbert curve.
Preferably, step 21) further comprises:
step 211) computing a two-dimensional Hilbert value representing the two-dimensional spatial information of each track point data file;
step 212) computing, from the two-dimensional Hilbert value of step 211), a three-dimensional Hilbert value representing the three-dimensional spatio-temporal information of the track point data file;
step 213) partitioning the track point data files according to the three-dimensional Hilbert values computed in step 212).
Preferably, the spatial index structure in step 22) is an R-tree structure.
Preferably, the construction of the multi-level spatiotemporal index tree can be realized based on a MapReduce or Spark programming framework.
According to another aspect of the present invention, a method for querying trajectory point data based on the index tree constructed by the above method is provided, including:
step a), traversing the root nodes of the index tree to obtain a root node list;
step b), querying the root node list obtained in step a) to obtain a child node list;
step c), traversing the child node list obtained in step b) in parallel to obtain a list of track point data files.
Preferably, the query method can be implemented based on a MapReduce or Spark programming framework.
Advantageous effects: the spatio-temporal index construction method for massive track point data uses data files containing motion information as index units, which reduces the storage consumption of the index; and because the storage layout of the data files can be adjusted as needed, the index structure is highly extensible. In addition, the data files are partitioned with a Hilbert curve; compared with other mappings from multiple dimensions to one dimension, such as mapping longitude and latitude to grid numbers, the Hilbert curve partitions better thanks to its good space-filling properties and reduces the likelihood of data skew.
Drawings
FIG. 1 is a schematic view of the spatio-temporal cube of a track point data file according to the present invention;
FIG. 2 is a schematic diagram of the index tree structure for massive track point data according to the present invention;
FIG. 3 shows the parallel construction process of the R-tree-based multi-level spatio-temporal index tree, implemented with the MapReduce programming framework;
FIG. 4 shows the parallel query process over the spatio-temporal index tree constructed as in FIG. 3, implemented with the MapReduce programming framework.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the spatio-temporal index construction method for massive trajectory point data provided in the embodiments of the present invention is further described in detail below with reference to the accompanying drawings.
Track point data is a series of records obtained by sampling the real-time position of a moving object; each record contains a sampling timestamp, two-dimensional position information, other motion information such as speed and heading, and other related information. Track point data is structured: the number of columns is fixed, and the meaning, type, and value range of each column are fixed as well. Moreover, once sampled track point data has been generated it is never modified or deleted, and it accumulates regularly in batches as it moves from online acquisition to offline analysis.
The inventors found that, given these characteristics, track point data can be stored and organized into multiple data files, and a spatio-temporal index can be built with each data file of track point records as the unit.
When track point data is recorded, each track point record can be abstracted as an n-dimensional vector; the i-th record can be written as:
$(t_i, x_i, y_i, o_{i1}, \ldots, o_{i(n-3)})$
where $t_i$ is the sampling timestamp, $x_i$ and $y_i$ are the two-dimensional position of the moving object at time $t_i$ (typically longitude and latitude), and $o_{i1}, \ldots, o_{i(n-3)}$ hold the remaining $n-3$ dimensions of information.
FIG. 1 shows a schematic diagram of the spatio-temporal cube of a track point data file in the provided method. As shown in FIG. 1, multiple track point records are stored together as one track point data file, and the records in each file span a three-dimensional spatio-temporal value range. Three vertices that delimit this cube are:
$(t_{min}, x_{min}, y_{min})$, $(t_{min}, x_{max}, y_{max})$, $(t_{max}, x_{max}, y_{max})$
Track point data can be stored in various ways as required, for example in ascending or descending time order, by spatial grid cell, or by spatial grid cell with the records in each cell kept in ascending or descending time order. The following describes how to construct the spatio-temporal index of track point data, taking track point data files stored in ascending time order as the example.
According to one embodiment of the invention, a spatio-temporal index construction method for massive track point data is provided. The method has two steps, partitioning the track point data files and constructing the index tree, as follows:
S10, partitioning the track point data files among at least one computing unit;
Depending on the number of track point data files, index construction can run on a single computing unit, or the files can be distributed to multiple computing units that run in parallel to improve processing speed and efficiency. Taking w computing units as an example, all track point data files are distributed to the w computing units; for each assigned file, a computing unit traverses all of its track point records and computes the time and space value ranges of those records:
$(t_{min}, t_{max})$, $(x_{max}, x_{min})$, $(y_{max}, y_{min})$
The center point of the file's spatio-temporal cube is used as its identifier, so each track point data file can be characterized by the three-dimensional coordinate:
$((t_{min}+t_{max})/2, (x_{min}+x_{max})/2, (y_{min}+y_{max})/2)$
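As a minimal sketch, assuming each record arrives as a tuple whose first three fields are (t, x, y) as in the record vector above, the per-file statistics reduce to one pass that keeps running minima and maxima. The names `FileExtent` and `scan_file` are hypothetical illustrations, not part of the patent.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical record layout: (t, x, y, o_1, ..., o_{n-3}); only t, x, y are used.
Record = Tuple[float, ...]


@dataclass(frozen=True)
class FileExtent:
    """Spatio-temporal value range of one track point data file."""
    t_min: float
    t_max: float
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def center(self) -> Tuple[float, float, float]:
        """Center of the file's spatio-temporal cube, used as its identifier."""
        return ((self.t_min + self.t_max) / 2,
                (self.x_min + self.x_max) / 2,
                (self.y_min + self.y_max) / 2)


def scan_file(records: Iterable[Record]) -> FileExtent:
    """Traverse every record once, tracking the min and max of t, x and y."""
    t_min = x_min = y_min = math.inf
    t_max = x_max = y_max = -math.inf
    for t, x, y, *_other in records:
        t_min, t_max = min(t_min, t), max(t_max, t)
        x_min, x_max = min(x_min, x), max(x_max, x)
        y_min, y_max = min(y_min, y), max(y_max, y)
    if t_min == math.inf:
        raise ValueError("empty track point data file")
    return FileExtent(t_min, t_max, x_min, x_max, y_min, y_max)


records = [(0, 10, 20, 3.5), (60, 14, 26, 4.0), (120, 12, 22, 3.8)]
print(scan_file(records).center())  # (60.0, 12.0, 23.0)
```

Streaming the records keeps memory constant per file, which matters because the description assumes each file is scanned in full during construction.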
According to an embodiment of the invention, when multiple computing units run in parallel, a space-filling curve can be used to measure how close the center points are in three-dimensional space, so that track point data files that are close in time and in geographic space are assigned to the same computing unit. This reduces the parallel traversal overhead at query time and improves indexing performance.
A Hilbert curve is a space-filling curve that maps points in two-dimensional space to one-dimensional values, called Hilbert values; points with similar Hilbert values tend to lie close together in two-dimensional space. The Hilbert curve is used below as the example; the specific steps are as follows:
S101, the center point of the spatio-temporal cube of a track point data file is used as the vector identifying that file. Let the identification vector of the i-th track point data file be $(t'_i, x'_i, y'_i)$. First compute the two-dimensional Hilbert value of $(x'_i, y'_i)$, written here as $h^{xy}_i$, which represents the position of the file in two-dimensional space; then compute the three-dimensional Hilbert value of the pair $(t'_i, h^{xy}_i)$, written here as $h_i$, which represents the position of the file in three-dimensional space and is used as the Hilbert value of the file. The Hilbert values of all files are collected into a vector $(h_1, h_2, \ldots)$.
S102, sample all track point data files at sampling rate p. Suppose m files are sampled and the preset number of parallel computing units is w. Sort the Hilbert values of the m sampled files in ascending order, $h_{(1)} \le h_{(2)} \le \cdots \le h_{(m)}$, divide the m samples approximately evenly into w sets, and take the maximum Hilbert value of each set except the last as a split point, giving w-1 split points in total, $s_1 \le s_2 \le \cdots \le s_{w-1}$. The sampling rate satisfies $p \in (0, 1]$: the larger p is, the closer the sample is to the real distribution of data files and the better the partitioning; the smaller p is, the faster the computation.
S103, traverse all track point data files and determine which interval the Hilbert value $h_i$ of the i-th data file falls into:
if $h_i \le s_1$, assign the i-th file to the 1st computing unit;
if $s_{j-1} < h_i \le s_j$, assign the i-th file to the j-th computing unit;
if $h_i > s_{w-1}$, assign the i-th file to the w-th computing unit.
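Steps S101 to S103 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: `hilbert_xy2d` is the standard iterative cell-to-index mapping on a 2^order × 2^order grid; the "three-dimensional" value is formed with the nested formula $H(\mathrm{round}(aT), \mathrm{round}(b\,H(\mathrm{round}(X), \mathrm{round}(Y))))$ that the description gives in step 201; the grid `order`, the scale factors `a` and `b` (which must map both arguments into the grid, otherwise `hilbert_xy2d` raises), and the `<=` boundary convention are hypothetical choices. Python's `round()` rounds halves to even.

```python
import random
from bisect import bisect_left
from typing import List, Sequence, Tuple


def hilbert_xy2d(order: int, x: int, y: int) -> int:
    """Map grid cell (x, y) to its index along a Hilbert curve of side 2**order."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("coordinates must lie in [0, 2**order)")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the next level is read in curve order
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def file_hilbert(center: Tuple[float, float, float], a: float, b: float, order: int) -> int:
    """S101: 2D value of (x', y'), then a 2D value of (a*t', b*h_xy), per H_ts."""
    t, x, y = center
    h_xy = hilbert_xy2d(order, round(x), round(y))
    return hilbert_xy2d(order, round(a * t), round(b * h_xy))


def split_points(hs: Sequence[int], p: float, w: int, rng: random.Random) -> List[int]:
    """S102: sample at rate p, sort, cut into w near-equal groups and keep the
    maximum of every group except the last (w - 1 split points)."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    sample = sorted(h for h in hs if rng.random() < p)
    m = len(sample)
    if m < w:
        raise ValueError("fewer samples than computing units")
    return [sample[(k + 1) * m // w - 1] for k in range(w - 1)]


def assign_unit(h: int, splits: Sequence[int]) -> int:
    """S103: h <= s_1 -> 1; s_{j-1} < h <= s_j -> j; h > s_{w-1} -> w (1-based)."""
    return bisect_left(splits, h) + 1


rng = random.Random(0)
print(file_hilbert((0, 3, 0), a=1, b=0.2, order=2))  # 5
splits = split_points(range(100), p=1.0, w=4, rng=rng)
print(splits)                   # [24, 49, 74]
print(assign_unit(60, splits))  # 3
```

Using `bisect_left` on the sorted split points gives exactly the "less than or equal to the split point" rule above in O(log w) per file, so S103 stays cheap even with many computing units.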
The Hilbert curve has good space-filling properties, which reduces the likelihood of data skew when the track point data files are partitioned. Although the above embodiment partitions the files with a Hilbert curve, those skilled in the art will understand that other embodiments may partition them in order in other ways, for example chronologically or with other types of space-filling curves.
S20, constructing a multi-level spatio-temporal index tree based on the R*-tree;
The R*-tree is a variant of the R-tree spatial index structure. The R-tree is a height-balanced tree that extends the B-tree to multidimensional space. Structurally the R*-tree differs little from the R-tree; the main difference is that the R*-tree takes overlap into account when inserting into the index tree and selectively reinserts entries that were already inserted, which optimizes the index tree.
After step S10 is completed, each computing unit uses its assigned track point data files as index units and builds its own multi-level spatio-temporal index tree based on the R*-tree, in parallel with the other computing units.
FIG. 2 is a schematic diagram of the index tree structure provided by the invention. As shown in FIG. 2, the index tree is constructed in the same way as a basic R-tree. Each index unit of the tree consists of the storage path of a track point data file together with the file's spatio-temporal value range $(t_{min}, t_{max})$, $(x_{max}, x_{min})$, $(y_{max}, y_{min})$. The specific steps are as follows:
S201, constructing leaf nodes: each leaf node contains at least one index unit and the minimum spatio-temporal rectangle enclosing all of its index units;
Here, the minimum spatio-temporal rectangle is the spatio-temporal value range covering all track point data files it contains; the same applies below.
S202, constructing non-leaf nodes: each non-leaf node contains an array of pointers to its child nodes and the minimum spatio-temporal rectangle enclosing all of its child nodes;
S203, constructing the index subtree root nodes: the index subtree root node on each computing unit contains an array of pointers to its child nodes and the minimum spatio-temporal rectangle enclosing all of them; if the index subtree root node is itself a leaf node, it contains the spatio-temporal value ranges of all track point data files on that computing unit;
S204, constructing the index tree root node: each index tree root node contains the record paths of the track point data files on all computing units and the minimum spatio-temporal rectangle enclosing all of its child nodes.
S205, constructing the overall index tree file: because the total amount of track point data and its accumulation time vary, index tree construction usually runs in batches. The index tree root nodes produced by each batch are stored in ascending order of the time ranges they cover; together they form the top-level overall index tree file.
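A minimal sketch of steps S201 to S204, assuming the index units of one computing unit are already ordered (for example by the Hilbert values from step S10). For brevity it packs consecutive units bottom-up into nodes of a fixed, hypothetical `capacity`; this is a simple bulk-loading strategy, not the R*-tree insertion with forced reinsertion mentioned in step S20. Boxes here use (t_min, t_max, x_min, x_max, y_min, y_max) order, and the record paths of S204 are simplified to child pointers.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# (t_min, t_max, x_min, x_max, y_min, y_max); min-before-max order is chosen here
Box = Tuple[float, float, float, float, float, float]
IndexUnit = Tuple[str, Box]  # (storage path of a track point data file, its range)


def mbr(boxes: Sequence[Box]) -> Box:
    """Minimum spatio-temporal rectangle enclosing all given boxes."""
    return (min(b[0] for b in boxes), max(b[1] for b in boxes),
            min(b[2] for b in boxes), max(b[3] for b in boxes),
            min(b[4] for b in boxes), max(b[5] for b in boxes))


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)  # non-leaf: child pointers
    units: List[IndexUnit] = field(default_factory=list)  # leaf: index units


def build_subtree(units: Sequence[IndexUnit], capacity: int = 4) -> Node:
    """S201-S203: pack ordered index units into leaves, then pack nodes level
    by level until a single index subtree root remains."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not units:
        raise ValueError("no index units on this computing unit")
    level = [Node(mbr([box for _, box in units[i:i + capacity]]),
                  units=list(units[i:i + capacity]))
             for i in range(0, len(units), capacity)]
    while len(level) > 1:
        level = [Node(mbr([n.box for n in level[i:i + capacity]]),
                      children=level[i:i + capacity])
                 for i in range(0, len(level), capacity)]
    return level[0]


def build_root(subtrees: Sequence[Node]) -> Node:
    """S204 (simplified): one root over the subtree roots of all computing units."""
    return Node(mbr([s.box for s in subtrees]), children=list(subtrees))


units = [(f"/data/f{i}", (i, i + 1, 0, 1, 0, 1)) for i in range(10)]
subtree = build_subtree(units, capacity=4)
print(subtree.box)                                     # (0, 10, 0, 1, 0, 1)
print([len(leaf.units) for leaf in subtree.children])  # [4, 4, 2]
```

When a computing unit holds no more than `capacity` files, `build_subtree` returns a single leaf, which matches the S203 case where the subtree root is a leaf holding the ranges of all files on that unit.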
Compared with the traditional approach of using each data record as an index unit, using the track point data file as the index unit greatly reduces the complexity of building the index tree, runs faster, and saves system overhead; it can markedly improve the management and query efficiency of track point data and accommodates continuously accumulating track point data.
Because each track point data file stores track point records, the file size affects the depth of the index tree: the larger the files, the shallower the tree and the faster the construction; conversely, the smaller the files, the deeper the tree and the higher the query precision.
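The depth trade-off can be made concrete with a back-of-the-envelope estimate. Assuming binary units, the 1 TB data set and 128 MB HDFS block size from the inventors' experiment, and a hypothetical fan-out of 16 entries per node, the sketch below counts files and the number of levels a full tree needs to hold them. The fan-out is an assumption, not a figure from the patent.

```python
from typing import Tuple


def files_and_depth(dataset_bytes: int, file_bytes: int, fanout: int) -> Tuple[int, int]:
    """Return (number of files, levels of a full tree with `fanout` entries per node)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    n_files = (dataset_bytes + file_bytes - 1) // file_bytes  # integer ceiling
    depth, capacity = 1, fanout  # a single leaf holds up to `fanout` index units
    while capacity < n_files:
        capacity *= fanout
        depth += 1
    return n_files, depth


TiB, MiB, GiB = 2**40, 2**20, 2**30
for size in (8 * MiB, 128 * MiB, 1 * GiB):
    print(size // MiB, files_and_depth(TiB, size, fanout=16))
# 8 (131072, 5)
# 128 (8192, 4)
# 1024 (1024, 3)
```

Larger files mean fewer index units and fewer levels, but each matching leaf then points at more data that must be scanned, which is the precision cost the paragraph above describes.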
In another embodiment of the invention, index tree construction can be implemented with the MapReduce programming model, taking parallel computing units as an example. FIG. 3 shows the flowchart of index tree construction implemented with the MapReduce programming framework; as shown in FIG. 3, the flow is as follows:
Step 101: distribute the track point data files evenly to the parallel computing units. Each parallel computing unit takes the storage path $p_i$ of a track point data file as Map input, traverses all records in that file, and computes the spatio-temporal value range of its track point data, $(t_{i,min}, t_{i,max})$, $(x_{i,max}, x_{i,min})$, $(y_{i,max}, y_{i,min})$;
Step 102: after the statistics are complete, the parallel computing unit combines the storage path and the spatio-temporal value range from step 101 into a tuple $(p_i, (t_{i,min}, t_{i,max}, x_{i,max}, x_{i,min}, y_{i,max}, y_{i,min}))$ and emits it as Map output;
Step 103: the tuples of all track point data files from step 102, i.e. the Map output, serve as the Reduce input and are stored as the index record file of the track point data files; these tuples are the index units used to build the index tree.
Step 201: the parallel computing units traverse all track point data files, taking the tuples from step 102 as Map input. The center point of the spatio-temporal cube of each track point data file i, $((t_{i,min}+t_{i,max})/2, (x_{i,min}+x_{i,max})/2, (y_{i,min}+y_{i,max})/2)$, is used as the identifier of the file and denoted $(t_f, x_f, y_f)$. The Hilbert value $h_{ts}$ of the data file is computed from this center point: let $H(x, y)$ be the original Hilbert function and $\mathrm{round}(x)$ the rounding function; the data file Hilbert value function $H_{ts}$ is defined as:
$h_{ts} = H_{ts}(T, X, Y) = H(\mathrm{round}(aT), \mathrm{round}(b\,H(\mathrm{round}(X), \mathrm{round}(Y))))$
where a and b are tuning parameters set as required to optimize the Hilbert value computation. During the computation a random number $r \in [0, 1]$ is also generated: if $r \le 0.1$, the pair $(h_{ts}, 1)$ is emitted as Map output; otherwise nothing is emitted, which implements sampling. The resulting $h_{ts}$ values are sorted in ascending order, and specific $h_{ts}$ values are chosen as split points as required (same as step S102);
Step 202: the parallel computing units traverse all track point data files, take each tuple from step 102 as Map input, compute the Hilbert value $h_{ts}$ of the file, assign the file to a computing unit as in step S103, and emit the file together with the number of its assigned computing unit as Map output;
Step 203: using the computing unit numbers from the partitioning, the parallel computing units store the Map output of step 202, i.e. the Reduce input, as separate index record files of the track point data files, one per computing unit; each row has the form $(p_i, t_{i,min}, t_{i,max}, x_{i,max}, x_{i,min}, y_{i,max}, y_{i,min})$;
Step 301: each parallel computing unit builds its own index subtree (same as steps S201-S203);
Step 401: each index subtree is added to the index subtree file that stores the index subtree root nodes, and an index tree root node record is created. The record contains the overall time range of this batch of track point data, the path of the file that stores the index subtree root node, and the offset of the index subtree root node record within that file (same as step S204);
Step 501: the index tree root records of this batch are appended to the index tree file that stores the existing index tree root nodes, and all index tree root nodes are then sorted in ascending order of the maximum of their time ranges (same as step S205).
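Steps 101 to 103 and 203 can be simulated in plain Python to show the shape of the key-value data; this is a hedged single-process sketch, not Hadoop code. `map_scan`, `to_row`, `reduce_by_unit`, the comma-separated row format and the example paths are hypothetical, and the computing-unit numbers are assumed to come from the step S103 assignment. Ranges keep the description's field order (t_min, t_max, x_max, x_min, y_max, y_min).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[float, ...]
# Field order follows the description: t_min, t_max, x_max, x_min, y_max, y_min
Range = Tuple[float, float, float, float, float, float]


def map_scan(path: str, records: Iterable[Record]) -> Iterator[Tuple[str, Range]]:
    """Steps 101-102: one Map call per file path p_i; emit (p_i, range).
    Raises ValueError (from unpacking) if the file has no records."""
    ts, xs, ys = zip(*((r[0], r[1], r[2]) for r in records))
    yield path, (min(ts), max(ts), max(xs), min(xs), max(ys), min(ys))


def to_row(path: str, rng: Range) -> str:
    """Serialize one index unit as a row (p_i, t_min, t_max, x_max, x_min, y_max, y_min)."""
    return ",".join([path, *map(str, rng)])


def reduce_by_unit(pairs: Iterable[Tuple[int, Tuple[str, Range]]]) -> Dict[int, List[str]]:
    """Step 203: group (unit number, index unit) pairs into one index record file per unit."""
    files: Dict[int, List[str]] = defaultdict(list)
    for unit, (path, rng) in pairs:
        files[unit].append(to_row(path, rng))
    return dict(files)


data = {"/trk/a": [(0, 10, 20), (60, 14, 26)], "/trk/b": [(5, 1, 2)]}
units = [kv for path, recs in data.items() for kv in map_scan(path, recs)]
print(reduce_by_unit([(1, units[0]), (2, units[1])]))
# {1: ['/trk/a,0,60,14,10,26,20'], 2: ['/trk/b,5,5,1,1,2,2']}
```

Only one small tuple per file leaves the Map side, which is why the index record files stay tiny compared with the track point data itself.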
In the inventors' research, a 1 TB track point data set was used as an example and stored as multiple track point data files, each the size of the HDFS default block, 128 MB. Building the spatio-temporal index tree over these files with the method of the invention produced about 9000 index records in total. This greatly reduces the amount of data involved in spatio-temporal index construction and also speeds up construction.
In another embodiment of the invention, a method for querying massive track point data in parallel based on the index tree constructed above is also provided. For example, suppose the spatio-temporal value range of the track point data to be queried is:
$\{t \in [t_{min}, t_{max}] \wedge x \in [x_{min}, x_{max}] \wedge y \in [y_{min}, y_{max}]\}$
where t is the time condition and x and y are the two-dimensional geographic space conditions. The specific query steps are as follows:
S30, traverse the root nodes of all index trees, and add those whose time range intersects $[t_{min}, t_{max}]$ to the list of root nodes to be queried;
S40, using the query time range, binary-search the root node list obtained in step S30, and add those index subtree root nodes, under the listed index tree root nodes, whose spatio-temporal value ranges intersect the query to the list of child nodes to be traversed in parallel;
S50, traverse in parallel each index subtree root node in the child node list obtained in step S40: if a node is a non-leaf node, traverse all of its child nodes and add those whose spatio-temporal value ranges intersect the query to the child node list to be traversed in parallel; if the node is a leaf node, traverse all of its records and add the path of every data file whose spatio-temporal value range intersects the query range to the list of files to be queried;
S60, when the child node list to be traversed becomes empty, query the required track point data within the files in the list of files to be queried, i.e. the set of track point data files containing all track points that satisfy the query.
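A sequential sketch of S30 to S60, assuming in-memory nodes and index tree roots kept sorted by the maximum of their time range as steps S205 and 501 describe. Because the roots are sorted by t_max only, a binary search (`bisect_left`) can discard every batch that ended before the query starts, while the remaining roots still need a t_min check. `Node`, `overlaps` and `query_files` are hypothetical names, and the parallel fan-out of S50 is replaced by a plain level-by-level loop.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# (t_min, t_max, x_min, x_max, y_min, y_max); min-before-max order is chosen here
Box = Tuple[float, float, float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Closed-interval intersection on t, x and y."""
    return all(a[i] <= b[i + 1] and b[i] <= a[i + 1] for i in (0, 2, 4))


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)       # empty for leaves
    units: List[Tuple[str, Box]] = field(default_factory=list)  # leaf index units


def query_files(roots: Sequence[Node], q: Box) -> List[str]:
    """roots: index tree root nodes sorted ascending by t_max."""
    # S30/S40: roots before `start` ended before the query begins and are skipped
    start = bisect_left([r.box[1] for r in roots], q[0])
    frontier = [s for r in roots[start:] if r.box[0] <= q[1]
                for s in r.children if overlaps(s.box, q)]
    files: List[str] = []
    # S50: level-by-level traversal; a real deployment would fan each level out in parallel
    while frontier:
        next_level: List[Node] = []
        for node in frontier:
            if node.children:
                next_level.extend(c for c in node.children if overlaps(c.box, q))
            else:
                files.extend(path for path, box in node.units if overlaps(box, q))
        frontier = next_level
    # S60: no nodes left; only these files need to be scanned for matching points
    return files


leaf1 = Node((0, 30, 0, 10, 0, 10),
             units=[("f1", (0, 10, 0, 10, 0, 10)), ("f2", (20, 30, 0, 10, 0, 10))])
leaf2 = Node((0, 10, 50, 60, 50, 60), units=[("f3", (0, 10, 50, 60, 50, 60))])
subtree = Node((0, 30, 0, 60, 0, 60), children=[leaf1, leaf2])
root = Node((0, 30, 0, 60, 0, 60), children=[subtree])
print(query_files([root], (5, 25, 0, 5, 0, 5)))  # ['f1', 'f2']
```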
In another embodiment of the present invention, the query method may be implemented based on a MapReduce programming model, and fig. 4 shows a flowchart of the query method implemented based on a MapReduce programming framework, as shown in fig. 4, the specific flow is as follows:
Step 101': read the index tree file, in which the i-th line records the i-th index tree root node. If the time range of that root node intersects the query time range $[t_{min}, t_{max}]$, put the root node into the queue $Q_0$ of root nodes to be traversed (same as step S30).
Step 102': read the index subtree file to obtain the index subtree root nodes of all root nodes in the queue $Q_0$ from step 101', test each of them for overlap with the query condition, and put those that overlap into the queue $Q_{node}$ of nodes to be traversed.
The overlap test means that the spatio-temporal value range of an index subtree root node E obtained in step 101', $[t^E_{min}, t^E_{max}] \times [x^E_{min}, x^E_{max}] \times [y^E_{min}, y^E_{max}]$, overlaps the spatio-temporal rectangle formed by the query condition, i.e.
$t^E_{min} \le t_{max} \wedge t^E_{max} \ge t_{min}$, and
$x^E_{min} \le x_{max} \wedge x^E_{max} \ge x_{min}$, and
$y^E_{min} \le y_{max} \wedge y^E_{max} \ge y_{min}$.
Step 201': take the index subtree root nodes in $Q_{node}$ as Map input. For each index subtree root node E: if E is a non-leaf node with child nodes $E_1, \ldots, E_m$, where m is the number of child nodes of E, emit $(0, E_k)$ for each $k = 1, \ldots, m$ as Map output; if E is a leaf node containing index units $u_1, \ldots, u_m$, emit $(1, u_k)$ for each of them as Map output.
Step 202': for each Reduce input record with key 0, i.e. a child node $E_k$ of a non-leaf node, if the spatio-temporal value range of $E_k$ overlaps the spatio-temporal rectangle formed by the query condition, as defined in step 102', emit $E_k$ as Reduce output; for the Reduce input records with key 1, i.e. the index units of leaf nodes, execute an SQL query over the set of file paths they contain and store the results in a temporary query table.
Step 301': steps 201' and 202' form one parallel MapReduce job. After it finishes, the Reduce output is used as the next Map input and step 201' is entered again, repeating until the Reduce output after step 202' is empty.
Step 401': return the contents of the temporary query table as the final query result over the track point data files.
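The iterative job of steps 201' to 401' can be imitated with two plain functions and a loop, which makes the key convention explicit: key 0 carries a child node of a non-leaf node, key 1 carries a leaf index unit. This is a hedged single-process sketch; all names are hypothetical, leaf units are filtered with the same overlap test as in S50, and appending the file path to `temp_table` stands in for the SQL query the description runs on each file.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, List, Tuple

Box = Tuple[float, float, float, float, float, float]  # t_min, t_max, x_min, x_max, y_min, y_max


def overlaps(a: Box, b: Box) -> bool:
    """Closed-interval intersection on t, x and y (the step 102' test)."""
    return all(a[i] <= b[i + 1] and b[i] <= a[i + 1] for i in (0, 2, 4))


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)
    units: List[Tuple[str, Box]] = field(default_factory=list)


def map_phase(nodes: Iterable[Node]) -> Iterator[Tuple[int, Any]]:
    """Step 201': key 0 for children of non-leaf nodes, key 1 for leaf index units."""
    for e in nodes:
        if e.children:
            for child in e.children:
                yield 0, child
        else:
            for unit in e.units:
                yield 1, unit


def reduce_phase(pairs: Iterable[Tuple[int, Any]], q: Box, temp_table: List[str]) -> List[Node]:
    """Step 202': keep overlapping child nodes; record matching leaf file paths."""
    out: List[Node] = []
    for key, value in pairs:
        if key == 0:
            if overlaps(value.box, q):
                out.append(value)
        else:
            path, box = value
            if overlaps(box, q):
                temp_table.append(path)  # stand-in for the per-file SQL query
    return out


def run_query(q_node: List[Node], q: Box) -> List[str]:
    """Steps 301'-401': iterate until a round produces no Reduce output."""
    temp_table: List[str] = []
    frontier = q_node
    while frontier:
        frontier = reduce_phase(map_phase(frontier), q, temp_table)
    return temp_table


leaf1 = Node((0, 30, 0, 10, 0, 10),
             units=[("f1", (0, 10, 0, 10, 0, 10)), ("f2", (20, 30, 0, 10, 0, 10))])
leaf2 = Node((0, 10, 50, 60, 50, 60), units=[("f3", (0, 10, 50, 60, 50, 60))])
subtree = Node((0, 30, 0, 60, 0, 60), children=[leaf1, leaf2])
print(run_query([subtree], (5, 25, 0, 5, 0, 5)))  # ['f1', 'f2']
```

Each loop iteration corresponds to one MapReduce round that descends one tree level, so the number of jobs equals the depth of the deepest overlapping branch.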
The inventors' research shows that the provided spatio-temporal index construction and query methods involve only projection and aggregation operations when traversing index tree nodes and do not require expensive operations such as sorting the track point data within a single file. This reduces system overhead and speeds up both construction and queries, and working in units of track point data files gives the method greater flexibility and extensibility.
In another embodiment of the invention, index tree construction and querying can be implemented with the Spark programming framework. Spark's RDD (Resilient Distributed Dataset) programming model operates mainly in memory; besides supporting interactive queries, it can speed up iterative workloads.
Although the foregoing embodiments build the index tree over massive numbers of track point data files with an R-tree-based structure, those skilled in the art will understand that other embodiments may use various other spatial index structures to build an index with track point data files as index units.
Although the present invention has been described by way of preferred embodiments, the present invention is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.

Claims (10)

1. A spatio-temporal index construction method for massive trajectory point data, comprising the following steps:
step 1), storing track point data into a plurality of track point data files, wherein the track point data at least comprises time information and two-dimensional position information;
step 2), obtaining the spatio-temporal value range of the track point data contained in each track point data file;
step 3), constructing an index tree by taking the track point data file as an index unit;
wherein the index tree is constructed by the following steps:
1) constructing leaf nodes: each leaf node contains at least one index unit and a minimum spatio-temporal rectangle that bounds all of its index units, the minimum spatio-temporal rectangle being the spatio-temporal value range covering all track point data files it contains;
2) constructing non-leaf nodes: each non-leaf node contains an array of pointers to its child nodes and a minimum spatio-temporal rectangle that bounds all of its child nodes;
3) constructing the index subtree root nodes: the index subtree root node on each computing unit contains the pointer array of that subtree root node and a minimum spatio-temporal rectangle that bounds all of its child nodes; if the index subtree root node is a leaf node, it contains the spatio-temporal value ranges of all track point data files on that computing unit;
4) constructing the index tree root node: each index tree root node contains the record paths of the track point data files on all computing units and a minimum spatio-temporal rectangle that bounds all of its child nodes.
2. The space-time index construction method for massive trajectory point data according to claim 1, wherein step 3) further comprises:
step 31), dividing the track point data files among at least one computing unit;
step 32), constructing, on each computing unit, a spatio-temporal index based on a spatial index structure.
3. The space-time index construction method for massive trajectory point data according to claim 2, wherein, when there are a plurality of parallel computing units, the track point data files are divided into ordered partitions in step 31).
4. The space-time index construction method for massive trajectory point data according to claim 3, wherein the ordered division in step 31) is implemented using a space-filling curve.
5. The space-time index construction method for massive trajectory point data according to claim 4, wherein the space-filling curve is a Hilbert curve.
6. The space-time index construction method for massive trajectory point data according to claim 5, wherein step 31) further comprises:
step 311), calculating a two-dimensional Hilbert value representing the two-dimensional spatial information of each track point data file;
step 312), calculating, from the two-dimensional Hilbert value obtained in step 311), a three-dimensional Hilbert value representing the three-dimensional spatial information of the track point data file;
step 313), dividing the track point data files according to the three-dimensional Hilbert values obtained in step 312).
7. The space-time index construction method for massive trajectory point data according to any one of claims 3 to 6, wherein the spatial index structure in step 32) is an R-tree structure.
8. The space-time index construction method for massive trajectory point data according to any one of claims 3 to 6, wherein the index tree can be constructed using the MapReduce or Spark programming framework.
9. A method of querying track point data using an index tree constructed according to any one of claims 1 to 8, comprising:
step a), traversing the root nodes of the index tree to obtain a root node list;
step b), querying the root node list obtained in step a) to obtain a child node list;
step c), traversing the child node list obtained in step b) in parallel to obtain a track point data file list.
10. The method according to claim 9, wherein the method is implemented using the MapReduce or Spark programming framework.
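Claims 4 to 6 order the files along a Hilbert curve before splitting them among computing units. The sketch below is one possible reading of those steps, not the patent's own formula:

- `hilbert_xy2d` is the standard iterative two-dimensional Hilbert mapping.
- Step 312) is approximated by feeding the two-dimensional value and a quantised time cell into a second two-dimensional Hilbert mapping on an (n*n) by (n*n) grid. The result is a single key over (t, x, y), but it is not a true three-dimensional Hilbert curve.

Each file is represented by the centre of its value range. The grid size `n`, the centre representation, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float, float, float]  # (t_lo, t_hi, x_lo, x_hi, y_lo, y_hi)

def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a quadrant so consecutive cells stay adjacent (standard xy2d helper)."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def hilbert_xy2d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d

def to_cell(v: float, lo: float, hi: float, n: int) -> int:
    """Quantise v from [lo, hi] onto integer cells 0..n-1."""
    if hi <= lo:
        return 0
    return max(0, min(n - 1, int((v - lo) / (hi - lo) * n)))

def spatiotemporal_key(t: float, x: float, y: float, extent: Box, n: int = 256) -> int:
    """Hypothetical ordering key over (t, x, y), built from two 2-D Hilbert mappings."""
    t_lo, t_hi, x_lo, x_hi, y_lo, y_hi = extent
    # Step 311) (sketch): 2-D Hilbert value of the file's spatial centre.
    h2 = hilbert_xy2d(n, to_cell(x, x_lo, x_hi, n), to_cell(y, y_lo, y_hi, n))
    # Step 312) (assumption): combine h2 with a time cell on an (n*n) x (n*n) grid.
    m = n * n
    return hilbert_xy2d(m, h2, to_cell(t, t_lo, t_hi, m))

def ordered_partitions(centres: Dict[str, Tuple[float, float, float]],
                       extent: Box, k: int) -> List[List[str]]:
    """Step 313) (sketch): sort files by key and cut the order into at most k runs."""
    ranked = sorted(centres, key=lambda p: spatiotemporal_key(*centres[p], extent))
    if not ranked:
        return []
    size = -(-len(ranked) // k)  # ceiling division, so at most k partitions
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

print(hilbert_xy2d(4, 3, 0))  # 15
```

Cutting the sorted order into contiguous runs is what makes the partitions "ordered". Files that are close in space and time tend to land on the same computing unit, which keeps each unit's subtree rectangles small.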
CN201710270989.1A 2017-04-24 2017-04-24 Space-time index construction method for massive trajectory point data Active CN107220285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710270989.1A CN107220285B (en) 2017-04-24 2017-04-24 Space-time index construction method for massive trajectory point data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710270989.1A CN107220285B (en) 2017-04-24 2017-04-24 Space-time index construction method for massive trajectory point data

Publications (2)

Publication Number Publication Date
CN107220285A CN107220285A (en) 2017-09-29
CN107220285B true CN107220285B (en) 2020-01-21

Family

ID=59944901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710270989.1A Active CN107220285B (en) 2017-04-24 2017-04-24 Space-time index construction method for massive trajectory point data

Country Status (1)

Country Link
CN (1) CN107220285B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068491B2 (en) 2018-11-28 2021-07-20 The Toronto-Dominion Bank Data storage using a bi-temporal index
CN109889993B (en) * 2019-01-31 2021-03-16 北京永安信通科技有限公司 Method and device for determining positioning object in predetermined area and electronic equipment
CN110134692B (en) * 2019-05-17 2023-04-07 南京大学 Time-space index establishing method based on frequency attribute and PCA
CN111177431B (en) * 2019-12-18 2020-11-24 北京市水利规划设计研究院 Microstation-based digital photo management method, device, processor and storage medium
KR102358877B1 (en) * 2019-12-26 2022-02-07 한국과학기술정보연구원 Management device and management method for unified index
CN111078634B (en) * 2019-12-30 2023-07-25 中科海拓(无锡)科技有限公司 Distributed space-time data indexing method based on R tree
CN111831622A (en) * 2020-03-31 2020-10-27 北京嘀嘀无限科技发展有限公司 Data index generation method and device, electronic equipment and readable storage medium
CN112685428B (en) * 2021-03-10 2021-07-06 南京烽火星空通信发展有限公司 Space-time analysis method based on massive position trajectory data
CN112948531B (en) * 2021-04-02 2023-12-15 方正国际软件(北京)有限公司 Massive track query method, retrieval server and system
CN113312361B (en) * 2021-07-28 2022-01-25 阿里云计算有限公司 Track query method, device, equipment, storage medium and computer program product
CN116089560B (en) * 2023-03-02 2023-06-23 智道网联科技(北京)有限公司 Trace point assignment method, device, equipment and storage medium
CN116578569B (en) * 2023-07-12 2023-09-12 成都国恒空间技术工程股份有限公司 Satellite space-time track data association analysis method
CN116662419B (en) * 2023-08-01 2023-12-22 太极计算机股份有限公司 Real-time massive ship track high-performance visualization system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2835747A2 (en) * 2013-08-08 2015-02-11 Sap Se Managing and querying spatial point data in column stores
CN104750708A (en) * 2013-12-27 2015-07-01 华为技术有限公司 Spatio-temporal data index building and searching methods, a spatio-temporal data index building and searching device and spatio-temporal data index building and searching equipment
CN103294790B (en) * 2013-05-22 2016-08-10 西北工业大学 A kind of space and time order towards GPS track data indexes and search method
CN103617162B (en) * 2013-10-14 2016-09-07 南京邮电大学 A kind of method building Hilbert R tree index in equity cloud platform

Also Published As

Publication number Publication date
CN107220285A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107220285B (en) Space-time index construction method for massive trajectory point data
CN105589951B (en) A kind of mass remote sensing image meta-data distribution formula storage method and parallel query method
CN113946575B (en) Space-time trajectory data processing method and device, electronic equipment and storage medium
CN106933833B (en) Method for quickly querying position information based on spatial index technology
CN108446349B (en) GIS abnormal data detection method
CN102982103A (en) On-line analytical processing (OLAP) massive multidimensional data dimension storage method
CN108205562B (en) Positioning data storage and retrieval method and device for geographic information system
CN106503196B (en) The building of extensible storage index structure in cloud environment and querying method
CN103116610A (en) Vector space big data storage method based on HBase
CN106599052B (en) Apache Kylin-based data query system and method
CN106471501B (en) Data query method, data object storage method and data system
CN102609530A (en) Space database indexing method of regional double-tree structure
CN108388603B (en) Spark framework-based distributed summary data structure construction method and query method
CN111078634B (en) Distributed space-time data indexing method based on R tree
CN109783441A (en) Mass data inquiry method based on Bloom Filter
CN108549696B (en) Time series data similarity query method based on memory calculation
CN112395288B (en) R-tree index merging and updating method, device and medium based on Hilbert curve
Sarwat Interactive and scalable exploration of big spatial data--a data management perspective
Hu et al. A hierarchical indexing strategy for optimizing Apache Spark with HDFS to efficiently query big geospatial raster data
Zhang et al. Improving NoSQL storage schema based on Z-curve for spatial vector data
CN104933143A (en) Method and device for acquiring recommended object
Tian et al. A survey of spatio-temporal big data indexing methods in distributed environment
CN104794237A (en) Web page information processing method and device
Gao et al. Optimal-location-selection query processing in spatial databases
CN113722415B (en) Point cloud data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant