WO2015096582A1 - Index establishment method, query method, apparatus, and device for spatiotemporal data - Google Patents

Index establishment method, query method, apparatus, and device for spatiotemporal data

Info

Publication number
WO2015096582A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
node
subspace
time slice
sub
Prior art date
Application number
PCT/CN2014/092256
Other languages
English (en)
French (fr)
Inventor
袁明轩
张世明
谭浩宇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015096582A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/901 - Indexing; Data structures therefor; Storage structures
    • G06F16/9027 - Trees

Definitions

  • The present invention relates to the field of computer technologies, and in particular, to an index establishment method, a query method, an apparatus, and a device for spatiotemporal data.
  • A user's spatiotemporal data may be, for example, the time and location data of the user's movement trajectory as recorded by a GPS service. As another example, the mobile broadband (MBB, Mobile Broadband) data recorded by base stations when users use the mobile network also contains a large amount of user spatiotemporal data.
  • The number of users is usually more than 2 million.
  • The number of key locations on the road network is usually more than 100,000.
  • The data sampling interval is usually several seconds to several minutes. The amount of spatiotemporal data generated therefore reaches the order of petabytes (PB) or even exabytes (EB). How to build an index over this data so that the required spatiotemporal data can be found quickly has become an urgent problem.
  • In the prior art, index establishment for spatiotemporal data mainly includes the following steps:
  • Step 1: Divide the space to be divided into multiple subspaces.
  • Step 2: Convert the two-dimensional space divided in step 1 into one-dimensional coded data.
  • The subspaces obtained in step 1 are coded in a certain order (for example, along a z-curve).
  • The coding strategy tries to give subspaces that are adjacent in position codes that are as close as possible.
  • However, there is no guarantee that adjacent subspaces receive adjacent codes.
  • Step 3: Using the one-dimensional coded data obtained in step 2, build a spatial balanced index tree according to a traditional balanced-tree indexing method, in which each leaf node records the storage location of the spatiotemporal data related to the subspace corresponding to that leaf node's code.
  • Step 4: Build a time balanced index tree, likewise according to a traditional balanced-tree indexing method, in which each leaf node records the storage location of the spatiotemporal data related to the time corresponding to that leaf node.
  • At this point, the index of the spatiotemporal trajectory data is established.
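  • The z-curve coding mentioned in step 2 can be illustrated with a minimal sketch. The function name `z_encode`, the 16-bit default, and the 4x4 example grid are assumptions made for illustration; the source only names the z-curve as one possible coding order.

```python
def z_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of grid coordinates (x, y) into one z-curve code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


# Codes for a hypothetical 4x4 grid of subspaces, one row per y value.
grid = [[z_encode(x, y) for x in range(4)] for y in range(4)]
```

  • In this sketch the cells (1, 0) and (2, 0) are neighbours in space but receive the codes 1 and 4, which shows why adjacent subspaces are not guaranteed adjacent codes.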
  • Searching for spatiotemporal data based on this index mainly includes the following steps:
  • Step 1: According to the spatial query condition input by the user, determine the codes of all subspaces that satisfy the spatial query condition, and look up the nodes corresponding to those codes in the spatial balanced index tree, thereby obtaining the storage locations of the spatiotemporal data that satisfies the spatial query condition.
  • The spatiotemporal data found in this step may contain redundant data.
  • Step 2: According to the time query condition input by the user, look up the corresponding nodes in the time balanced index tree, thereby obtaining the storage locations of the spatiotemporal data that satisfies the time query condition.
  • Step 3: Take the intersection of the spatiotemporal data obtained in step 1 and the spatiotemporal data obtained in step 2 as the query result.
  • With the prior-art method of indexing spatiotemporal data, a search must first compute the spatial codes, then look up the matching spatiotemporal data by time and by space separately, and finally combine the two sets of results. The search therefore goes through a secondary index, which reduces search efficiency.
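  • The prior-art query flow described above amounts to two independent lookups followed by an intersection. The sketch below uses plain dictionaries as stand-ins for the two balanced index trees; the index contents, key formats, and the name `prior_art_query` are hypothetical.

```python
# Hypothetical index contents: key -> storage locations of matching records.
spatial_index = {4: {"r1", "r2"}, 5: {"r3"}, 6: {"r4"}}
time_index = {"09:00": {"r2", "r4"}, "09:05": {"r3", "r5"}}


def prior_art_query(space_codes, time_keys):
    """Look up both indexes separately, then intersect the two result sets."""
    by_space = set().union(*(spatial_index.get(c, set()) for c in space_codes))
    by_time = set().union(*(time_index.get(t, set()) for t in time_keys))
    return by_space & by_time


result = prior_art_query([4, 5], ["09:00"])
```

  • Each lookup can return records that the other condition later discards, which is the redundancy the text refers to.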
  • The embodiments of the present invention provide an index establishment method, a query method, an apparatus, and a device for spatiotemporal data, to solve the problem of low efficiency when spatiotemporal data is searched based on a spatiotemporal data index established as in the prior art.
  • A first aspect provides a method for establishing an index of spatiotemporal data, including:
  • generating a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree includes multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
  • generating a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree includes multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents; and
  • mapping each time slice leaf node together with each subspace leaf node to a spatiotemporal file, wherein the spatiotemporal file is used to store the spatiotemporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that have a mapping relationship with the spatiotemporal file.
  • Mapping each time slice leaf node together with each subspace leaf node to a spatiotemporal file specifically includes: determining the identifier of each time slice leaf node and the identifier of each subspace leaf node; generating, by using a preset hash algorithm, a hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and determining the obtained hash result as the identifier of a spatiotemporal file, wherein that spatiotemporal file is the one mapped by the time slice leaf node and the subspace leaf node corresponding to the hash result.
  • The method further includes: storing the spatiotemporal data to be stored in the spatiotemporal file.
  • Storing the spatiotemporal data to be stored in the spatiotemporal file includes: determining the storage order of the spatiotemporal data to be stored based on a preset sorting manner of a preset category; according to the determined storage order, encoding and compressing with a first algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same data format; and encoding and compressing with a second algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same value, such that the stored spatiotemporal data conforms to a preset distributed query operation structure.
  • With the preset time range as the root node, the multi-level time index tree is generated by using the following method:
  • The preset time range is divided into a preset number of sub-time slices of the same length, and the following steps are performed cyclically until the length of the currently obtained sub-time slices equals the time slice length represented by the time slice leaf nodes: taking each currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice; and further dividing each currently obtained sub-time slice into a preset number of sub-time slices of the same length. Alternatively, the preset time range is divided into sub-time slices of different lengths according to the distribution, within the preset time range, of the generated spatiotemporal data.
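  • The equal-length division can be sketched as a short recursive builder. This is only an illustration under stated assumptions: the names `TimeNode`, `build_time_tree`, and `leaf_slices` are hypothetical, time is modelled as integers, and the range length is assumed to divide evenly by the fanout at every level.

```python
from dataclasses import dataclass, field


@dataclass
class TimeNode:
    start: int                      # inclusive start of the time slice
    end: int                        # exclusive end of the time slice
    children: list = field(default_factory=list)


def build_time_tree(start, end, fanout, leaf_len):
    """Split [start, end) into `fanout` equal slices until slices reach leaf_len."""
    node = TimeNode(start, end)
    if end - start > leaf_len:
        step = (end - start) // fanout
        node.children = [
            build_time_tree(start + i * step, start + (i + 1) * step, fanout, leaf_len)
            for i in range(fanout)
        ]
    return node


def leaf_slices(node):
    """Collect the (start, end) ranges of all time slice leaf nodes."""
    if not node.children:
        return [(node.start, node.end)]
    return [s for child in node.children for s in leaf_slices(child)]


# Hypothetical example: a range of 64 time units, 4 children per node,
# leaf slices of 4 units, which gives three levels (64 -> 16 -> 4).
root = build_time_tree(0, 64, fanout=4, leaf_len=4)
```

  • Nodes closer to the root cover longer slices (64, then 16, then 4 units), matching the property stated for the multi-level time index tree.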
  • With the preset spatial range as the root node, the multi-level spatial index tree is generated by using the following method:
  • The preset spatial range is divided into a preset number of subspaces of the same size, and the following steps are performed cyclically until the size of the currently obtained subspaces equals the subspace size represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space; and further dividing each currently obtained subspace into a preset number of subspaces of the same size. Alternatively, the preset spatial range is divided into subspaces of different sizes according to the distribution, within the preset spatial range, of the generated spatiotemporal data, and the following steps are performed cyclically until the size of the currently obtained subspaces matches the subspace size represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space.
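  • The source does not spell out how the data distribution drives the unequal division. One possible reading, shown here purely as an assumption, is a quadtree that keeps splitting a square cell while it holds more than `max_points` sample points and is still larger than `min_size`. All names and thresholds below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpaceNode:
    x: float                        # lower-left corner, x coordinate
    y: float                        # lower-left corner, y coordinate
    size: float                     # side length of the square subspace
    children: list = field(default_factory=list)


def build_space_tree(x, y, size, points, max_points, min_size):
    """Split a cell into four quadrants while it is dense and still divisible."""
    node = SpaceNode(x, y, size)
    if len(points) > max_points and size > min_size:
        half = size / 2
        for dx in (0, 1):
            for dy in (0, 1):
                cx, cy = x + dx * half, y + dy * half
                inside = [
                    (px, py) for px, py in points
                    if cx <= px < cx + half and cy <= py < cy + half
                ]
                node.children.append(
                    build_space_tree(cx, cy, half, inside, max_points, min_size)
                )
    return node


# Hypothetical sample locations: three close together, one far away.
samples = [(1, 1), (1.5, 1.5), (2, 2), (9, 9)]
space_root = build_space_tree(0, 0, 16, samples, max_points=1, min_size=1)
```

  • Dense regions end up with small leaf subspaces and sparse regions with large ones, so the tree depth varies across the preset spatial range.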
  • A second aspect provides a method for querying spatiotemporal data, including:
  • determining the query result in the spatiotemporal data stored in the spatiotemporal files mapped by each determined time slice leaf node together with each determined subspace leaf node.
  • Determining the query result in the spatiotemporal data stored in the spatiotemporal files mapped by each time slice leaf node together with each subspace leaf node specifically includes: determining the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; generating, by using a preset hash algorithm, a hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determining, according to the generated hash result, the storage location of the spatiotemporal file identified by the hash result; and determining the query result in the spatiotemporal data stored in the spatiotemporal file found at that storage location.
  • When the query condition includes other query conditions in addition to the time query condition and the spatial query condition, and the data stored in the spatiotemporal file conforms to the preset distributed query operation structure, determining the query result in the spatiotemporal data stored in the spatiotemporal file specifically includes: starting a corresponding number of parsing processes for the spatiotemporal file according to the amount of spatiotemporal data stored in it and the preset distributed query operation structure; parsing, by these parsing processes in parallel and according to the other query conditions, the spatiotemporal data stored in the spatiotemporal file to obtain parsing results that meet the other query conditions; and aggregating the obtained parsing results and determining them as the query result.
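  • The parallel parsing step can be sketched as follows. This is an illustration only: threads from Python's `concurrent.futures` stand in for the parsing processes, the worker count is derived from a hypothetical `records_per_worker` setting, and the record layout and the names `parse_chunk` and `query_file` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(records, predicate):
    """Keep the records of one chunk that satisfy the other query conditions."""
    return [r for r in records if predicate(r)]


def query_file(records, predicate, records_per_worker=2):
    """Start one worker per chunk of data, parse in parallel, then aggregate."""
    chunks = [
        records[i:i + records_per_worker]
        for i in range(0, len(records), records_per_worker)
    ]
    workers = max(1, len(chunks))   # worker count grows with the data amount
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(parse_chunk, chunks, [predicate] * len(chunks)))
    return [r for part in parts for r in part]


# Hypothetical records from one spatiotemporal file.
file_records = [
    {"user": 105, "app": 3},
    {"user": 105, "app": 7},
    {"user": 203, "app": 3},
]
matches = query_file(file_records, lambda r: r["app"] == 3)
```

  • In a real deployment the chunks would more likely be handled by separate processes or nodes of a distributed query framework; the aggregation step stays the same.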
  • A third aspect provides an index establishment apparatus for spatiotemporal data, including:
  • a first generation module, configured to generate a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree includes multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
  • a second generation module, configured to generate a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree includes multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents; and
  • a mapping module, configured to map each time slice leaf node of the multi-level time index tree generated by the first generation module, together with each subspace leaf node of the multi-level spatial index tree generated by the second generation module, to a spatiotemporal file, wherein the spatiotemporal file is used to store the spatiotemporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that have a mapping relationship with the spatiotemporal file.
  • The mapping module is specifically configured to: determine the identifier of each time slice leaf node and the identifier of each subspace leaf node; generate, by using a preset hash algorithm, a hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and determine the obtained hash result as the identifier of a spatiotemporal file, wherein that spatiotemporal file is the one mapped by the time slice leaf node and the subspace leaf node corresponding to the hash result.
  • The apparatus further includes a storage module, configured to store the spatiotemporal data to be stored in the spatiotemporal file.
  • The storage module is specifically configured to: determine the storage order of the spatiotemporal data to be stored based on a preset sorting manner of a preset category; according to the determined storage order, encode and compress with a first algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same data format; and encode and compress with a second algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same value, such that the stored spatiotemporal data conforms to the preset distributed query operation structure.
  • The first generation module is specifically configured to use the preset time range as the root node and generate the multi-level time index tree by the following method: dividing the preset time range into a preset number of sub-time slices of the same length, and performing the following steps cyclically until the length of the currently obtained sub-time slices equals the time slice length represented by the time slice leaf nodes: taking each currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub-time slice into a preset number of sub-time slices of the same length; or dividing the preset time range into sub-time slices of different lengths according to the distribution of the generated spatiotemporal data within the preset time range, and cyclically repeating the division until the length of the currently obtained sub-time slices matches the time slice length represented by the time slice leaf nodes.
  • The second generation module is specifically configured to use the preset spatial range as the root node and generate the multi-level spatial index tree by the following method: dividing the preset spatial range into a preset number of subspaces of the same size, and performing the following steps cyclically until the size of the currently obtained subspaces equals the subspace size represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into a preset number of subspaces of the same size; or dividing the preset spatial range into subspaces of different sizes according to the distribution of the generated spatiotemporal data within the preset spatial range, and cyclically repeating the division until the size of the currently obtained subspaces matches the subspace size represented by the subspace leaf nodes.
  • A fourth aspect provides a query apparatus for spatiotemporal data, including:
  • a node determining module, configured to determine the time slice node corresponding to the time query condition in the multi-level time index tree, and the subspace node corresponding to the spatial query condition in the multi-level spatial index tree;
  • a leaf node determining module, configured to determine all time slice leaf nodes of the time index subtree rooted at the time slice node determined by the node determining module, and all subspace leaf nodes of the spatial index subtree rooted at the subspace node determined by the node determining module; and
  • a query result determining module, configured to determine the query result in the spatiotemporal data stored in the spatiotemporal files mapped by each time slice leaf node together with each subspace leaf node.
  • The query result determining module is specifically configured to: determine the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; generate, by using a preset hash algorithm, a hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determine, according to the generated hash result, the storage location of the spatiotemporal file identified by the hash result; and determine the query result in the spatiotemporal data stored in the spatiotemporal file found at that storage location.
  • The query result determining module is specifically configured to, when the query condition includes other query conditions in addition to the time query condition and the spatial query condition and the data stored in the spatiotemporal file conforms to the preset distributed query operation structure: start a corresponding number of parsing processes for the spatiotemporal file according to the amount of spatiotemporal data stored in it and the preset distributed query operation structure; have these parsing processes parse, in parallel and according to the other query conditions, the spatiotemporal data stored in the spatiotemporal file to obtain parsing results that meet the other query conditions; and aggregate the obtained parsing results and determine them as the query result.
  • A fifth aspect provides an index establishment device for spatiotemporal data, including the foregoing index establishment apparatus for spatiotemporal data.
  • A sixth aspect provides a query device for spatiotemporal data, including the foregoing query apparatus for spatiotemporal data.
  • In the index establishment method provided by the embodiments of the present invention, a multi-level time index tree is generated according to a preset time rule with the preset time range as the root node; a multi-level spatial index tree is generated according to a preset spatial rule with the preset spatial range as the root node; and each time slice leaf node together with each subspace leaf node is mapped to a spatiotemporal file, wherein the spatiotemporal file is used to store the spatiotemporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that have a mapping relationship with the spatiotemporal file.
  • The query method corresponding to this index establishment method includes: determining the time slice node corresponding to the time query condition in the multi-level time index tree, and the subspace node corresponding to the spatial query condition in the multi-level spatial index tree; determining all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node; and determining the query result in the spatiotemporal data stored in the spatiotemporal files mapped by each determined time slice leaf node together with each determined subspace leaf node.
  • With the spatiotemporal data index established in this way, a search can evaluate the time query condition and the spatial query condition in parallel and, from the subspace leaf nodes and time slice leaf nodes found, directly index the spatiotemporal data to be queried. Compared with the prior art, which queries spatiotemporal data through a secondary index, this improves query efficiency.
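  • The query flow summarized above (find the matching node in each tree, collect the leaf nodes beneath it, then pair them up) can be sketched with small dictionary-based trees. The tree contents, node names, and the helpers `leaves` and `files_for_query` are hypothetical.

```python
def leaves(tree, node):
    """Return all leaf ids of the subtree rooted at `node`.

    `tree` maps a node id to the list of its child ids; leaves have no entry.
    """
    kids = tree.get(node, [])
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves(tree, kid))
    return found


# Hypothetical two-level trees.
time_tree = {"day": ["am", "pm"], "am": ["h09", "h10"], "pm": ["h14", "h15"]}
space_tree = {"city": ["north", "south"], "north": ["n1", "n2"], "south": ["s1", "s2"]}


def files_for_query(time_node, space_node):
    """Pair every time slice leaf with every subspace leaf under the query nodes."""
    return [
        (t, s)
        for t in leaves(time_tree, time_node)
        for s in leaves(space_tree, space_node)
    ]


pairs = files_for_query("am", "north")
```

  • Each (time slice leaf, subspace leaf) pair identifies one spatiotemporal file, so the two lookups are independent of each other and no secondary index is traversed.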
  • FIG. 1 is a flowchart of a method for establishing an index of spatiotemporal data according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for querying spatiotemporal data according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for establishing an index of spatiotemporal data according to Embodiment 1 of the present invention
  • FIGS. 4a-4b are schematic diagrams showing how spatiotemporal data is stored in a spatiotemporal file according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for establishing index of spatiotemporal data according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic diagram of partitioning a preset spatial range and a subsequent obtained subspace according to an embodiment of the present invention
  • FIG. 7 is a flowchart of a method for querying spatiotemporal data according to Embodiment 3 of the present invention.
  • FIG. 8 is a flowchart of a method for querying spatiotemporal data according to Embodiment 4 of the present invention.
  • FIG. 9 is a schematic structural diagram of an index establishment apparatus for spatiotemporal data according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a space-time data query apparatus according to an embodiment of the present invention.
  • An embodiment of the present invention provides an index establishing method, a query method, and a device for time and space data.
  • The preferred embodiments of the present invention are described below with reference to the accompanying drawings. The preferred embodiments described here are intended to illustrate and explain the invention. Where no conflict arises, the embodiments in the present application and the features in the embodiments can be combined with each other.
  • An embodiment of the present invention provides a method for establishing an index of spatiotemporal data, as shown in FIG. 1 , including the following steps:
  • the multi-level time index tree includes multiple time slice nodes, and the time slice represented by the time slice node closer to the root node is longer.
  • the multi-level spatial index tree includes multiple sub-space nodes, and the sub-space represented by the sub-space node closer to the root node is larger.
  • In step S103, either each time slice leaf node is taken in turn and mapped, together with each subspace leaf node, to a spatiotemporal file, or each subspace leaf node is taken in turn and mapped, together with each time slice leaf node, to a spatiotemporal file.
  • the embodiment of the present invention further provides a method for querying spatio-temporal data, as shown in FIG. 2, comprising the following steps:
  • S202: Determine all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node.
  • Embodiment 1 of the present invention provides a method for establishing an index of spatiotemporal data. As shown in FIG. 3, the method includes the following steps:
  • the multi-level time index tree includes multiple time slice nodes, and the time slice represented by the time slice node closer to the root node is longer.
  • The generated multi-level time index tree consists of a root node, time slice leaf nodes, and time slice nodes located between the root node and the time slice leaf nodes. The root node represents the preset time range, and the time slice leaf nodes represent the smallest time ranges, which are not divided further. The time range represented by a time slice node between the root node and the time slice leaf nodes is smaller than the preset time range and larger than the time range represented by a time slice leaf node, and the closer a time slice node is to the root node, the longer the time slice it represents.
  • In this step, when the multi-level time index tree is generated, the division may be performed according to a preset time rule; for example, the time range represented by each time slice node is divided equally, and the resulting parts become the child nodes of that time slice node.
  • the multi-level spatial index tree includes multiple sub-space nodes, and the sub-spaces represented by the sub-space nodes closer to the root node are larger.
  • The generated multi-level spatial index tree consists of a root node, subspace leaf nodes, and subspace nodes located between the root node and the subspace leaf nodes. The root node represents the preset spatial range, and the subspace leaf nodes represent the smallest spatial ranges, which are not divided further. The spatial range represented by a subspace node between the root node and the subspace leaf nodes is smaller than the preset spatial range and larger than the spatial range represented by a subspace leaf node, and the closer a subspace node is to the root node, the larger the spatial range it represents.
  • In this step, when the multi-level spatial index tree is generated, the division may be performed according to a preset spatial rule; for example, the spatial range represented by each subspace node is divided equally, and the resulting parts become the child nodes of that subspace node.
  • Steps S301 and S302 need not be executed in a strict order.
  • A node identifier may be set for each leaf node of the multi-level time index tree, and a node identifier may be set for each leaf node of the multi-level spatial index tree.
  • A hash function may be designed that takes the identifier of a time slice leaf node and the identifier of a subspace leaf node as input and produces a hash result as output.
  • Specifically, for each time slice leaf node, a preset hash algorithm may be used to generate a hash result from the identifier of that time slice leaf node and the identifier of each subspace leaf node in turn; or, for each subspace leaf node, a preset hash algorithm may be used to generate a hash result from the identifier of that subspace leaf node and the identifier of each time slice leaf node in turn.
  • The output of the hash function may be used as the identifier of a spatiotemporal file, which stores the spatiotemporal data corresponding to the time slice and the subspace given as input to the hash function.
  • For example, if the hash function takes the time slice leaf node identifier T_id and the subspace leaf node identifier S_id as input, the spatiotemporal file identified by the resulting id may be used to store the spatiotemporal data corresponding to the time slice represented by the time slice leaf node identified by T_id and the subspace represented by the subspace leaf node identified by S_id.
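  • A minimal sketch of such a mapping follows. The source only says that a preset hash algorithm is used; SHA-1 from Python's `hashlib`, the `|` separator, and the leaf identifiers below are assumptions chosen for illustration.

```python
import hashlib


def file_id(t_id: str, s_id: str) -> str:
    """Hash a (time slice leaf id, subspace leaf id) pair into a file identifier."""
    # The separator keeps ("ab", "c") and ("a", "bc") from colliding trivially.
    return hashlib.sha1(f"{t_id}|{s_id}".encode("utf-8")).hexdigest()


# Hypothetical leaf identifiers; one file identifier per leaf pair.
time_leaf_ids = ["h09", "h10"]
space_leaf_ids = ["n1", "n2"]
file_ids = {
    (t, s): file_id(t, s) for t in time_leaf_ids for s in space_leaf_ids
}
```

  • Because the identifier is computed rather than looked up, a query that knows T_id and S_id can derive the file identifier directly without consulting a further index.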
  • This embodiment may further include step S306: storing the spatiotemporal data to be stored in the spatiotemporal file.
  • Step S306 and steps S301-S305 above need not be executed in a strict order.
  • the spatiotemporal file can be stored in the distributed file system to implement distributed storage of spatiotemporal data.
  • In the prior art, the spatiotemporal data related to a subspace and a time slice is stored discretely; that is, it may be scattered within one file or scattered across different files.
  • With this storage method, even after the subspace and time slice matching the time query condition and the spatial query condition have been found in the balanced index trees of the secondary index, the corresponding spatiotemporal data is scattered: data related to adjacent subspaces or adjacent time slices is not stored together. This not only slows down the search for spatiotemporal data but also makes the stored data hard to compress, which wastes storage space.
  • Step S306 may specifically include the following steps:
  • Step 1: Determine the storage order of the spatiotemporal data to be stored based on a preset sorting manner of a preset category.
  • Step 2: According to the determined storage order, encode and compress with a first algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same data format.
  • Step 3: Encode and compress with a second algorithm the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same value, such that the stored spatiotemporal data conforms to the preset distributed query operation structure.
  • The first algorithm in step 2 may be a delta algorithm.
  • The second algorithm in step 3 may be a run-length algorithm.
  • Steps 2 and 3 need not be executed in a strict order.
  • Spatiotemporal data can generally be stored in the form <user id, subspace id, time slice id, attribute 1, ..., attribute n>.
  • The storage form contains multiple categories; for example, the category user id identifies the user who was active in the subspace and time slice identified by the subspace id and the time slice id.
  • The storage order of the spatiotemporal data to be stored is determined based on a preset sorting manner of the preset category.
  • The user id can be used as the preset category, and ascending or descending order of user id as the preset sorting manner; that is, the spatiotemporal data to be stored is stored in ascending or descending order of user id.
  • After the storage order is determined, the to-be-stored spatiotemporal data in the spatiotemporal file that belongs to the same category and has the same data format is encoded and compressed with the delta algorithm, and the to-be-stored spatiotemporal data that belongs to the same category and has the same value is encoded and compressed with the run-length algorithm.
  • Take the subspace data in the subspace id column as an example.
  • The subspace data stored in the same spatiotemporal file belongs to a single subspace of the multi-level spatial index tree.
  • The subspace data can be stored as the latitude and longitude information of the subspace. Because the subspace data of the subspace represented by one subspace leaf node is very close in value, it can also be compressed with delta coding.
  • Floating-point data may first be converted into long integer data and then compressed with delta coding.
  • The conversion may be done as follows: assuming the highest precision of the column is m digits after the decimal point, the decimal point of every value in the column is shifted right by m digits, that is, every value is multiplied by 10^m, and the type of the column data is then converted to long.
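  • The conversion and the delta coding can be sketched as below. The function names and the sample longitudes are hypothetical, and the sketch rounds after scaling (rather than truncating) to avoid binary floating-point artifacts, which is an implementation choice not stated in the source.

```python
def to_long(values, m):
    """Shift the decimal point right by m digits and convert to integers."""
    return [round(v * 10 ** m) for v in values]


def delta_encode(ints):
    """Keep the first value, then store each value as a difference from its predecessor."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Rebuild the original integers by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical longitudes within one leaf subspace, precision m = 4.
longitudes = [114.0571, 114.0573, 114.0578]
as_long = to_long(longitudes, 4)
encoded = delta_encode(as_long)
```

  • Because values inside one leaf subspace are close together, the differences are small integers that need far fewer bits than the original values.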
  • For data in the same category that has the same value, the run-length algorithm may be used for encoding and compressed storage. For example, if the attribute 1 category stores application number information and five consecutive records stored in adjacent positions of the column have the number 3, run-length coding compresses them and stores them as 5:3.
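  • The 5:3 example corresponds to storing each run as a (count, value) pair. A minimal sketch, with hypothetical function names and a hypothetical column:

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (count, value) pairs back into the original column."""
    return [value for count, value in pairs for _ in range(count)]


# Hypothetical attribute 1 column: five records with number 3, then two with 7.
attribute_1 = [3, 3, 3, 3, 3, 7, 7]
runs = rle_encode(attribute_1)
```

  • Sorting the records first, as described for the storage order, tends to lengthen such runs and so improves the compression ratio.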
  • the entire spatio-temporal file may be further compressed by using a general compression technology such as gzip to achieve a better compression effect and save storage space.
  • FIG. 4a-4b are schematic diagrams of spatiotemporal data stored in a spatiotemporal file.
  • FIG. 4a shows spatiotemporal data organized in a spatiotemporal file in ascending order of user id.
  • the categories included in the spatiotemporal file are: user id, time id, space id, attribute 1 and attribute 2.
  • FIG. 4b is a schematic diagram of a storage form in a storage block after the spatiotemporal data is compressed and stored according to the spatiotemporal data organization manner provided by the embodiment of the present invention.
  • the storage block 401 stores a pointer p1 indicating the storage location of the data of the user whose user id is 105, and a pointer p2 indicating the storage location of the data of the user whose user id is 203.
  • the storage location indicated by the pointer p1 stores the time information pointer p3, the spatial information pointer p4, the attribute 1 information pointer p5, and the attribute 2 information pointer p6 of the user whose user id is 105; the storage location indicated by the pointer p2 stores the time information pointer p7, the spatial information pointer p8, the attribute 1 information pointer p9, and the attribute 2 information pointer p10 of the user whose user id is 203. Each of these pointers indicates the storage location of the corresponding data.
  • a method for establishing an index of spatiotemporal data includes the following steps:
  • S501. Using the preset time range as the root node, generate a multi-level time index tree by the following method:
  • according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, the preset time range is divided into sub-time slices of different lengths, and the following steps are performed cyclically until the length of the currently obtained sub-time slice meets the time slice length represented by a time slice leaf node: the currently obtained sub-time slice is used as a child node of the time slice node corresponding to its parent time slice; and, according to the distribution of the generated spatio-temporal data within the currently obtained sub-time slice, that sub-time slice is further divided into sub-time slices of different lengths, where the more densely the spatio-temporal data is distributed in a time period, the more sub-time slices that period is divided into.
  • the generated multi-level time index tree may be stored by using a data structure such as a binary tree or an R-tree.
  • equal division can be adopted. For example, if the preset time range is 0-10000 and 0-10000 is the root node of the multi-level time index tree, 0-10000 can be divided equally into two parts, 0-5000 and 5000-10000, which become the two child nodes of 0-10000; 0-5000 and 5000-10000 are then each divided equally in the same way, until leaf nodes that cannot be subdivided are reached.
  • the spatio-temporal data in 0-7 and 22-24 is sparser. Therefore, when establishing the multi-level time index tree, 0-24 is divided into 0-7, 7-10, 10-13, 13-16, 16-19, 19-22 and 22-24, and each resulting time slice is used as a child node of the root node 0-24; the generated sub-time slices are then divided further. Taking the child node 19-22 as an example: according to the statistics, the time slice 21-22 holds less spatio-temporal data, so the child node 19-22 is divided into 19-19.5, 19.5-20, 20-20.5, 20.5-21 and 21-22.
  • if the time slice length represented by a leaf node is not less than 0.5, then when the time slice length of a child node obtained at some level of division is 0.5, that child node is not divided further and is used as a leaf node.
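  • The density-driven division can be sketched as below. This is a simplified stand-in rather than the patent's procedure: it halves a slice whenever the slice holds more than `max_records` timestamps and the halves would still be at least `min_len` long, whereas the text allows arbitrary unequal cuts. `TimeNode`, `build`, `leaves`, the threshold values and the sample timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TimeNode:
    start: float
    end: float
    children: list = field(default_factory=list)


def build(start, end, stamps, min_len, max_records):
    """Recursively split [start, end) where the data is dense."""
    node = TimeNode(start, end)
    inside = [t for t in stamps if start <= t < end]
    half = (end - start) / 2
    # Split only if the halves respect the minimum leaf length
    # and the slice is dense enough to be worth refining.
    if half >= min_len and len(inside) > max_records:
        mid = start + half
        node.children = [
            build(start, mid, inside, min_len, max_records),
            build(mid, end, inside, min_len, max_records),
        ]
    return node


def leaves(node):
    """Return the (start, end) pairs of all leaf time slices, left to right."""
    if not node.children:
        return [(node.start, node.end)]
    return [leaf for child in node.children for leaf in leaves(child)]


# Hypothetical record timestamps (hour of day), dense between 9 and 11.
stamps = [1, 9, 9.2, 9.4, 9.6, 10.5, 11, 20]
root = build(0, 24, stamps, min_len=0.5, max_records=2)
result = leaves(root)
```

  • With this input the busy morning hours end up in short leaf slices while the quiet ranges 0-6 and 12-24 stay as single long leaves.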
  • the index of each level of the multi-level time index tree may be divided according to the frequency of the input query condition when the user performs the spatio-temporal data query.
  • S502. Using the preset spatial range as the root node, generate a multi-level spatial index tree by the following method:
  • the preset spatial range is divided into a preset number of subspaces of the same size, and the following steps are performed cyclically until the size of the currently obtained subspace equals the subspace size represented by a subspace leaf node: the currently obtained subspace is used as a child node of the subspace node corresponding to its parent space; and the currently obtained subspace is further divided into a preset number of subspaces of the same size; or
  • according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, the preset spatial range is divided into subspaces of different sizes, and the following steps are performed cyclically until the size of the currently obtained subspace meets the subspace size represented by a subspace leaf node: the currently obtained subspace is used as a child node of the subspace node corresponding to its parent space; and, according to the distribution of the generated spatio-temporal data within the currently obtained subspace, that subspace is further divided into subspaces of different sizes, where the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.
  • the generated multi-level spatial index tree may be stored by using a data structure such as a binary tree, a quadtree, or an R-tree.
  • FIG. 6a is a schematic diagram of dividing the preset spatial range and the subsequently obtained subspaces by equal division.
  • the preset spatial range 601 is taken as the root node of a quadtree and is equally divided into four subspaces 602 of equal size, which serve as the child nodes of the root node; each subspace 602 is then equally divided into four subspaces 603 of equal size, which serve as the child nodes of the corresponding child node, and so on, until leaf nodes that cannot be subdivided are reached.
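  • The equal quadtree division of FIG. 6a can be sketched as follows. The nested-dict node layout, the function names and the 8 x 8 sample extent are assumptions chosen to keep the example short.

```python
def build_quadtree(x0, y0, x1, y1, levels):
    """Split a rectangle into four equal quadrants, `levels` times."""
    node = {"bounds": (x0, y0, x1, y1), "children": []}
    if levels > 0:
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quadrants = [
            (x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1),
        ]
        for bounds in quadrants:
            node["children"].append(build_quadtree(*bounds, levels - 1))
    return node


def count_leaves(node):
    """Count the subspace leaf nodes below (and including) this node."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Hypothetical 8 x 8 preset spatial range, divided twice.
root = build_quadtree(0, 0, 8, 8, levels=2)
```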
  • FIG. 6b is a schematic diagram of dividing the preset spatial range and the subsequently obtained subspaces by unequal division.
  • the division follows the distribution, within the preset spatial range 604, of the spatio-temporal data generated in that range: because the data in the left half of the preset spatial range 604 is denser than in the right half, the left half is divided into more subspaces than the right half. The range is thus divided into three subspaces of different sizes, 605, 606 and 607, which serve as child nodes of the preset spatial range 604. The three child nodes are then divided further on the same principle, until leaf nodes that cannot be subdivided are reached.
  • if the subspace size represented by a leaf node is not less than 3 square meters, then when the subspace represented by a child node obtained at some level of division is 3 square meters, that child node is not divided further and is used as a leaf node.
  • the index of each level of the multi-level spatial index tree may be divided according to the frequency of the input query condition when the user performs the spatio-temporal data query.
  • each time slice leaf node and each subspace leaf node are mapped to a spatiotemporal file, where the spatiotemporal file stores the spatiotemporal data corresponding to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that have a mapping relationship with that file.
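  • One way to picture the mapping is to hash each (time slice leaf, subspace leaf) pair into a file identifier. The patent only says a "preset hash algorithm" is used; MD5, the `|` separator and the leaf identifiers below are assumptions made for this sketch.

```python
import hashlib


def file_id(time_leaf_id, space_leaf_id):
    """Derive a spatio-temporal file identifier from a pair of leaf ids."""
    key = f"{time_leaf_id}|{space_leaf_id}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


# Hypothetical leaf identifiers from the two index trees.
time_leaves = ["t0", "t1"]
space_leaves = ["s0", "s1", "s2"]

# Every pair of leaves gets its own spatio-temporal file.
mapping = {(t, s): file_id(t, s) for t in time_leaves for s in space_leaves}
```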
  • a method for querying spatio-temporal data is provided based on the embodiment of the present invention.
  • the method for querying spatio-temporal data may be based on the method for establishing an index of spatio-temporal data provided in the foregoing embodiment, as shown in FIG. The following steps:
  • the time query condition input by the user may correspond to any node in the multi-level time index tree; for example, it may correspond to the entire preset time range, or to one or more time slice leaf nodes.
  • the spatial query condition input by the user may correspond to any node in the multi-level spatial index tree; for example, it may correspond to the entire preset spatial range, or to one or more subspace leaf nodes.
  • the search can be performed in parallel in the multi-level time index tree and the multi-level spatial index tree according to the time query condition and the spatial query condition, thereby saving the search time.
  • S702. Determine, respectively, all time slice leaf nodes of the time index subtree whose root node is the determined time slice node, and all subspace leaf nodes of the spatial index subtree whose root node is the determined subspace node.
  • when the time query condition corresponds to a time slice leaf node in the multi-level time index tree, that leaf node is the time slice leaf node to be determined in this step; similarly, when the spatial query condition corresponds to a subspace leaf node of the multi-level spatial index tree, that leaf node is the subspace leaf node to be determined in this step;
  • when the time query condition corresponds to a time slice non-leaf node in the multi-level time index tree, all time slice leaf nodes of the time index subtree rooted at that non-leaf node are determined; similarly, when the spatial query condition corresponds to a subspace non-leaf node in the multi-level spatial index tree, all subspace leaf nodes of the spatial index subtree rooted at that non-leaf node are determined.
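  • Collecting the leaf nodes of the subtree rooted at the matched node can be sketched as a plain recursive walk. The nested-dict tree, the node ids and the helper names are illustrative assumptions; the same walk applies to either index tree.

```python
def find(node, target_id):
    """Locate the node whose id matches the query condition."""
    if node["id"] == target_id:
        return node
    for child in node["children"]:
        hit = find(child, target_id)
        if hit is not None:
            return hit
    return None


def leaf_ids(node):
    """Return the ids of all leaves in the subtree rooted at `node`."""
    if not node["children"]:
        return [node["id"]]      # a leaf node is its own result
    result = []
    for child in node["children"]:
        result.extend(leaf_ids(child))
    return result


# Hypothetical time index tree over the range 0-24.
tree = {"id": "0-24", "children": [
    {"id": "0-12", "children": [
        {"id": "0-6", "children": []},
        {"id": "6-12", "children": []},
    ]},
    {"id": "12-24", "children": []},
]}
```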
  • for each time slice leaf node, a preset hash algorithm may be used to generate one hash result from the identifier of that time slice leaf node and the identifier of each subspace leaf node; alternatively, for each subspace leaf node, the preset hash algorithm may be used to generate one hash result from the identifier of that subspace leaf node and the identifier of each time slice leaf node.
  • the preset hash algorithm corresponds to the preset hash algorithm used when establishing the spatiotemporal data index.
  • S705. Determine, according to the generated hash result, a storage location of the spatiotemporal file identified by the hash result.
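  • At query time the same hash is recomputed for every combination of matched leaves and used to look up where the file lives. In the sketch below the catalogue dictionary, the path scheme and the use of MD5 are all assumptions; the source only requires that the query side use the hash algorithm that was used when the index was built.

```python
import hashlib
from itertools import product


def file_id(time_leaf_id, space_leaf_id):
    """Must match the hash used when the index was established."""
    key = f"{time_leaf_id}|{space_leaf_id}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


# Hypothetical catalogue from file id to storage location, filled at build time.
catalogue = {
    file_id(t, s): f"/data/{t}_{s}.st"
    for t in ["t0", "t1", "t2"]
    for s in ["s0", "s1"]
}


def locate(time_leaves, space_leaves):
    """Return the storage locations for every (time leaf, space leaf) pair."""
    return [catalogue[file_id(t, s)] for t, s in product(time_leaves, space_leaves)]


paths = locate(["t0", "t1"], ["s1"])
```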
  • the search in the multi-level spatial index tree directly locates the subspaces that meet the spatial query condition, and the search in the multi-level time index tree directly locates the time slices that meet the time query condition; the spatio-temporal files related to those subspaces and time slices are then found from them. Compared with the prior art, there is no need to convert the two-dimensional space to be divided into one-dimensional coded data, nor to convert the one-dimensional coded data found by the search back into two-dimensional space, and no redundant spatio-temporal data is produced. This saves the time spent analyzing spatio-temporal data and increases search speed.
  • a method for querying the space-time data based on the method for establishing an index of spatio-temporal data according to the embodiment of the present invention is provided. As shown in FIG. 8, the method includes the following steps:
  • S802. Determine, respectively, all time slice leaf nodes of the time index subtree whose root node is the determined time slice node, and all subspace leaf nodes of the spatial index subtree whose root node is the determined subspace node.
  • when the query condition further includes query conditions other than the time query condition and the spatial query condition, and the data stored in the spatiotemporal file conforms to the preset distributed query operation structure, a corresponding number of parsing processes is started for the spatiotemporal file according to the amount of spatiotemporal data stored in the file and the preset distributed query operation structure.
  • the spatiotemporal data can be stored in the form <user id, subspace id, time slice id, attribute 1, ..., attribute n>.
  • a query over spatiotemporal data stored in this form can be a spatio-temporal constraint query for a particular user, or an aggregate query over multiple attributes under spatio-temporal constraints.
  • when the query condition contains only the time query condition and the spatial query condition, all the spatiotemporal data stored in the determined spatio-temporal file may be determined as the query result; when the query condition further includes query conditions other than the time query condition and the spatial query condition, the determined spatio-temporal file needs to be parsed further to obtain the data that meets those other query conditions.
  • when the spatiotemporal data is stored, the spatiotemporal file is stored in a distributed file system, which realizes distributed storage of the spatiotemporal data, so that the data stored in the spatiotemporal file conforms to the preset distributed query operation structure. Therefore, when parsing the spatio-temporal file, a corresponding number of parsing processes can be started for it according to the amount of spatio-temporal data stored in the file and the preset distributed query operation structure.
  • the corresponding number of parsing processes parse the spatiotemporal data stored in the spatiotemporal file in parallel, and obtain an analysis result that meets the other query conditions.
  • the spatiotemporal data can be split into multiple parts according to the data volume of the spatiotemporal data stored in the spatiotemporal file, and a parsing process is started for each part, and the spatiotemporal data is parsed in parallel.
  • This distributed storage method enables the use of corresponding distributed queries to improve the query speed when querying spatio-temporal data.
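  • The split-then-parse-in-parallel idea can be illustrated on a single machine as below. This is only a stand-in: threads from Python's standard library play the role of the parsing processes, and the record layout, part size and predicate are assumptions rather than the distributed query operation structure the text refers to.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, part_size):
    """Cut the stored records into parts of at most part_size items."""
    return [records[i:i + part_size] for i in range(0, len(records), part_size)]


def parse_part(part, predicate):
    """Keep only the records that satisfy the additional query condition."""
    return [r for r in part if predicate(r)]


def parallel_query(records, part_size, predicate):
    parts = split(records, part_size)
    # One worker per part; the number of workers follows the data volume.
    with ThreadPoolExecutor(max_workers=max(len(parts), 1)) as pool:
        results = list(pool.map(lambda p: parse_part(p, predicate), parts))
    # Aggregate the per-part results into the final answer.
    return [r for chunk in results for r in chunk]


# Hypothetical records of one spatio-temporal file.
records = [{"user": u, "attr1": u % 3} for u in range(10)]
hits = parallel_query(records, part_size=4, predicate=lambda r: r["attr1"] == 0)
```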
  • for example, the spatiotemporal file is stored in the Hadoop Distributed File System (HDFS).
  • MR MapReduce
  • S808 The analysis results obtained in S807 are summarized and determined as a query result.
  • the parsing results parsed by multiple parsing processes in the distributed file system are summarized, and the aggregated parsing result is determined as the final query result and fed back to the user.
  • an embodiment of the present invention further provides an apparatus and a device. Because the principle by which the apparatus and the device solve the problem is similar to that of the foregoing method for establishing an index of spatio-temporal data or the foregoing method for querying spatio-temporal data, their implementation can refer to the implementation of the foregoing methods, and repeated description is omitted.
  • An index establishing apparatus for spatiotemporal data provided by an embodiment of the present invention, as shown in FIG. 9, includes the following modules:
  • a first generation module 901, configured to generate a multi-level time index tree using a preset time range as the root node, where the multi-level time index tree includes multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
  • a second generation module 902, configured to generate a multi-level spatial index tree using a preset spatial range as the root node, where the multi-level spatial index tree includes multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents;
  • a mapping module 903, configured to map each time slice leaf node of the multi-level time index tree generated by the first generation module 901 and each subspace leaf node of the multi-level spatial index tree generated by the second generation module 902 to a spatiotemporal file, where the spatiotemporal file stores the spatiotemporal data corresponding to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that have a mapping relationship with that file.
  • the mapping module 903 is specifically configured to determine the identifier of each time slice leaf node and the identifier of each subspace leaf node; to use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and to determine the obtained hash result as the identifier of a spatiotemporal file, where that spatiotemporal file is the one mapped to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that correspond to the hash result.
  • the device further includes: a storage module 904;
  • the storage module 904 is configured to store spatiotemporal data to be stored in the spatiotemporal file.
  • the storage module 904 is specifically configured to determine the storage order of the spatio-temporal data to be stored according to the preset sorting manner of a preset category; to encode and compress with a first algorithm, in the determined storage order, the spatio-temporal data to be stored in the spatiotemporal file that belongs to the same category and has the same data format; and to encode and compress with a second algorithm the spatio-temporal data to be stored in the spatiotemporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to the preset distributed query operation structure.
  • the first generation module 901 is specifically configured to use the preset time range as the root node and generate the multi-level time index tree by either of the following methods: dividing the preset time range into a preset number of sub-time slices of the same length, and cyclically performing the following steps until the length of the currently obtained sub-time slice equals the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into a preset number of sub-time slices of the same length; or
  • dividing the preset time range into sub-time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and cyclically performing the following steps until the length of the currently obtained sub-time slice meets the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into sub-time slices of different lengths according to the distribution of the generated spatio-temporal data within it, where the more densely the spatio-temporal data is distributed in a time period, the more sub-time slices that period is divided into.
  • the second generation module 902 is specifically configured to use the preset spatial range as the root node and generate the multi-level spatial index tree by either of the following methods: dividing the preset spatial range into a preset number of subspaces of the same size, and cyclically performing the following steps until the size of the currently obtained subspace equals the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into a preset number of subspaces of the same size; or
  • dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatiotemporal data generated in that range, and cyclically performing the following steps until the size of the currently obtained subspace meets the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into subspaces of different sizes according to the distribution of the generated spatiotemporal data within it, where the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.
  • a spatio-temporal data query apparatus provided by an embodiment of the present invention, as shown in FIG. 10, includes the following modules:
  • the node determining module 1001 is configured to determine the time slice node corresponding to the time query condition in the multi-level time index tree, and the subspace node corresponding to the spatial query condition in the multi-level spatial index tree;
  • the leaf node determining module 1002 is configured to determine all time slice leaf nodes of the time index subtree whose root node is the time slice node determined by the node determining module 1001, and all subspace leaf nodes of the spatial index subtree whose root node is the subspace node determined by the node determining module 1001;
  • the query result determining module 1003 is configured to determine a query result in the spatio-temporal data stored in the spatio-temporal file mapped by each time slice leaf node and each sub-space leaf node.
  • the query result determining module 1003 is specifically configured to determine the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; to use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; to determine, from the generated hash result, the storage location of the spatiotemporal file identified by that hash result; and to determine the query result in the spatiotemporal data stored in the spatiotemporal file found at that storage location.
  • the query result determining module 1003 is specifically configured to: when the query condition further includes query conditions other than the time query condition and the spatial query condition, and the data stored in the spatiotemporal file conforms to the preset distributed query operation structure, start a corresponding number of parsing processes for the spatiotemporal file according to the amount of spatiotemporal data stored in the file and the preset distributed query operation structure; have those parsing processes parse the spatiotemporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that meet them; and aggregate the obtained parsing results and determine them as the query result.
  • An index establishing device for spatiotemporal data provided by the embodiment of the present invention includes the foregoing index establishing apparatus for spatiotemporal data (the module structure shown in FIG. 9).
  • a spatio-temporal data query device provided by the embodiment of the present invention includes the foregoing spatio-temporal data query apparatus (the module structure shown in FIG. 10).
  • the functions of the above units may correspond to the corresponding processing steps in the processes shown in FIG. 1 to FIG. 3, FIG. 5, and FIG. 7 to FIG. 8, and details are not described herein again.
  • a preset time range is used as a root node to generate a multi-level time index tree, and a preset spatial range is used as a root node to generate a multi-level spatial index tree; each time slice leaf node and each subspace leaf node are mapped to a spatiotemporal file, where the spatiotemporal file stores the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node.
  • the spatio-temporal data query method provided by the embodiment of the present invention, which is based on the foregoing index establishing method, includes: determining the time slice node corresponding to the time query condition in the multi-level time index tree, and the subspace node corresponding to the spatial query condition in the multi-level spatial index tree; determining all time slice leaf nodes of the time index subtree whose root node is the determined time slice node, and all subspace leaf nodes of the spatial index subtree whose root node is the determined subspace node; and determining the query result in the spatiotemporal data stored in the spatiotemporal files mapped by each determined time slice leaf node and each determined subspace leaf node.
  • with a spatio-temporal data index established in this way, the time query condition and the spatial query condition can be searched in parallel, and the spatio-temporal data to be queried is indexed directly from the subspace leaf nodes and time slice leaf nodes that are found; compared with querying spatio-temporal data through two indexes as in the prior art, this improves query efficiency.
  • the embodiments of the present invention may be implemented by hardware, or may be implemented by means of software plus a necessary general hardware platform.
  • the technical solution of the embodiment of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present invention.
  • the modules of the apparatus in an embodiment may be distributed in the apparatus of that embodiment as described, or may be changed accordingly and located in one or more apparatuses different from that embodiment.
  • the modules of the above embodiments may be combined into one module, or may be further split into multiple sub-modules.

Abstract

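Embodiments of the present invention provide an index establishing method, a query method, an apparatus and a device for spatio-temporal data. A preset time range is used as a root node to generate a multi-level time index tree, and a preset spatial range is used as a root node to generate a multi-level spatial index tree; each time slice leaf node and each subspace leaf node are mapped to a spatio-temporal file, where the spatio-temporal file stores the spatio-temporal data corresponding to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that have a mapping relationship with that file. With a spatio-temporal data index established by the index establishing method provided in the embodiments of the present invention, query efficiency is improved when searching spatio-temporal data. The present invention relates to the field of computer technologies.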

Description

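An index establishing method, query method, apparatus and device for spatio-temporal data
Technical Field
The present invention relates to the field of computer technologies, and in particular to an index establishing method, a query method, an apparatus and a device for spatio-temporal data.
Background
With the explosive growth of mobile networks and the widespread use of smart mobile devices, the spatio-temporal data of mobile users (also called user spatio-temporal distribution data) has become an important source of big data. For example, a user's spatio-temporal data may be the time and space data about the movement trajectory that a GPS service records as the user moves through different times and places; as another example, the mobile broadband (MBB, Mobile Broadband) data recorded by base stations when users use the mobile network also contains a large amount of user spatio-temporal data.
In a medium-sized city, the number of users is usually more than 2 million, the number of key locations on the road network is usually more than 100,000, and the data sampling interval is generally several seconds to several minutes; the resulting spatio-temporal data therefore reaches the order of PB or even EB. How to build an index sensibly, so that the required spatio-temporal data can be found quickly, has thus become an urgent problem.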
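In the prior art, establishing an index for spatio-temporal data mainly includes the following steps:
Step 1: Divide the spatial range to be divided into multiple subspaces.
Step 2: Convert the two-dimensional spatial range to be divided from step 1 into one-dimensional coded data.
In this step, the subspaces obtained in step 1 are encoded in some order (for example, z-curve). The encoding strategy is to give spatially adjacent subspaces adjacent codes as far as possible. Even so, adjacent subspaces are not guaranteed to receive adjacent codes, and some subspaces with adjacent codes are in fact far apart.
Step 3: Build a balanced spatial index tree from the one-dimensional coded data obtained in step 2, using a conventional method such as balanced tree indexing, where each leaf node records the storage location of the spatio-temporal data related to the subspace corresponding to that leaf node's code.
Step 4: Build a balanced time index tree from the time range to be divided, using a conventional method such as balanced tree indexing, where each leaf node records the storage location of the spatio-temporal data related to the time corresponding to that leaf node.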
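The locality problem of z-curve coding mentioned in step 2 can be seen with a small sketch. The bit-interleaving function below is a common way to compute a z-order (Morton) code and is offered as an illustration only; the prior-art text does not specify how its codes are computed, and the grid coordinates are made up.

```python
def morton(x, y, bits=16):
    """Interleave the bits of x and y into a z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


# Neighbouring cells (1, 0) and (2, 0) get non-adjacent codes,
# while the consecutive codes 7 and 8 belong to cells three columns apart.
neighbours = (morton(1, 0), morton(2, 0))
far_apart = (morton(3, 1), morton(0, 2))
```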
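Further, with an index of spatio-temporal trajectory data built by the above prior-art method, searching spatio-temporal data mainly includes the following steps:
Step 1: According to the spatial query condition input by the user, determine all codes corresponding to the subspaces that satisfy the spatial query condition. Look up the nodes corresponding to the determined codes in the balanced spatial index tree, thereby obtaining the storage location of the spatio-temporal data that satisfies the spatial query condition.
In this step, because adjacent subspaces are not guaranteed to receive adjacent codes when the subspaces of the spatial range to be divided are encoded, the spatio-temporal data found in this step may contain redundant data.
Step 2: According to the time query condition input by the user, look up the corresponding nodes in the balanced time index tree, thereby obtaining the storage location of the spatio-temporal data that satisfies the time query condition.
Step 3: From the spatio-temporal data obtained in step 1 and the spatio-temporal data obtained in step 2, determine the spatio-temporal data in their intersection and use it as the query result.
It can be seen that, with the prior-art method of establishing a spatio-temporal data index, searching requires first computing the spatial codes, then searching for the corresponding spatio-temporal data separately by time and by space, and finally combining the data found to obtain the result. Going through two indexes in this way reduces search efficiency.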
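Summary of the Invention
Embodiments of the present invention provide an index establishing method, a query method, an apparatus and a device for spatio-temporal data, to solve the problem of low efficiency when searching spatio-temporal data based on a spatio-temporal data index established in the prior art.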
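In a first aspect, an index establishing method for spatio-temporal data is provided, including:
generating a multi-level time index tree using a preset time range as the root node, where the multi-level time index tree includes multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents; and
generating a multi-level spatial index tree using a preset spatial range as the root node, where the multi-level spatial index tree includes multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents;
mapping each time slice leaf node and each subspace leaf node to a spatio-temporal file, where the spatio-temporal file stores the spatio-temporal data corresponding to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that have a mapping relationship with that file.
With reference to the first aspect, in a first possible implementation, mapping each time slice leaf node and each subspace leaf node to a spatio-temporal file specifically includes: determining the identifier of each time slice leaf node and the identifier of each subspace leaf node; using a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and determining the obtained hash result as the identifier of a spatio-temporal file, where that spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that correspond to the hash result.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the method further includes: storing the spatio-temporal data to be stored in the spatio-temporal file.
With reference to the second possible implementation of the first aspect, in a third possible implementation, storing the spatio-temporal data to be stored in the spatio-temporal file specifically includes: determining the storage order of the spatio-temporal data to be stored based on the preset sorting manner of a preset category; encoding and compressing with a first algorithm, in the determined storage order, the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format; and encoding and compressing with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query operation structure.
With reference to the first aspect or the first possible implementation of the first aspect, in a fourth possible implementation, the multi-level time index tree is generated, using the preset time range as the root node, by the following method: dividing the preset time range into a preset number of sub-time slices of the same length, and cyclically performing the following steps until the length of the currently obtained sub-time slice equals the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into a preset number of sub-time slices of the same length; or dividing the preset time range into sub-time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and cyclically performing the following steps until the length of the currently obtained sub-time slice meets the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into sub-time slices of different lengths according to the distribution of the generated spatio-temporal data within it, where the more densely the spatio-temporal data is distributed in a time period, the more sub-time slices that period is divided into.
With reference to the first aspect or the first possible implementation of the first aspect, in a fifth possible implementation, the multi-level spatial index tree is generated, using the preset spatial range as the root node, by the following method: dividing the preset spatial range into a preset number of subspaces of the same size, and cyclically performing the following steps until the size of the currently obtained subspace equals the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into a preset number of subspaces of the same size; or dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and cyclically performing the following steps until the size of the currently obtained subspace meets the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into subspaces of different sizes according to the distribution of the generated spatio-temporal data within it, where the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.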
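In a second aspect, a spatio-temporal data query method is provided, including:
determining the time slice node corresponding to the time query condition in the multi-level time index tree, and the subspace node corresponding to the spatial query condition in the multi-level spatial index tree;
determining all time slice leaf nodes of the time index subtree whose root node is the determined time slice node, and all subspace leaf nodes of the spatial index subtree whose root node is the determined subspace node;
determining the query result in the spatio-temporal data stored in the spatio-temporal files mapped by each determined time slice leaf node and each determined subspace leaf node.
With reference to the second aspect, in a first possible implementation, determining the query result in the spatio-temporal data stored in the spatio-temporal files mapped by each determined time slice leaf node and each determined subspace leaf node specifically includes: determining the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; using a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determining, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result; and determining the query result in the spatio-temporal data stored in the spatio-temporal file found at that storage location.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, when the query condition further includes query conditions other than the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to a preset distributed query operation structure, determining the query result in the spatio-temporal data stored in the spatio-temporal file specifically includes: starting a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query operation structure; having those parsing processes parse the spatio-temporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that meet them; and aggregating the obtained parsing results and determining them as the query result.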
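In a third aspect, an index establishing apparatus for spatio-temporal data is provided, including:
a first generation module, configured to generate a multi-level time index tree using a preset time range as the root node, where the multi-level time index tree includes multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
a second generation module, configured to generate a multi-level spatial index tree using a preset spatial range as the root node, where the multi-level spatial index tree includes multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents;
a mapping module, configured to map each time slice leaf node of the multi-level time index tree generated by the first generation module and each subspace leaf node of the multi-level spatial index tree generated by the second generation module to a spatio-temporal file, where the spatio-temporal file stores the spatio-temporal data corresponding to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that have a mapping relationship with that file.
With reference to the third aspect, in a first possible implementation, the mapping module is specifically configured to determine the identifier of each time slice leaf node and the identifier of each subspace leaf node; to use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and to determine the obtained hash result as the identifier of a spatio-temporal file, where that spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node, and the subspace represented by the subspace leaf node, that correspond to the hash result.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation, the apparatus further includes a storage module, configured to store the spatio-temporal data to be stored in the spatio-temporal file.
With reference to the second possible implementation of the third aspect, in a third possible implementation, the storage module is specifically configured to determine the storage order of the spatio-temporal data to be stored based on the preset sorting manner of a preset category; to encode and compress with a first algorithm, in the determined storage order, the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format; and to encode and compress with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query operation structure.
With reference to the third aspect or the first possible implementation of the third aspect, in a fourth possible implementation, the first generation module is specifically configured to use the preset time range as the root node and generate the multi-level time index tree by the following method: dividing the preset time range into a preset number of sub-time slices of the same length, and cyclically performing the following steps until the length of the currently obtained sub-time slice equals the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into a preset number of sub-time slices of the same length; or dividing the preset time range into sub-time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and cyclically performing the following steps until the length of the currently obtained sub-time slice meets the time slice length represented by a time slice leaf node: using the currently obtained sub-time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing the currently obtained sub-time slice into sub-time slices of different lengths according to the distribution of the generated spatio-temporal data within it, where the more densely the spatio-temporal data is distributed in a time period, the more sub-time slices that period is divided into.
With reference to the third aspect or the first possible implementation of the third aspect, in a fifth possible implementation, the second generation module is specifically configured to use the preset spatial range as the root node and generate the multi-level spatial index tree by the following method: dividing the preset spatial range into a preset number of subspaces of the same size, and cyclically performing the following steps until the size of the currently obtained subspace equals the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into a preset number of subspaces of the same size; or dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and cyclically performing the following steps until the size of the currently obtained subspace meets the subspace size represented by a subspace leaf node: using the currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing the currently obtained subspace into subspaces of different sizes according to the distribution of the generated spatio-temporal data within it, where the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.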
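According to a fourth aspect, a spatio-temporal data query apparatus is provided, including:
a node determining module, configured to determine the time slice node that corresponds to a time query condition in a multi-level time index tree, and the subspace node that corresponds to a spatial query condition in a multi-level spatial index tree;
a leaf node determining module, configured to determine all time slice leaf nodes of the time index subtree rooted at the time slice node determined by the node determining module, and all subspace leaf nodes of the spatial index subtree rooted at the subspace node determined by the node determining module; and
a query result determining module, configured to determine a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node.
With reference to the fourth aspect, in a first possible implementation, the query result determining module is specifically configured to: determine the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determine, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result; and determine the query result from the spatio-temporal data stored in the spatio-temporal file found at that storage location.
With reference to the fourth aspect, or to the first possible implementation of the fourth aspect, in a second possible implementation, the query result determining module is specifically configured to: when the query condition further includes other query conditions besides the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to a preset distributed query computation structure, start a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query computation structure; cause the corresponding number of parsing processes to parse the spatio-temporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that satisfy the other query conditions; and aggregate the obtained parsing results and determine them as the query result.
According to a fifth aspect, a device for establishing an index of spatio-temporal data is provided, including the above apparatus for establishing an index of spatio-temporal data.
According to a sixth aspect, a spatio-temporal data query device is provided, including the above spatio-temporal data query apparatus.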
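The beneficial effects of the embodiments of the present invention include the following:
In the method for establishing an index of spatio-temporal data provided by the embodiments of the present invention, a multi-level time index tree is generated according to a preset time rule with a preset time range as the root node; a multi-level spatial index tree is generated according to a preset spatial rule with a preset spatial range as the root node; and one spatio-temporal file is mapped to each pair of a time slice leaf node and a subspace leaf node, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file. Correspondingly, the spatio-temporal data query method provided by the embodiments of the present invention, which is based on the above index establishing method, includes: determining the time slice node that corresponds to a time query condition in the multi-level time index tree and the subspace node that corresponds to a spatial query condition in the multi-level spatial index tree; determining all time slice leaf nodes of the time index subtree rooted at the determined time slice node and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node; and determining a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node. It can be seen that, with an index established by the method provided by the embodiments of the present invention, the time query condition and the spatial query condition can be looked up in parallel when searching for spatio-temporal data, and the data to be queried can be indexed directly from the subspace leaf nodes and time slice leaf nodes that are found. Compared with querying spatio-temporal data through secondary indexing as in the prior art, this improves query efficiency.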
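Brief Description of the Drawings
Figure 1 is a flowchart of a method for establishing an index of spatio-temporal data according to an embodiment of the present invention;
Figure 2 is a flowchart of a spatio-temporal data query method according to an embodiment of the present invention;
Figure 3 is a flowchart of a method for establishing an index of spatio-temporal data according to Embodiment 1 of the present invention;
Figures 4a and 4b are schematic diagrams of how spatio-temporal data is stored in a spatio-temporal file according to an embodiment of the present invention;
Figure 5 is a flowchart of a method for establishing an index of spatio-temporal data according to Embodiment 2 of the present invention;
Figures 6a and 6b are schematic diagrams of dividing the preset spatial range and the subsequently obtained subspaces according to an embodiment of the present invention;
Figure 7 is a flowchart of a spatio-temporal data query method according to Embodiment 3 of the present invention;
Figure 8 is a flowchart of a spatio-temporal data query method according to Embodiment 4 of the present invention;
Figure 9 is a schematic structural diagram of an apparatus for establishing an index of spatio-temporal data according to an embodiment of the present invention;
Figure 10 is a schematic structural diagram of a spatio-temporal data query apparatus according to an embodiment of the present invention.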
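Detailed Description
The embodiments of the present invention provide an index establishing method, a query method, an apparatus and a device for spatio-temporal data. Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and do not limit it. Where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
An embodiment of the present invention provides a method for establishing an index of spatio-temporal data which, as shown in Figure 1, includes the following steps:
S101. Generate a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree contains multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents.
S102. Generate a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree contains multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents.
S103. Map one spatio-temporal file to each pair of a time slice leaf node and a subspace leaf node, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file.
Further, steps S101 and S102 need not be performed in any strict order. In step S103, either each time slice leaf node is paired in turn with every subspace leaf node and each pair is mapped to one spatio-temporal file, or each subspace leaf node is paired in turn with every time slice leaf node and each pair is mapped to one spatio-temporal file.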
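The pairing in step S103 can be sketched as a Cartesian product of the two leaf sets. This is only an illustration: the leaf labels and the `st_<time>_<space>.dat` naming scheme are hypothetical and do not come from the patent text.

```python
from itertools import product


def map_leaves_to_files(time_leaves, space_leaves):
    """Map every (time slice leaf, subspace leaf) pair to one file name.

    The file naming scheme is an assumption made for this sketch.
    """
    return {
        (t, s): f"st_{t}_{s}.dat"
        for t, s in product(time_leaves, space_leaves)
    }


# Hypothetical leaf labels: 3 time slice leaves and 2 subspace leaves.
time_leaves = ["T0", "T1", "T2"]
space_leaves = ["S0", "S1"]
files = map_leaves_to_files(time_leaves, space_leaves)
```

Iterating over the time leaves first or over the subspace leaves first yields the same set of pairs, which matches the remark that either iteration order may be used.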
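Correspondingly, based on the above index establishing method, an embodiment of the present invention further provides a spatio-temporal data query method which, as shown in Figure 2, includes the following steps:
S201. Determine the time slice node that corresponds to the time query condition in the multi-level time index tree, and the subspace node that corresponds to the spatial query condition in the multi-level spatial index tree.
S202. Determine all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node.
S203. Determine the query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node.
The methods and related devices provided by the present invention are described in detail below through specific embodiments, with reference to the accompanying drawings.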
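Embodiment 1:
Embodiment 1 of the present invention provides a method for establishing an index of spatio-temporal data which, as shown in Figure 3, specifically includes the following steps:
S301. Generate a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree contains multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents.
In this step, the generated multi-level time index tree consists of the root node, the time slice leaf nodes, and the time slice nodes between them. The root node represents the preset time range. A time slice leaf node represents the smallest time range produced by the division, which is not divided further. A time slice node between the root node and the leaf nodes represents a time range that is shorter than the preset time range and longer than the range represented by a leaf node, and the closer such a node is to the root node, the longer the time range it represents.
Further, when the multi-level time index tree is generated in this step, the division may follow a preset time rule, for example, dividing the time range represented by each time slice node into equal parts that become the child nodes of that node.
S302. Generate a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree contains multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents.
In this step, the generated multi-level spatial index tree consists of the root node, the subspace leaf nodes, and the subspace nodes between them. The root node represents the preset spatial range. A subspace leaf node represents the smallest spatial range produced by the division, which is not divided further. A subspace node between the root node and the leaf nodes represents a spatial range that is smaller than the preset spatial range and larger than the range represented by a leaf node, and the closer such a node is to the root node, the larger the spatial range it represents.
Further, when the multi-level spatial index tree is generated in this step, the division may follow a preset spatial rule, for example, dividing the spatial range represented by each subspace node into equal parts that become the child nodes of that node.
Further, steps S302 and S301 need not be performed in any strict order.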
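S303. Determine the identifier of each time slice leaf node and the identifier of each subspace leaf node.
Further, in this step, a node identifier may be assigned to each leaf node of the multi-level time index tree and to each leaf node of the multi-level spatial index tree.
S304. Use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node.
Further, in this step, a hash function may be designed that takes the identifier of a time slice leaf node and the identifier of a subspace leaf node as its input and produces the hash output.
In this step, either the identifier of each time slice leaf node is combined in turn with the identifier of every subspace leaf node, and the preset hash algorithm generates one hash result per pair, or the identifier of each subspace leaf node is combined in turn with the identifier of every time slice leaf node in the same way.
S305. Determine the obtained hash result as the identifier of a spatio-temporal file, wherein the spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that correspond to the hash result.
Further, in this step, the output of the hash function may be used as the identifier of a spatio-temporal file, and that file is used to store the spatio-temporal data corresponding to the time slice and the subspace given as input to the hash function. An example follows:
For example, the hash function may be id = Tid × C + hash(Sid) % k, where id is the identifier of the spatio-temporal file, Tid is the identifier of the time slice leaf node, Sid is the identifier of the subspace leaf node, and C and k are constants. The spatio-temporal file identified by id can be used to store the spatio-temporal data corresponding to the time slice represented by the leaf node identified by Tid and the subspace represented by the leaf node identified by Sid.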
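The example formula id = Tid × C + hash(Sid) % k can be sketched as follows. The text fixes neither the inner hash nor the constants, so the values of `C` and `K` and the use of CRC32 as a hash that stays stable across runs are assumptions made for this illustration.

```python
import zlib

C = 1000  # assumed constant; with C >= K the id ranges of different Tid do not overlap
K = 64    # assumed number of hash buckets per time slice leaf


def stable_hash(sid: int) -> int:
    """Deterministic stand-in for hash(Sid); CRC32 is an assumption, not the patent's choice."""
    return zlib.crc32(str(sid).encode("utf-8"))


def file_id(tid: int, sid: int, c: int = C, k: int = K) -> int:
    """Compute id = Tid * C + hash(Sid) % k."""
    return tid * c + stable_hash(sid) % k


example_id = file_id(3, 17)
```

Under this formula, two subspace leaves whose hashes agree modulo k share a file within the same time slice, so k bounds the number of files per time slice leaf.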
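Further, this embodiment may also include step S306: storing the spatio-temporal data to be stored into the spatio-temporal files.
Step S306 need not be performed in any strict order relative to steps S301 to S305.
In this step, the spatio-temporal files may be stored in a distributed file system so that the spatio-temporal data is stored in a distributed manner. In the prior art, the spatio-temporal data related to subspaces and time slices is stored discretely; that is, the data related to a subspace and a time slice may be scattered within one file or scattered across different files. With this storage manner, even if a lookup in the balanced index tree of the secondary index finds the subspace and time slice corresponding to the time query condition and the spatial query condition, together with the storage locations of the corresponding spatio-temporal data, the data of neighbouring subspaces or neighbouring time slices is not stored together because of this scattering. This slows down the lookup of spatio-temporal data, and the data is also hard to compress when stored, which wastes storage space.
To address these problems, step S306 may specifically include the following steps:
Step 1. Determine the storage order of the spatio-temporal data to be stored based on a preset ordering of a preset category.
Step 2. According to the determined storage order, encode, compress and store with a first algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format.
Step 3. Encode, compress and store with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query computation structure.
Further, the first algorithm in Step 2 may be the delta algorithm, and the second algorithm in Step 3 may be the run-length algorithm. Steps 2 and 3 need not be performed in any strict order.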
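Further, spatio-temporal data can usually be stored in the form {user id, subspace id, time slice id, attribute 1, ..., attribute n}. This form contains multiple categories, where the user id category identifies the user who was active within the subspace and time slice identified by the subspace id and the time slice id. When data is stored in the manner provided by the embodiments of the present invention, the following rules may be applied:
First, determine the storage order of the spatio-temporal data to be stored based on a preset ordering of a preset category.
For example, the user id may be the preset category, and descending or ascending order of user id may be the preset ordering; that is, the spatio-temporal data to be stored may be stored in descending or ascending order of user id.
Second, according to the determined storage order, encode, compress and store with the delta algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format, and encode, compress and store with the run-length algorithm the data to be stored in the file that belongs to the same category and has the same value.
In other words, column-oriented storage may be used for spatio-temporal data, so that data in the same column (that is, data of the same category) is stored together. Because data in the same column has the same data format, column-wise storage leaves a great deal of room for compression. For example, a raw Unix timestamp has to be stored as a 64-bit long integer; since the records stored in one file are close to one another in time, once one time point has been stored, each later record only needs to store the 4-bit difference from the immediately preceding time point. This is delta encoding.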
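The delta encoding of a timestamp column described above can be sketched as follows. The sample timestamps are made up for illustration, and the sketch keeps the differences as plain Python integers rather than packing them into a fixed bit width.

```python
def delta_encode(values):
    """Keep the first value, then store each value as its difference from the previous one."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


# Hypothetical Unix timestamps of records that are close together in time.
timestamps = [1388102400, 1388102403, 1388102410, 1388102411]
encoded_timestamps = delta_encode(timestamps)
```

Only the first entry needs the full timestamp width; the remaining entries are small numbers, which is where the saving comes from.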
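The same reasoning applies to the spatial part of the data. Take the subspace data in the subspace id column as an example: the subspace data stored in one spatio-temporal file all relates to the subspace represented by a single subspace leaf node of the multi-level spatial index tree. The data of that subspace can be stored as longitude and latitude values, and because the values belonging to the subspace of one leaf node are all close to one another, they can also be compressed with delta encoding.
Further, longitude and latitude values are floating-point data, and delta encoding has no compression effect on floating-point data such as longitude and latitude. The floating-point data can therefore first be converted to long integers and then compressed with delta encoding. A specific conversion is as follows: assuming the highest precision in the column is m digits after the decimal point, shift the decimal point of every value in the column m places to the right, that is, multiply by 10 to the power m, and then convert the column to the long integer type.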
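The float-to-integer conversion followed by delta encoding can be sketched as below. The longitude values and the precision m = 6 are illustrative assumptions; `round` is used so that binary floating-point error does not push a value to the wrong integer.

```python
def to_scaled_ints(column, m):
    """Multiply every value by 10**m and round to the nearest integer."""
    scale = 10 ** m
    return [round(x * scale) for x in column]


# Hypothetical longitudes inside one subspace leaf, with m = 6 decimal places.
longitudes = [116.397128, 116.397130, 116.397135]
scaled = to_scaled_ints(longitudes, 6)

# Delta-encode the integer column: first value, then successive differences.
deltas = [scaled[0]] + [b - a for a, b in zip(scaled, scaled[1:])]
```

Decoding reverses the two steps: a running sum restores the scaled integers, and dividing by 10**m restores the coordinates to the stated precision.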
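Further, the spatio-temporal data to be stored in a spatio-temporal file that belongs to the same category and has the same value may be compressed with run-length encoding. For example, suppose the attribute 1 category stores application numbers. If adjacent positions in that column hold records with the same number, say five consecutive records whose number is 3, they can be compressed by run-length encoding and stored as 5:3.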
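The run-length step can be sketched with `itertools.groupby`. The sample column is made up, and the `count:value` text form follows the "5:3" example above.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (count, value) pair."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def format_runs(runs):
    """Render runs as 'count:value', the order used in the '5:3' example."""
    return ",".join(f"{count}:{value}" for count, value in runs)


# Hypothetical application-number column: five 3s followed by two 7s.
app_ids = [3, 3, 3, 3, 3, 7, 7]
runs = run_length_encode(app_ids)
text = format_runs(runs)
```

Sorting the records first, for example by user id as in the rules above, tends to lengthen such runs, which is one reason the storage order is fixed before compression.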
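Further, after the spatio-temporal data has been stored with the column-based compression described above, the whole spatio-temporal file may additionally be compressed with a general-purpose compression technique such as gzip to achieve a better compression ratio and save storage space. By contrast, the prior-art approach of looking up spatio-temporal data through a secondary index can hardly support distributed computation.
An example follows:
Figures 4a and 4b are schematic diagrams of how spatio-temporal data is stored in a spatio-temporal file. Figure 4a shows the spatio-temporal data in a file organized in ascending order of user id; the categories in the file are user id, time id, space id, attribute 1 and attribute 2. Figure 4b shows the form in which the data is held in storage blocks after it has been compressed and stored using the data organization provided by the embodiments of the present invention. Storage block 401 holds pointer p1, which indicates where the data of the user with user id 105 is stored, and storage block 402 holds pointer p2, which indicates where the data of the user with user id 203 is stored. The location indicated by p1 in turn holds, for user 105, the time information pointer p3, the space information pointer p4, the attribute 1 pointer p5 and the attribute 2 pointer p6. The location indicated by p2 in turn holds, for user 203, the time information pointer p7, the space information pointer p8, the attribute 1 pointer p9 and the attribute 2 pointer p10. Each pointer indicates where the corresponding data is stored. The attribute 2 column of user 105 (indicated by p6), "1,0,0,0", can be compressed with the run-length algorithm, where "1:1" denotes one datum with value 1 and "0:3" denotes three data with value 0. The attribute 1 column of user 203 (indicated by p9), "46,51,42", can be compressed with the delta algorithm, where "5" means that the datum adjacent to 46 differs from 46 by 5, and "-9" means that the datum adjacent to 51 differs from 51 by -9.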
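As a check on the Figure 4b values, the two encodings can be decoded back to the original columns. This is a sketch only; the function names are hypothetical, and the run-length pairs are read as `value:count`, which is the order used in the description of Figure 4b ("0:3" is three zeros).

```python
def run_length_decode(text):
    """Decode comma-separated 'value:count' pairs back into a flat column."""
    column = []
    for pair in text.split(","):
        value, count = pair.split(":")
        column.extend([int(value)] * int(count))
    return column


def delta_decode(encoded):
    """Rebuild a column from a first value followed by successive differences."""
    column = []
    for item in encoded:
        column.append(item if not column else column[-1] + item)
    return column


attribute2_user105 = run_length_decode("1:1,0:3")
attribute1_user203 = delta_decode([46, 5, -9])
```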
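Embodiment 2:
Embodiment 2 of the present invention provides a method for establishing an index of spatio-temporal data which, as shown in Figure 5, specifically includes the following steps:
S501. Generate a multi-level time index tree with the preset time range as the root node by the following method:
dividing the preset time range into a preset number of sub time slices of equal length, and repeating the following steps until the length of the currently obtained sub time slices equals the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into a preset number of sub time slices of equal length; or
dividing the preset time range into sub time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and repeating the following steps until the length of the currently obtained sub time slices matches the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into sub time slices of different lengths according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained sub time slice, the more densely the spatio-temporal data is distributed in a period, the more sub time slices that period is divided into.
Further, in this step, a data structure such as a binary tree or an R-tree may be used to store the generated multi-level time index tree.
The preset time range and the subsequently obtained sub time slices may be divided equally. For example, suppose the preset time range is 0-10000 and 0-10000 is the root node of the multi-level time index tree. The range 0-10000 can be split into two equal parts, 0-5000 and 5000-10000, which become the two child nodes of 0-10000, and 0-5000 and 5000-10000 are then each split equally in turn, until leaf nodes that are not divided further are reached.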
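The equal-division example (0-10000 halved repeatedly) can be sketched as a small recursive builder. The leaf length of 2500 is an assumed stopping point, since the text does not give one for this example, and the sketch stops once a slice is no longer than the leaf length rather than testing for exact equality.

```python
from dataclasses import dataclass, field


@dataclass
class TimeNode:
    start: float
    end: float
    children: list = field(default_factory=list)


def build_equal_tree(start, end, fanout, leaf_length):
    """Split [start, end) into `fanout` equal parts until slices reach leaf_length."""
    node = TimeNode(start, end)
    if end - start > leaf_length:
        step = (end - start) / fanout
        for i in range(fanout):
            child = build_equal_tree(
                start + i * step, start + (i + 1) * step, fanout, leaf_length
            )
            node.children.append(child)
    return node


def leaves(node):
    """Return the leaf nodes of the subtree rooted at `node`, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# Root 0-10000, binary splits, assumed leaf length 2500.
root = build_equal_tree(0, 10000, fanout=2, leaf_length=2500)
```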
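Unequal division may also be used. For example, suppose the preset time range is 0-24 (in hours) and statistics show that little spatio-temporal data falls in the time slices 0-7 and 22-24. When the multi-level time index tree is built, 0-24 can then be divided into 0-7, 7-10, 10-13, 13-16, 16-19, 19-22 and 22-24, and the resulting time slices become the child nodes of the root node 0-24. Each of these sub time slices is then divided further. Taking child node 19-22 as an example, if statistics show that little data falls in 21-22, node 19-22 can be divided into 19-19.5, 19.5-20, 20-20.5, 20.5-21 and 21-22. Division continues until leaf nodes that are not divided further are reached. For example, if it is specified in advance that the time slice represented by a leaf node is no shorter than 0.5, then once a child node at some level represents a time slice of length 0.5, it is not divided further and becomes a leaf node.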
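One way to obtain more, shorter slices where data is dense is sketched below. The text does not prescribe a concrete rule, so the rule used here (halve a slice while it holds more than `max_records` records and its halves would not fall below `min_length`) is purely an assumption for illustration, as are the sample timestamps.

```python
def split_by_density(start, end, timestamps, max_records, min_length):
    """Return leaf slices of [start, end), splitting dense slices more finely."""
    count = sum(1 for t in timestamps if start <= t < end)
    if count <= max_records or (end - start) / 2 < min_length:
        return [(start, end)]
    mid = (start + end) / 2
    return (
        split_by_density(start, mid, timestamps, max_records, min_length)
        + split_by_density(mid, end, timestamps, max_records, min_length)
    )


# Hypothetical record times in hours: sparse at night, dense from 13 to 17.
record_hours = [1, 13, 14, 15, 16, 17, 23]
slices = split_by_density(0, 24, record_hours, max_records=2, min_length=3)
```

With these inputs the quiet first half of the day stays one 12-hour slice, while the busy afternoon is cut into 3-hour slices.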
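Further, the levels of the multi-level time index tree may also be divided according to how frequently users enter particular query conditions when querying spatio-temporal data.
S502. Generate a multi-level spatial index tree with the preset spatial range as the root node by the following method:
dividing the preset spatial range into a preset number of subspaces of equal size, and repeating the following steps until the size of the currently obtained subspaces equals the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into a preset number of subspaces of equal size; or
dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and repeating the following steps until the size of the currently obtained subspaces matches the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into subspaces of different sizes according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained subspace, the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.
Further, in this step, a data structure such as a binary tree, a quadtree or an R-tree may be used to store the generated multi-level spatial index tree.
Taking a quadtree as the storage structure of the multi-level spatial index tree, Figure 6a is a schematic diagram of dividing the preset spatial range and the subsequently obtained subspaces equally. As shown in Figure 6a, the preset spatial range 601 is the root node of the quadtree and is divided into four equal parts, yielding four subspaces 602 of equal size as the child nodes of the root node. Each subspace 602 is then divided into four equal parts, yielding four subspaces 603 of equal size as the child nodes of the corresponding child node, and so on, until leaf nodes that are not divided further are reached.
Figure 6b is a schematic diagram of dividing the preset spatial range and the subsequently obtained subspaces unequally. As shown in Figure 6b, the division follows the distribution, within the preset spatial range 604, of the spatio-temporal data generated in that range. Because the data in the left half of range 604 is denser than the data in the right half, the left half is divided into more subspaces than the right half, giving three subspaces of different sizes, 605, 606 and 607, which become the child nodes of the preset spatial range 604. The three child nodes are divided further on the same principle until leaf nodes that are not divided further are reached. For example, if it is specified in advance that the subspace represented by a leaf node is no smaller than 3 square metres, then once a child node at some level represents a subspace of 3 square metres, it is not divided further and becomes a leaf node.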
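The equal four-way division of Figure 6a can be sketched as a quadtree builder over a square region. The coordinates, the side length of 8 and the leaf side length of 2 are illustrative assumptions.

```python
def build_quadtree(x, y, size, leaf_size):
    """Build a quadtree over the square with corner (x, y) and side `size`."""
    node = {"bounds": (x, y, size), "children": []}
    if size > leaf_size:
        half = size / 2
        for dx in (0, half):
            for dy in (0, half):
                node["children"].append(
                    build_quadtree(x + dx, y + dy, half, leaf_size)
                )
    return node


def count_leaves(node):
    """Count the leaf subspaces below `node`."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Hypothetical 8 x 8 region split down to 2 x 2 leaf subspaces.
space_root = build_quadtree(0, 0, 8, 2)
```

For the unequal case of Figure 6b, the same structure could hold children of different sizes, with the split driven by how much data falls in each region.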
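Further, the levels of the multi-level spatial index tree may also be divided according to how frequently users enter particular query conditions when querying spatio-temporal data.
S503. Map one spatio-temporal file to each pair of a time slice leaf node and a subspace leaf node, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file.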
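Embodiment 3:
Embodiment 3 of the present invention provides a spatio-temporal data query method that may be based on the index establishing method provided by the above embodiments. As shown in Figure 7, it specifically includes the following steps:
S701. Determine the time slice node that corresponds to the time query condition in the multi-level time index tree, and the subspace node that corresponds to the spatial query condition in the multi-level spatial index tree.
Further, in this step, the time query condition entered by the user may correspond to any node in the multi-level time index tree. For example, it may correspond to the whole preset time range, or to one or more time slice leaf nodes.
Likewise, the spatial query condition entered by the user may correspond to any node in the multi-level spatial index tree. For example, it may correspond to the whole preset spatial range, or to one or more subspace leaf nodes.
Further, in this step, the lookups in the multi-level time index tree and the multi-level spatial index tree may be performed in parallel according to the time query condition and the spatial query condition, which saves lookup time.
S702. Determine all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node.
In this step, when the time query condition corresponds to a time slice leaf node of the multi-level time index tree, that leaf node is the time slice leaf node to be determined in this step. Likewise, when the spatial query condition corresponds to a subspace leaf node of the multi-level spatial index tree, that leaf node is the subspace leaf node to be determined in this step.
When the time query condition corresponds to a non-leaf time slice node of the multi-level time index tree, all time slice leaf nodes of the time index subtree rooted at that non-leaf node are determined. Likewise, when the spatial query condition corresponds to a non-leaf subspace node of the multi-level spatial index tree, all subspace leaf nodes of the spatial index subtree rooted at that non-leaf node are determined.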
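Steps S701 and S702 can be sketched for the time tree as follows. The tree contents are hypothetical, and resolving a query range to the deepest single node that fully covers it is an assumption of this sketch, since the text leaves the matching rule open.

```python
# Hypothetical two-level time index tree over 0-24 hours.
TIME_TREE = {"range": (0, 24), "children": [
    {"range": (0, 12), "children": [
        {"range": (0, 6), "children": []},
        {"range": (6, 12), "children": []},
    ]},
    {"range": (12, 24), "children": [
        {"range": (12, 18), "children": []},
        {"range": (18, 24), "children": []},
    ]},
]}


def find_node(node, query):
    """Return the deepest node whose range fully covers the query range."""
    low, high = query
    for child in node["children"]:
        child_low, child_high = child["range"]
        if child_low <= low and high <= child_high:
            return find_node(child, query)
    return node


def collect_leaves(node):
    """Return the ranges of all leaf nodes in the subtree rooted at `node`."""
    if not node["children"]:
        return [node["range"]]
    return [r for child in node["children"] for r in collect_leaves(child)]


inner_hit = collect_leaves(find_node(TIME_TREE, (13, 20)))
leaf_hit = collect_leaves(find_node(TIME_TREE, (6, 12)))
```

The same two functions would apply unchanged to a spatial tree whose nodes carry bounding regions instead of time ranges, given a suitable containment test.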
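S703. Determine the identifier of each time slice leaf node and each subspace leaf node determined in S702.
S704. Use the preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node.
In this step, either the identifier of each time slice leaf node is combined in turn with the identifier of every subspace leaf node, and the preset hash algorithm generates one hash result per pair, or the identifier of each subspace leaf node is combined in turn with the identifier of every time slice leaf node in the same way.
In this step, the preset hash algorithm is the one that was used when the spatio-temporal data index was established.
S705. Determine, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result.
S706. Determine the query result from the spatio-temporal data stored in the spatio-temporal file found at the storage location determined in S705.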
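Steps S703 to S705 can be sketched as recomputing file identifiers for every leaf pair and looking them up in a catalog of storage locations. The constants, the CRC32-based hash, the in-memory `catalog` dictionary and the `/data/st/...` paths are all hypothetical; the only requirement taken from the text is that the query side uses the same hash as the index-building side.

```python
import zlib
from itertools import product

C, K = 1000, 64  # assumed constants; must match those used when building the index


def file_id(tid, sid):
    """Same formula as at index time: id = Tid * C + hash(Sid) % k."""
    return tid * C + zlib.crc32(str(sid).encode("utf-8")) % K


def locate_files(time_leaf_ids, space_leaf_ids, catalog):
    """Return the storage locations of the files mapped to every leaf pair."""
    ids = {file_id(t, s) for t, s in product(time_leaf_ids, space_leaf_ids)}
    return sorted(catalog[i] for i in ids if i in catalog)


# Hypothetical catalog built at index time: file id -> storage path.
catalog = {
    file_id(t, s): f"/data/st/{file_id(t, s)}.col"
    for t in range(3)
    for s in range(4)
}
paths = locate_files([1], [0, 2], catalog)
```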
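In this embodiment, a lookup in the multi-level spatial index tree directly locates the subspaces that satisfy the spatial query condition, a lookup in the multi-level time index tree directly locates the time slices that satisfy the time query condition, and the spatio-temporal files related to those subspaces and time slices are then found from them. In the prior art, the two-dimensional space to be divided is first converted into one-dimensional encoded data, and at query time the one-dimensional encoded data that is found is converted back into two-dimensional space. Compared with that approach, this embodiment produces no redundant spatio-temporal data, which saves time when parsing the data and speeds up the lookup.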
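Embodiment 4:
Embodiment 4 of the present invention provides a spatio-temporal data query method based on the index establishing method provided by the embodiments of the present invention. As shown in Figure 8, it specifically includes the following steps:
S801. Determine the time slice node that corresponds to the time query condition in the multi-level time index tree, and the subspace node that corresponds to the spatial query condition in the multi-level spatial index tree.
S802. Determine all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node.
S803. Determine the identifier of each time slice leaf node and each subspace leaf node determined in S802.
S804. Use the preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node.
S805. Determine, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result.
S806. When the query condition further includes other query conditions besides the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to the preset distributed query computation structure, start a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query computation structure.
Further, spatio-temporal data can usually be stored in the form {user id, subspace id, time slice id, attribute 1, ..., attribute n}. A query against data stored in this form may be a spatio-temporally constrained query about a particular user, or a spatio-temporally constrained aggregate query over multiple attributes.
For example, "query the trajectory of user A within the time range [t1, t2] and the spatial range [(x1, y1), r], where the spatial range [(x1, y1), r] is the region centred at (x1, y1) with radius r" is a spatio-temporally constrained query about a particular user, and "query the distribution of mobile application usage of all users within the time range [t1, t2] and the spatial range [(x1, y1), r]" is a spatio-temporally constrained aggregate query. Both kinds of query require an index over the temporal and spatial information.
When the query condition contains only a time query condition and a spatial query condition, all of the spatio-temporal data stored in the determined spatio-temporal files can be taken as the query result. When the query condition further includes other query conditions, the determined spatio-temporal files must be parsed further to obtain the data that satisfies those other conditions. When a spatio-temporal file holds a large amount of data, parsing it is very time-consuming and slows the query down. In the embodiments of the present invention, the spatio-temporal files are stored in a distributed file system when the data is stored, so that the spatio-temporal data is stored in a distributed manner and the data stored in the files conforms to the preset distributed query computation structure. When a spatio-temporal file is parsed, a corresponding number of parsing processes can therefore be started for it according to the amount of data stored in the file and the preset distributed query computation structure.
S807. Cause the corresponding number of parsing processes to parse the spatio-temporal data stored in the spatio-temporal file in parallel according to the other query conditions, to obtain parsing results that satisfy the other query conditions.
In this step, the spatio-temporal data may be split into multiple parts according to the amount of data stored in the spatio-temporal file, and one parsing process may be started for each part so that the data is parsed in parallel. This distributed storage manner allows a corresponding distributed query to be used when querying spatio-temporal data, which increases query speed.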
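The split, parse-in-parallel and merge flow of S806 to S808 can be sketched on a single machine as follows. Threads from `concurrent.futures` stand in for the parsing processes here, and the record layout, the `records_per_worker` sizing rule and the example predicate are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(chunk, predicate):
    """Keep the records of one chunk that satisfy the other query condition."""
    return [record for record in chunk if predicate(record)]


def parallel_filter(records, predicate, records_per_worker):
    """Size the worker pool from the data volume, parse chunks in parallel, merge."""
    workers = max(1, math.ceil(len(records) / records_per_worker))
    chunks = [
        records[i * records_per_worker:(i + 1) * records_per_worker]
        for i in range(workers)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda chunk: parse_chunk(chunk, predicate), chunks)
        return [record for part in parts for record in part]


# Hypothetical records from one spatio-temporal file; the other condition is app == 3.
records = [{"user_id": i, "app": i % 4} for i in range(10)]
matches = parallel_filter(records, lambda r: r["app"] == 3, records_per_worker=4)
```

`pool.map` returns results in chunk order, so the merged result keeps the original record order.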
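For example, assuming the spatio-temporal files are stored in the Hadoop Distributed File System (HDFS), MapReduce (MR) processes may be started for the spatio-temporal file to be parsed, so that the data stored in the file is processed by distributed computation, which speeds up the query.
S808. Aggregate the parsing results obtained in S807 and determine them as the query result.
In this step, the parsing results produced by the multiple parsing processes in the distributed file system are aggregated, and the aggregated result is determined as the final query result and returned to the user.
Based on the same inventive concept, the embodiments of the present invention further provide apparatuses and devices. Because the principle by which these apparatuses and devices solve the problem is similar to that of the foregoing index establishing method or query method for spatio-temporal data, their implementation may refer to the implementation of the foregoing methods, and repeated content is not described again.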
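An apparatus for establishing an index of spatio-temporal data provided by an embodiment of the present invention, as shown in Figure 9, includes the following modules:
a first generating module 901, configured to generate a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree contains multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
a second generating module 902, configured to generate a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree contains multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents; and
a mapping module 903, configured to map one spatio-temporal file to each pair formed by a time slice leaf node of the multi-level time index tree generated by the first generating module 901 and a subspace leaf node of the multi-level spatial index tree generated by the second generating module 902, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file.
Further, the mapping module 903 is specifically configured to: determine the identifier of each time slice leaf node and the identifier of each subspace leaf node; use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and determine the obtained hash result as the identifier of a spatio-temporal file, wherein the spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that correspond to the hash result.
Further, the apparatus also includes a storage module 904;
the storage module 904 is configured to store the spatio-temporal data to be stored into the spatio-temporal file.
Further, the storage module 904 is specifically configured to: determine the storage order of the spatio-temporal data to be stored based on a preset ordering of a preset category; according to the determined storage order, encode, compress and store with a first algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format; and encode, compress and store with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query computation structure.
Further, the first generating module 901 is specifically configured to generate the multi-level time index tree with the preset time range as the root node by the following method: dividing the preset time range into a preset number of sub time slices of equal length, and repeating the following steps until the length of the currently obtained sub time slices equals the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into a preset number of sub time slices of equal length; or
dividing the preset time range into sub time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and repeating the following steps until the length of the currently obtained sub time slices matches the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into sub time slices of different lengths according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained sub time slice, the more densely the spatio-temporal data is distributed in a period, the more sub time slices that period is divided into.
Further, the second generating module 902 is specifically configured to generate the multi-level spatial index tree with the preset spatial range as the root node by the following method: dividing the preset spatial range into a preset number of subspaces of equal size, and repeating the following steps until the size of the currently obtained subspaces equals the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into a preset number of subspaces of equal size; or
dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and repeating the following steps until the size of the currently obtained subspaces matches the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into subspaces of different sizes according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained subspace, the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.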
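A spatio-temporal data query apparatus provided by an embodiment of the present invention, as shown in Figure 10, includes the following modules:
a node determining module 1001, configured to determine the time slice node that corresponds to a time query condition in the multi-level time index tree, and the subspace node that corresponds to a spatial query condition in the multi-level spatial index tree;
a leaf node determining module 1002, configured to determine all time slice leaf nodes of the time index subtree rooted at the time slice node determined by the node determining module 1001, and all subspace leaf nodes of the spatial index subtree rooted at the subspace node determined by the node determining module 1001; and
a query result determining module 1003, configured to determine a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node.
Further, the query result determining module 1003 is specifically configured to: determine the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determine, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result; and determine the query result from the spatio-temporal data stored in the spatio-temporal file found at that storage location.
Further, the query result determining module 1003 is specifically configured to: when the query condition further includes other query conditions besides the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to a preset distributed query computation structure, start a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query computation structure; cause the corresponding number of parsing processes to parse the spatio-temporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that satisfy the other query conditions; and aggregate the obtained parsing results and determine them as the query result.
A device for establishing an index of spatio-temporal data provided by an embodiment of the present invention includes the above apparatus for establishing an index of spatio-temporal data.
A spatio-temporal data query device provided by an embodiment of the present invention includes the above spatio-temporal data query apparatus.
The functions of the above units may correspond to the respective processing steps in the flows shown in Figures 1 to 3, Figure 5 and Figures 7 to 8, and are not repeated here.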
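In the method for establishing an index of spatio-temporal data provided by the embodiments of the present invention, a multi-level time index tree is generated with a preset time range as the root node; a multi-level spatial index tree is generated with a preset spatial range as the root node; and one spatio-temporal file is mapped to each pair of a time slice leaf node and a subspace leaf node, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file. Correspondingly, the spatio-temporal data query method provided by the embodiments of the present invention, which is based on the above index establishing method, includes: determining the time slice node that corresponds to a time query condition in the multi-level time index tree and the subspace node that corresponds to a spatial query condition in the multi-level spatial index tree; determining all time slice leaf nodes of the time index subtree rooted at the determined time slice node and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node; and determining a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node. It can be seen that, with an index established by the method provided by the embodiments of the present invention, the time query condition and the spatial query condition can be looked up in parallel when searching for spatio-temporal data, and the data to be queried can be indexed directly from the subspace leaf nodes and time slice leaf nodes that are found. Compared with querying spatio-temporal data through secondary indexing as in the prior art, this improves query efficiency.
From the description of the above implementations, a person skilled in the art can clearly understand that the embodiments of the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server or a network device) to perform the methods described in the embodiments of the present invention.
A person skilled in the art can understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
A person skilled in the art can understand that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be changed accordingly and located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.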

Claims (20)

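  1. A method for establishing an index of spatio-temporal data, characterized by comprising:
    generating a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree contains multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
    generating a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree contains multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents; and
    mapping one spatio-temporal file to each pair of a time slice leaf node and a subspace leaf node, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file.
  2. The method according to claim 1, characterized in that mapping one spatio-temporal file to each pair of a time slice leaf node and a subspace leaf node specifically comprises:
    determining the identifier of each time slice leaf node and the identifier of each subspace leaf node;
    using a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and
    determining the obtained hash result as the identifier of a spatio-temporal file, wherein the spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that correspond to the hash result.
  3. The method according to claim 1 or 2, characterized by further comprising:
    storing the spatio-temporal data to be stored into the spatio-temporal file.
  4. The method according to claim 3, characterized in that storing the spatio-temporal data to be stored into the spatio-temporal file specifically comprises:
    determining the storage order of the spatio-temporal data to be stored based on a preset ordering of a preset category;
    according to the determined storage order, encoding, compressing and storing with a first algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format; and
    encoding, compressing and storing with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query computation structure.
  5. The method according to claim 1 or 2, characterized in that the multi-level time index tree is generated with the preset time range as the root node by the following method:
    dividing the preset time range into a preset number of sub time slices of equal length, and repeating the following steps until the length of the currently obtained sub time slices equals the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into a preset number of sub time slices of equal length; or
    dividing the preset time range into sub time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and repeating the following steps until the length of the currently obtained sub time slices matches the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into sub time slices of different lengths according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained sub time slice, the more densely the spatio-temporal data is distributed in a period, the more sub time slices that period is divided into.
  6. The method according to claim 1 or 2, characterized in that the multi-level spatial index tree is generated with the preset spatial range as the root node by the following method:
    dividing the preset spatial range into a preset number of subspaces of equal size, and repeating the following steps until the size of the currently obtained subspaces equals the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into a preset number of subspaces of equal size; or
    dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and repeating the following steps until the size of the currently obtained subspaces matches the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into subspaces of different sizes according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained subspace, the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.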
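  7. A spatio-temporal data query method, characterized by comprising:
    determining the time slice node that corresponds to a time query condition in a multi-level time index tree, and the subspace node that corresponds to a spatial query condition in a multi-level spatial index tree;
    determining all time slice leaf nodes of the time index subtree rooted at the determined time slice node, and all subspace leaf nodes of the spatial index subtree rooted at the determined subspace node; and
    determining a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node.
  8. The method according to claim 7, characterized in that determining the query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node specifically comprises:
    determining the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node;
    using a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node;
    determining, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result; and
    determining the query result from the spatio-temporal data stored in the spatio-temporal file found at that storage location.
  9. The method according to claim 7 or 8, characterized in that, when the query condition further includes other query conditions besides the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to a preset distributed query computation structure, determining the query result from the spatio-temporal data stored in the spatio-temporal file specifically comprises:
    starting a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query computation structure;
    causing the corresponding number of parsing processes to parse the spatio-temporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that satisfy the other query conditions; and
    aggregating the obtained parsing results and determining them as the query result.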
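  10. An apparatus for establishing an index of spatio-temporal data, characterized by comprising:
    a first generating module, configured to generate a multi-level time index tree with a preset time range as the root node, wherein the multi-level time index tree contains multiple time slice nodes, and the closer a time slice node is to the root node, the longer the time slice it represents;
    a second generating module, configured to generate a multi-level spatial index tree with a preset spatial range as the root node, wherein the multi-level spatial index tree contains multiple subspace nodes, and the closer a subspace node is to the root node, the larger the subspace it represents; and
    a mapping module, configured to map one spatio-temporal file to each pair formed by a time slice leaf node of the multi-level time index tree generated by the first generating module and a subspace leaf node of the multi-level spatial index tree generated by the second generating module, wherein the spatio-temporal file is used to store the spatio-temporal data corresponding to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that are mapped to the file.
  11. The apparatus according to claim 10, characterized in that the mapping module is specifically configured to: determine the identifier of each time slice leaf node and the identifier of each subspace leaf node; use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; and determine the obtained hash result as the identifier of a spatio-temporal file, wherein the spatio-temporal file is the one mapped to the time slice represented by the time slice leaf node and the subspace represented by the subspace leaf node that correspond to the hash result.
  12. The apparatus according to claim 10 or 11, characterized by further comprising a storage module;
    the storage module being configured to store the spatio-temporal data to be stored into the spatio-temporal file.
  13. The apparatus according to claim 12, characterized in that the storage module is specifically configured to: determine the storage order of the spatio-temporal data to be stored based on a preset ordering of a preset category; according to the determined storage order, encode, compress and store with a first algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same data format; and encode, compress and store with a second algorithm the spatio-temporal data to be stored in the spatio-temporal file that belongs to the same category and has the same value, so that the stored spatio-temporal data conforms to a preset distributed query computation structure.
  14. The apparatus according to claim 10 or 11, characterized in that the first generating module is specifically configured to generate the multi-level time index tree with the preset time range as the root node by the following method: dividing the preset time range into a preset number of sub time slices of equal length, and repeating the following steps until the length of the currently obtained sub time slices equals the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into a preset number of sub time slices of equal length; or
    dividing the preset time range into sub time slices of different lengths according to the distribution, within the preset time range, of the spatio-temporal data generated in that range, and repeating the following steps until the length of the currently obtained sub time slices matches the length of the time slice represented by the time slice leaf nodes: taking each currently obtained sub time slice as a child node of the time slice node corresponding to its parent time slice, and further dividing each currently obtained sub time slice into sub time slices of different lengths according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained sub time slice, the more densely the spatio-temporal data is distributed in a period, the more sub time slices that period is divided into.
  15. The apparatus according to claim 10 or 11, characterized in that the second generating module is specifically configured to generate the multi-level spatial index tree with the preset spatial range as the root node by the following method: dividing the preset spatial range into a preset number of subspaces of equal size, and repeating the following steps until the size of the currently obtained subspaces equals the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into a preset number of subspaces of equal size; or
    dividing the preset spatial range into subspaces of different sizes according to the distribution, within the preset spatial range, of the spatio-temporal data generated in that range, and repeating the following steps until the size of the currently obtained subspaces matches the size of the subspace represented by the subspace leaf nodes: taking each currently obtained subspace as a child node of the subspace node corresponding to its parent space, and further dividing each currently obtained subspace into subspaces of different sizes according to the distribution of the spatio-temporal data generated within it, wherein, within a currently obtained subspace, the more densely the spatio-temporal data is distributed in a region, the more subspaces that region is divided into.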
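  16. A spatio-temporal data query apparatus, characterized by comprising:
    a node determining module, configured to determine the time slice node that corresponds to a time query condition in a multi-level time index tree, and the subspace node that corresponds to a spatial query condition in a multi-level spatial index tree;
    a leaf node determining module, configured to determine all time slice leaf nodes of the time index subtree rooted at the time slice node determined by the node determining module, and all subspace leaf nodes of the spatial index subtree rooted at the subspace node determined by the node determining module; and
    a query result determining module, configured to determine a query result from the spatio-temporal data stored in the spatio-temporal files mapped to each determined time slice leaf node and each determined subspace leaf node.
  17. The apparatus according to claim 16, characterized in that the query result determining module is specifically configured to: determine the identifier of each determined time slice leaf node and the identifier of each determined subspace leaf node; use a preset hash algorithm to generate one hash result from the identifier of each time slice leaf node and the identifier of each subspace leaf node; determine, from the generated hash result, the storage location of the spatio-temporal file identified by that hash result; and determine the query result from the spatio-temporal data stored in the spatio-temporal file found at that storage location.
  18. The apparatus according to claim 16 or 17, characterized in that the query result determining module is specifically configured to: when the query condition further includes other query conditions besides the time query condition and the spatial query condition, and the data stored in the spatio-temporal file conforms to a preset distributed query computation structure, start a corresponding number of parsing processes for the spatio-temporal file according to the amount of spatio-temporal data stored in the file and the preset distributed query computation structure; cause the corresponding number of parsing processes to parse the spatio-temporal data stored in the file in parallel according to the other query conditions, to obtain parsing results that satisfy the other query conditions; and aggregate the obtained parsing results and determine them as the query result.
  19. A device for establishing an index of spatio-temporal data, characterized by comprising the apparatus for establishing an index of spatio-temporal data according to any one of claims 10 to 15.
  20. A spatio-temporal data query device, characterized by comprising the spatio-temporal data query apparatus according to any one of claims 16 to 18.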
PCT/CN2014/092256 2013-12-27 2014-11-26 Index establishing method, query method, apparatus and device for spatio-temporal data WO2015096582A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310740491.9 2013-12-27
CN201310740491.9A CN104750708B (zh) 2013-12-27 2013-12-27 Index establishing method, query method, apparatus and device for spatio-temporal data

Publications (1)

Publication Number Publication Date
WO2015096582A1 true WO2015096582A1 (zh) 2015-07-02

Family

ID=53477510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/092256 WO2015096582A1 (zh) 2013-12-27 2014-11-26 Index establishing method, query method, apparatus and device for spatio-temporal data

Country Status (2)

Country Link
CN (1) CN104750708B (zh)
WO (1) WO2015096582A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150222588A1 (en) * 2014-02-06 2015-08-06 Electronics And Telecommunications Research Institute Apparatus and method for sharing experience of communication terminal user
WO2018118071A1 (en) 2016-12-22 2018-06-28 Intel Corporation Methods, systems and apparatus to improve spatial-temporal data management
CN110928968A (zh) * 2019-11-12 2020-03-27 天津大学 一种二维地理空间大数据的存储与查询计算机介质
CN110990665A (zh) * 2019-12-11 2020-04-10 北京明略软件系统有限公司 数据处理方法、装置、系统、电子设备及存储介质
CN111078634A (zh) * 2019-12-30 2020-04-28 中科海拓(无锡)科技有限公司 一种基于r树的分布式时空数据索引方法
CN111367916A (zh) * 2020-03-04 2020-07-03 浙江大华技术股份有限公司 一种数据存储方法及装置
CN111723096A (zh) * 2020-06-23 2020-09-29 重庆市计量质量检测研究院 一种集成GeoHash和Quadtree的空间数据索引方法
CN113076334A (zh) * 2020-01-06 2021-07-06 阿里巴巴集团控股有限公司 数据查询方法、索引生成方法、装置及电子设备
CN113742350A (zh) * 2021-09-09 2021-12-03 北京中安智能信息科技有限公司 基于机器学习模型的时空索引构建方法和装置及查询方法
CN113760937A (zh) * 2021-09-17 2021-12-07 恒生电子股份有限公司 数据的查缺方法、装置、电子设备及存储介质
US20230079719A1 (en) * 2021-09-15 2023-03-16 Henan University Geotagged video spatial indexing method based on temporal information
CN115809360A (zh) * 2023-02-08 2023-03-17 深圳大学 一种大规模时空流数据实时空间连接查询方法及相关设备
CN116501504A (zh) * 2023-06-27 2023-07-28 上海燧原科技有限公司 数据流的时空映射方法、装置、电子设备及存储介质
CN116756139A (zh) * 2023-05-12 2023-09-15 中国自然资源航空物探遥感中心 一种数据索引方法、系统、存储介质和电子设备
CN116796083A (zh) * 2023-06-29 2023-09-22 山东省国土测绘院 一种空间数据划分方法及系统
CN117112492A (zh) * 2023-08-25 2023-11-24 中南林业科技大学 一种自适应的时空大数据分布式存储方法及智能文件系统
CN117290617A (zh) * 2023-08-18 2023-12-26 中国船舶集团有限公司第七〇九研究所 一种海上分布式多源异构时空数据查询方法及系统

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262012B2 (en) * 2015-08-26 2019-04-16 Oracle International Corporation Techniques related to binary encoding of hierarchical data objects to support efficient path navigation of the hierarchical data objects
CN105426491B (zh) * 2015-11-23 2018-12-14 武汉大学 一种时空地理大数据的检索方法及系统
CN106776632A (zh) * 2015-11-23 2017-05-31 北京国双科技有限公司 数据查询方法及装置
CN105488189B (zh) * 2015-12-02 2019-02-12 成都科来软件有限公司 一种基于大数据量的五元组查询方法及装置
CN106202137A (zh) * 2015-12-29 2016-12-07 北京市交通信息中心 一种车辆定位方法及装置
CN106020724A (zh) * 2016-05-20 2016-10-12 南京邮电大学 一种基于数据映射算法的近邻存储方法
CN106095952A (zh) * 2016-06-15 2016-11-09 公安部第三研究所 基于键值云存储的时空范围内海量过车记录快速查询方法
CN108614836A (zh) * 2016-12-13 2018-10-02 上海仪电(集团)有限公司中央研究院 一种基于Hbase的时空数据管理方法
CN106815355A (zh) * 2017-01-22 2017-06-09 济南浪潮高新科技投资发展有限公司 一种用于NANDflash阵列的文件系统实现方法
CN107220285B (zh) * 2017-04-24 2020-01-21 中国科学院计算技术研究所 面向海量轨迹点数据的时空索引构建方法
CN109120885B (zh) * 2017-06-26 2021-01-05 杭州海康威视数字技术股份有限公司 视频数据获取方法及装置
CN107423368B (zh) * 2017-06-29 2020-07-17 中国测绘科学研究院 一种非关系数据库中的时空数据索引方法
CN109241121A (zh) * 2017-06-29 2019-01-18 阿里巴巴集团控股有限公司 时间序列数据的存储和查询方法、装置、系统及电子设备
CN107391600A (zh) * 2017-06-30 2017-11-24 北京百度网讯科技有限公司 用于在内存中存取时序数据的方法和装置
CN108717417B (zh) * 2018-03-30 2022-05-03 斑马网络技术有限公司 地图检索输入提示方法及其系统
CN108920499B (zh) * 2018-05-24 2022-04-19 河海大学 一种面向周期性检索的时空轨迹索引与检索方法
CN108595720B (zh) * 2018-07-12 2020-05-19 中国科学院深圳先进技术研究院 一种区块链时空数据查询方法、系统及电子设备
CN108924778B (zh) * 2018-07-16 2020-05-22 浙江大学 一种面向非实时快照位置数据的签到用户近似搜索方法
CN110737727B (zh) 2018-07-19 2023-09-29 华为云计算技术有限公司 一种数据处理的方法及系统
CN109165215B (zh) * 2018-07-27 2020-07-28 苏州视锐信息科技有限公司 一种云环境下时空索引的构建方法、装置及电子设备
CN109325086B (zh) * 2018-08-10 2021-01-26 中国电子科技集团公司第二十八研究所 一种离散地理数据归档管理方法
CN109284338B (zh) * 2018-10-25 2021-12-10 南京航空航天大学 一种基于混合索引的卫星遥感大数据优化查询方法
CN109933584B (zh) * 2019-01-31 2021-04-02 北京大学 一种多级无序索引方法与系统
WO2020199192A1 (en) * 2019-04-04 2020-10-08 Alibaba Group Holding Limited Split-key estimation method for table partition in disbtributed data storage systems
CN110765321B (zh) * 2019-10-28 2022-10-25 北京明略软件系统有限公司 一种数据存储路径的生成方法、生成装置及可读存储介质
CN111104457A (zh) * 2019-10-30 2020-05-05 武汉大学 基于分布式数据库的海量时空数据管理方法
CN113326257B (zh) * 2020-04-30 2023-12-15 阿里巴巴集团控股有限公司 索引构建方法、推荐方法、装置、电子设备和计算机存储介质
CN113763099A (zh) * 2020-12-29 2021-12-07 京东城市(北京)数字科技有限公司 一种数据查找方法、装置、设备和存储介质
CN112948531B (zh) * 2021-04-02 2023-12-15 方正国际软件(北京)有限公司 海量轨迹查询方法、检索服务器及系统
CN113094756A (zh) * 2021-05-13 2021-07-09 统信软件技术有限公司 一种数据加密方法及计算设备
CN113704565B (zh) * 2021-10-28 2022-02-18 浙江大学 基于全局区间误差的学习型时空索引方法、装置及介质
CN116882522B (zh) * 2023-09-07 2023-11-28 湖南视觉伟业智能科技有限公司 一种分布式时空挖掘方法及系统
CN117033541B (zh) * 2023-10-09 2023-12-19 中南大学 一种时空知识图谱索引方法及相关设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030208496A1 (en) * 2002-05-01 2003-11-06 Sun Microsystems, Inc. Shape-based geometric database and methods and systems for construction and use thereof
CN102385552A (zh) * 2010-08-25 2012-03-21 微软公司 样本剖析报告的动态计算
CN102479189A (zh) * 2010-11-23 2012-05-30 上海宝信软件股份有限公司 一种内存中海量时间戳型数据高速均匀访问的索引方法
CN103294790A (zh) * 2013-05-22 2013-09-11 西北工业大学 一种面向gps轨迹数据的时空语义索引与检索方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102651020B (zh) * 2012-03-31 2014-01-15 中国科学院软件研究所 一种海量传感器数据存储与查询方法
CN102955861B (zh) * 2012-11-30 2017-04-12 华为技术有限公司 一种基于备份文件的索引文件生成方法和装置
CN103092927B (zh) * 2012-12-29 2016-01-20 华中科技大学 一种分布式环境下的文件快速读写方法
CN103412897B (zh) * 2013-07-25 2017-03-01 中国科学院软件研究所 一种基于分布式结构的并行数据处理方法

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150222588A1 (en) * 2014-02-06 2015-08-06 Electronics And Telecommunications Research Institute Apparatus and method for sharing experience of communication terminal user
US11860846B2 (en) 2016-12-22 2024-01-02 Intel Corporation Methods, systems and apparatus to improve spatial-temporal data management
CN109997123B (zh) * 2016-12-22 2023-12-05 英特尔公司 用于改进空间-时间数据管理的方法、系统和装置
US10990570B2 (en) 2016-12-22 2021-04-27 Intel Corporation Methods, systems and apparatus to improve spatial-temporal data management
EP3559824A4 (en) * 2016-12-22 2020-05-06 INTEL Corporation METHODS, SYSTEMS AND DEVICES FOR IMPROVING THE SPATIAL-TIME DATA MANAGEMENT
CN109997123A (zh) * 2016-12-22 2019-07-09 英特尔公司 用于改进空间-时间数据管理的方法、系统和装置
WO2018118071A1 (en) 2016-12-22 2018-06-28 Intel Corporation Methods, systems and apparatus to improve spatial-temporal data management
CN110928968A (zh) * 2019-11-12 2020-03-27 天津大学 一种二维地理空间大数据的存储与查询计算机介质
CN110928968B (zh) * 2019-11-12 2023-04-11 天津大学 一种二维地理空间大数据的存储与查询计算机介质
CN110990665A (zh) * 2019-12-11 2020-04-10 北京明略软件系统有限公司 数据处理方法、装置、系统、电子设备及存储介质
CN110990665B (zh) * 2019-12-11 2023-08-25 北京明略软件系统有限公司 数据处理方法、装置、系统、电子设备及存储介质
CN111078634B (zh) * 2019-12-30 2023-07-25 中科海拓(无锡)科技有限公司 一种基于r树的分布式时空数据索引方法
CN111078634A (zh) * 2019-12-30 2020-04-28 中科海拓(无锡)科技有限公司 一种基于r树的分布式时空数据索引方法
CN113076334B (zh) * 2020-01-06 2024-05-03 阿里巴巴集团控股有限公司 数据查询方法、索引生成方法、装置及电子设备
CN113076334A (zh) * 2020-01-06 2021-07-06 阿里巴巴集团控股有限公司 数据查询方法、索引生成方法、装置及电子设备
CN111367916A (zh) * 2020-03-04 2020-07-03 浙江大华技术股份有限公司 一种数据存储方法及装置
CN111367916B (zh) * 2020-03-04 2023-03-31 浙江大华技术股份有限公司 一种数据存储方法及装置
CN111723096A (zh) * 2020-06-23 2020-09-29 重庆市计量质量检测研究院 一种集成GeoHash和Quadtree的空间数据索引方法
CN111723096B (zh) * 2020-06-23 2022-08-05 重庆市计量质量检测研究院 一种集成GeoHash和Quadtree的空间数据索引方法
CN113742350A (zh) * 2021-09-09 2021-12-03 北京中安智能信息科技有限公司 基于机器学习模型的时空索引构建方法和装置及查询方法
US20230079719A1 (en) * 2021-09-15 2023-03-16 Henan University Geotagged video spatial indexing method based on temporal information
US11681753B2 (en) * 2021-09-15 2023-06-20 Henan University Geotagged video spatial indexing method based on temporal information
CN113760937A (zh) * 2021-09-17 2021-12-07 恒生电子股份有限公司 数据的查缺方法、装置、电子设备及存储介质
CN115809360A (zh) * 2023-02-08 2023-03-17 深圳大学 一种大规模时空流数据实时空间连接查询方法及相关设备
CN116756139A (zh) * 2023-05-12 2023-09-15 中国自然资源航空物探遥感中心 一种数据索引方法、系统、存储介质和电子设备
CN116756139B (zh) * 2023-05-12 2024-04-23 中国自然资源航空物探遥感中心 一种数据索引方法、系统、存储介质和电子设备
CN116501504B (zh) * 2023-06-27 2023-09-12 上海燧原科技有限公司 数据流的时空映射方法、装置、电子设备及存储介质
CN116501504A (zh) * 2023-06-27 2023-07-28 上海燧原科技有限公司 数据流的时空映射方法、装置、电子设备及存储介质
CN116796083B (zh) * 2023-06-29 2023-12-22 山东省国土测绘院 一种空间数据划分方法及系统
CN116796083A (zh) * 2023-06-29 2023-09-22 山东省国土测绘院 一种空间数据划分方法及系统
CN117290617A (zh) * 2023-08-18 2023-12-26 中国船舶集团有限公司第七〇九研究所 一种海上分布式多源异构时空数据查询方法及系统
CN117290617B (zh) * 2023-08-18 2024-05-10 中国船舶集团有限公司第七〇九研究所 一种海上分布式多源异构时空数据查询方法及系统
CN117112492A (zh) * 2023-08-25 2023-11-24 中南林业科技大学 一种自适应的时空大数据分布式存储方法及智能文件系统
CN117112492B (zh) * 2023-08-25 2024-03-12 中南林业科技大学 一种自适应的时空大数据分布式存储方法及智能文件系统

Also Published As

Publication number Publication date
CN104750708A (zh) 2015-07-01
CN104750708B (zh) 2018-09-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14873240

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14873240

Country of ref document: EP

Kind code of ref document: A1