CN110334290B - MF-Octree-based spatio-temporal data rapid retrieval method - Google Patents

MF-Octree-based spatio-temporal data rapid retrieval method

Info

Publication number
CN110334290B
Authority
CN
China
Prior art keywords
node
time
octree
spatio
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910576241.3A
Other languages
Chinese (zh)
Other versions
CN110334290A (en
Inventor
龙军
陈瑞鹏
杨展
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201910576241.3A
Publication of CN110334290A
Application granted
Publication of CN110334290B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention discloses a spatio-temporal data rapid retrieval method based on MF-Octree, which comprises the following steps: step 1, establishing a time axis with time as the reference; step 2, when newly arrived spatio-temporal data are received, storing them in an octree based on a Z-order curve, the octrees being positioned in sequence on the corresponding time periods of the time axis according to the time attribute of the stored spatio-temporal data; step 3, when a query request is received, finding on the time axis the octree root nodes that match the time attribute of the query request; and step 4, calculating the relevance ranking score of each node in the octree to which the root node belongs, and outputting as the query result the nodes whose relevance ranking score is smaller than a preset relevance ranking score standard value. The invention can meet the user's real-time requirement for spatio-temporal data retrieval, reduce the query response time, and effectively improve the user experience of the retrieval system.

Description

MF-Octree-based spatio-temporal data rapid retrieval method
Technical Field
The invention relates to the technical field of Data Retrieval (DR) methods, and in particular to a spatio-temporal data fast retrieval method based on MF-Octree.
Background
Big data is an extension of cloud computing technology and an inevitable result of social progress and development; the arrival of the big data era is shaping the strategic direction of future IT development. With the rapid development of information and network technologies, more and more enterprise businesses and social activities are being digitized. In particular, as data generation becomes automated and faster, data volume grows rapidly, and sensor data is one of the main sources of big data. In the era of the Internet of Things, thousands of networked sensors are embedded in physical devices such as smart meters, mobile phones and automobiles, continuously sensing, generating and transmitting new kinds of data such as ultra-large-scale geographic location information, time information, and object attributes such as temperature and humidity. For example, more and more people use social software such as Twitter, Facebook and QQ to transfer and share information by uploading text, images, audio, video and other files. iiMedia Research data shows that in the second half of 2017 Facebook ranked first among social software with 2.1 billion active users, followed by YouTube and WhatsApp; Twitter had 570 million users, and WeChat and QQ in China had 980 million and 840 million users respectively. The widespread use of these internet products has led to a flood of data: according to statistics, every minute Facebook users share about 216,302 photos, YouTube users upload about 400 hours of new video, Instagram users upload about 2,430,555 posts, and Twitter users send about 9,678 tweets.
Analyzing these data has high application value, for example in target detection, key information mining, public opinion analysis and recommendation systems. In addition, browsing and reviewing these data is also of great interest to users, who often wish to obtain valuable data by tracking and monitoring how the data objects evolve across time and space; this is a typical spatio-temporal range retrieval.
A number of spatio-temporal keyword search techniques have developed rapidly and continue to improve. Existing spatio-temporal keyword retrieval techniques mainly address retrieval over structured data (common structured databases include Oracle, Sybase, SQL Server, DB2 and Informix) and unstructured data (full text, images, sound, video, hypermedia and the like). Research on structured data focuses on index structures built on top of a structured database to accelerate queries, usually by improving and extending common indexes such as the quadtree and the R-tree. Unstructured databases, also known as NoSQL databases, such as HBase, provide only basic data operations (e.g., Get, Scan) and support only a single data retrieval mode, which limits the application of spatio-temporal query techniques in the big data field. Because spatio-temporal data retrieval must return data objects within a particular temporal and spatial extent, conventional HBase cannot meet this requirement. Meanwhile, in practical applications, the retrieved data objects usually require a high data update rate and real-time multi-attribute queries; although many methods take spatio-temporal conditions into account, they cannot satisfy the more diverse retrieval applications over spatio-temporal data.
Disclosure of Invention
To address the key problem of how to greatly shorten retrieval time over massive spatio-temporal data, the invention provides a novel retrieval method. It efficiently organizes and processes high-frequency, real-time spatio-temporal data streams using a Multi-dimensional Feature Octree (MF-Octree) built from time segments (Temporal segments) and an octree (Octree), thereby ensuring that newly arrived spatio-temporal data can always be inserted into the latest time segment and that timely, reliable query results are returned to the user.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A spatio-temporal data fast retrieval method based on MF-Octree comprises the following steps:
step 1, establishing a time axis with time as the reference;
step 2, when newly arrived spatio-temporal data are received, storing them in an octree based on a Z-order curve; the octrees are positioned in sequence on the corresponding time periods of the time axis according to the time attribute of the stored spatio-temporal data;
step 3, when a query request is received, finding on the time axis the octree root nodes that match the time attribute of the query request;
step 4, calculating the relevance ranking score of each node in the octree to which the root node belongs, and outputting as the query result the nodes whose relevance ranking score is smaller than a preset relevance ranking score standard value;
the relevance ranking score of a node refers to the relevance ranking score between the spatio-temporal data of the node and the query request.
This technical scheme constructs a novel index structure based on a time axis and octrees. When a new spatio-temporal data stream arrives, the high-frequency, real-time stream can be efficiently organized and processed according to the index structure; when a query request is received, the search for the requested content in the massive spatio-temporal data can be completed efficiently according to the index structure. This meets the user's real-time requirement for spatio-temporal data retrieval, greatly reduces the query response time, and effectively improves the user experience of the spatio-temporal data retrieval system.
Further, the calculation formula of the relevancy ranking score is as follows:
$f(q,o_i)=\omega_1 f_s(q,o_i)+\omega_2 f_t(q,o_i)+\omega_3 f_v(q,o_i)$;
Figure GDA0003212532690000031
$\mathrm{Man}(q,o_i)=|q-o_i|$,
$\mathrm{Max}(q,O)=\max(\{\mathrm{Man}(q,o_i)\mid o_i\in O\})$;
Figure GDA0003212532690000032
Figure GDA0003212532690000033
In the formula, $f(q,o_i)$ represents the relevance ranking score between the spatio-temporal data $o_i$ of node $i$ and the query request $q$; $f_s()$ is the spatial similarity function, $f_t()$ is the time similarity function, and $f_v()$ is the content similarity function; $\omega_1$, $\omega_2$, $\omega_3$ are the weight coefficients of spatial, temporal and content similarity respectively, with $\omega_1>0$, $\omega_2>0$, $\omega_3>0$ and $\omega_1+\omega_2+\omega_3=1$; $O$ represents the set of spatio-temporal data objects to be queried; $q.t$ and $q.v$ represent the time value and content value of the query request $q$, and $o_i.t$ and $o_i.v$ represent the time value and content value of the spatio-temporal data $o_i$.
In this scheme, a functional relation between the object data of the relevant nodes and the query request is established in the three dimensions of time, space and content, based on the time value, space value and content value of the spatio-temporal data, and this relation is used as the formula for the relevance ranking score of a node. The similarity between each node and the query request can therefore be measured by calculating the node's relevance ranking score: the higher the relevance ranking score value, the closer the corresponding spatio-temporal data is to the query request. The relevance ranking score formula constructed by this scheme can thus improve the accuracy of spatio-temporal data retrieval.
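The weighted combination above can be sketched in a few lines of Python. Only the weighted sum, Man(q, o_i) and Max(q, O) come from the text; the three similarity terms are defined in figures that are not reproduced here, so the forms used below (normalized Manhattan distance, normalized time difference, Jaccard distance on keyword sets) are assumptions made purely for illustration. Each term is written as a normalized distance, so a smaller score means a closer match, which is the direction step 4 uses when it extracts the node with the minimum score; that orientation is also an assumption. The names `Item`, `man` and `score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A query or a spatio-temporal object: location, time value, content keywords."""
    xyz: tuple[float, float, float]
    t: float
    v: frozenset[str]


def man(q: Item, o: Item) -> float:
    """Man(q, o_i) = |q - o_i|, taken here as Manhattan distance over the coordinates."""
    return sum(abs(a - b) for a, b in zip(q.xyz, o.xyz))


def score(q: Item, o: Item, objects: list[Item],
          weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """f(q, o_i) = w1*f_s + w2*f_t + w3*f_v with w1, w2, w3 > 0 and w1 + w2 + w3 = 1."""
    w1, w2, w3 = weights
    assert min(weights) > 0 and abs(sum(weights) - 1.0) < 1e-9

    # ASSUMPTION: f_s is Man(q, o_i) normalized by Max(q, O).
    max_man = max(man(q, x) for x in objects) or 1.0
    f_s = man(q, o) / max_man

    # ASSUMPTION: f_t is the time difference normalized by the largest one in O.
    max_dt = max(abs(q.t - x.t) for x in objects) or 1.0
    f_t = abs(q.t - o.t) / max_dt

    # ASSUMPTION: f_v is the Jaccard distance between the keyword sets.
    union = q.v | o.v
    f_v = 1.0 - (len(q.v & o.v) / len(union) if union else 1.0)

    return w1 * f_s + w2 * f_t + w3 * f_v
```

With these assumed terms every component lies in [0, 1], so the combined score also lies in [0, 1] and can be compared directly against a preset standard value.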
Further, a constraint model is preset before step 3; when the query request is received in step 3, the preset constraint model is used to preprocess the data to be queried, and discrete nodes unrelated to the content value of the query request are discarded;
the constraint model is a mixed model of a normal distribution and a mean distribution, specifically:
Figure GDA0003212532690000034
$S_i=f(q,o_i)$;
in the formula, $\mu$ represents the row value and the column value of the set $O$ of spatio-temporal data objects to be queried; a discrete node unrelated to the content value of the query request is a node whose spatio-temporal data satisfies
Figure GDA0003212532690000035
or
Figure GDA0003212532690000036
In this scheme, the spatio-temporal data to be retrieved are preprocessed before the query request is executed, and discrete nodes (octree nodes) unrelated to the query request are discarded. This reduces the influence of irrelevant spatio-temporal data on the retrieval process, reduces the amount of data to be searched, and improves retrieval speed.
Further, the specific process of step 4 is:
step 4.1, setting the number of results to be returned for the query request to $l$, initializing the query result $R$ to an empty set, initializing an intermediate heap $H$ to an empty heap, and presetting the relevance ranking score standard value $\lambda$ as $\lambda=\lambda_0$;
step 4.2, taking the octree root node obtained in step 3, calculating the relevance ranking score of the root node, and storing the root node in the intermediate heap $H$;
step 4.3, taking out of the intermediate heap $H$ the node with the minimum relevance ranking score;
when the extracted node is a leaf node: let the leaf node be $e_1$ and its relevance ranking score be $S_{e_1}$; compare $S_{e_1}$ with the relevance ranking score standard value $\lambda$; if $S_{e_1}\le\lambda$, add the spatio-temporal data of leaf node $e_1$ to the query result $R$ and delete leaf node $e_1$ from the intermediate heap $H$;
when the extracted node is a non-leaf node: let the non-leaf node be $e_2$, calculate its relevance ranking score $S_{e_2}$, take each child node $e'_j$, $j=1,2,\dots,8$, of the non-leaf node $e_2$, and calculate the relevance ranking score of each child node $e'_j$
Figure GDA0003212532690000041
The relevance ranking scores of all child nodes $e'_j$
Figure GDA0003212532690000042
are compared with the relevance ranking score standard value $\lambda$, and all child nodes $e'_j$ that satisfy
Figure GDA0003212532690000043
are added to the intermediate heap $H$;
step 4.4, outputting the query result $R$ when it contains $l$ spatio-temporal data items or the intermediate heap $H$ is empty; otherwise, returning to step 4.3.
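Steps 4.1 to 4.4 amount to a best-first traversal driven by a min-heap. The following Python sketch illustrates that control flow under stated assumptions: the scoring function is passed in by the caller, the admission test for child nodes is taken to be "score ≤ λ" (the actual condition is given in a figure that is not reproduced here), and the names `Node` and `search` are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    data: Optional[str] = None                             # payload of a leaf node
    children: list["Node"] = field(default_factory=list)   # 0 or 8 child nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, score: Callable[[Node], float], l: int, lam: float) -> list:
    """Return up to l leaf payloads whose score is <= lam, smallest score first."""
    results = []                                  # step 4.1: R is empty
    tie = itertools.count()                       # tie-breaker so nodes are never compared
    heap = [(score(root), next(tie), root)]       # step 4.2: root goes into H
    while heap and len(results) < l:              # step 4.4: stop on l results or empty H
        s, _, node = heapq.heappop(heap)          # step 4.3: node with minimum score
        if node.is_leaf:
            if s <= lam:
                results.append(node.data)
        else:
            for child in node.children:
                cs = score(child)
                if cs <= lam:                     # ASSUMPTION: admission test is score <= lambda
                    heapq.heappush(heap, (cs, next(tie), child))
    return results
```

For the results to come out in true score order, the score of a non-leaf node should not exceed the scores of the leaves beneath it; the text does not state how non-leaf scores are derived, so this is left to the caller-supplied `score` function.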
Furthermore, the time axis is divided into index segments according to a preset duration $T$. In step 2, let the number of the current index segment be $k$; if the sum of the time span $t$ of the newly arrived spatio-temporal data and the time span $t_0$ of the spatio-temporal data already stored in the current index segment $k$ is greater than the preset duration $T$, the newly arrived spatio-temporal data are stored in the $(k+1)$-th index segment.
Furthermore, every node of the octree is assigned a unique position code, all nodes are held in a hash map, and any node can be accessed directly by its position code; when all nodes of the octree are sorted by position code, the resulting sequence is a Z-order curve.
Further, each node of the octree has either 0 or 8 child nodes; a node with 0 child nodes is a leaf node of the octree, and a node with 8 child nodes is a non-leaf node of the octree;
the octree adopts an inverted index: keywords of the spatio-temporal data are recorded at the nodes of the octree, and the keyword list formed by the keywords of all nodes of the octree is the inverted file;
the data record of a non-leaf node is expressed as < Oc, r, t, p >, where Oc represents the child node addresses, r represents the spatial region covered by the current non-leaf node, t represents the timestamp aggregated from the spatio-temporal data of the child nodes, and p represents a pointer to a keyword list whose keywords correspond to the respective child nodes;
the data record of a leaf node is expressed as < O, r, t, q >, where O represents the spatio-temporal data of the current leaf node, r represents the spatial region covered by the current leaf node, t represents the timestamp formed from the spatio-temporal data of the current leaf node, and q represents a pointer to an index keyword list whose keywords correspond to the node.
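As an illustration of the two record layouts, the Python sketch below models < O, r, t, q > and < Oc, r, t, p > as dataclasses. The concrete field types are assumptions: regions are taken to be axis-aligned boxes, timestamps are taken to be (earliest, latest) pairs, keyword lists are held inline rather than behind pointers, and child addresses are plain integers. `build_parent` and `children_with_keyword` are hypothetical helpers.

```python
from dataclasses import dataclass

# ASSUMPTION: a region is an axis-aligned box given as (min corner, max corner).
Region = tuple[tuple[float, float, float], tuple[float, float, float]]


@dataclass
class LeafNode:
    O: list[dict]              # spatio-temporal data stored at this leaf
    r: Region                  # spatial region covered by the leaf
    t: tuple[float, float]     # ASSUMPTION: (earliest, latest) timestamp of the data
    q: set[str]                # keyword list of the leaf (inline instead of a pointer)


@dataclass
class NonLeafNode:
    Oc: list[int]              # child node addresses (plain integers here)
    r: Region                  # spatial region covered by this node
    t: tuple[float, float]     # timestamp aggregated from the children
    p: dict[int, set[str]]     # keyword list per child address


def build_parent(children: dict[int, LeafNode], r: Region) -> NonLeafNode:
    """Aggregate child leaves into a non-leaf record."""
    return NonLeafNode(
        Oc=sorted(children),
        r=r,
        t=(min(c.t[0] for c in children.values()),
           max(c.t[1] for c in children.values())),
        p={addr: set(c.q) for addr, c in children.items()},
    )


def children_with_keyword(node: NonLeafNode, keyword: str) -> list[int]:
    """Use the per-child keyword lists to skip children that cannot match."""
    return [addr for addr in node.Oc if keyword in node.p[addr]]
```

Keeping a keyword list per child at the parent is what lets a content query prune whole subtrees before any child node is visited.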
Advantageous effects
This technical scheme constructs a novel index structure based on a time axis and octrees. When a new spatio-temporal data stream arrives, the high-frequency, real-time stream can be efficiently organized and processed according to the index structure; when a query request is received, the search for the requested content in the massive spatio-temporal data can be completed efficiently according to the index structure. This meets the user's real-time requirement for data retrieval, greatly reduces the query response time, and effectively improves the user experience of the spatio-temporal data retrieval system.
Drawings
FIG. 1 is a schematic diagram of the MF-Octree structure of the method of the present invention;
FIG. 2 is a schematic diagram of an octree structure according to the method of the present invention;
FIG. 3 is a schematic diagram of a Z-order curve generated based on octree according to the method of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail, which are developed based on the technical solutions of the present invention, and give detailed implementation manners and specific operation procedures to further explain the technical solutions of the present invention.
Because spatio-temporal data and queries pervade daily life, the index must meet the following main goals: first, the index must satisfy temporal, spatial and content constraints; second, spatio-temporal data must be inserted in real time; finally, spatio-temporal data that are outdated or expected to be useless should be filtered out and deleted at low cost. Based on these requirements, we propose an index structure that supports spatio-temporal data retrieval and ranking: a Multi-dimensional Feature Octree (MF-Octree) built from time segments (Temporal segments) and an octree (Octree). As shown in FIG. 1, the MF-Octree consists of a time axis and octrees, and can effectively combine spatial proximity, temporal proximity and content relevance to retrieve spatio-temporal object data (hereinafter collectively referred to as spatio-temporal data) quickly and accurately.
Because spatio-temporal data are highly real-time, the index structure must be able to insert and delete data quickly. To achieve this, we divide all spatio-temporal data to be queried into consecutive, disjoint segments along the time dimension to obtain index segments. The advantage of this partitioning is that newly arriving spatio-temporal data are always inserted into the most recent index segment. If the duration set for this index segment is exceeded, the index segment is closed and a new empty segment is created to receive the newly arriving spatio-temporal data. Assuming that the number of the current index segment is $k$, that each index segment indexes only $T$ hours of spatio-temporal data, and that the time range of the newly arriving spatio-temporal data is $t$, the following conclusion can be obtained:
Figure GDA0003212532690000061
The invention sets index segments by dividing the time axis of the MF-Octree according to a preset duration $T$, with one index segment for each preset duration $T$; that is, each index segment can store spatio-temporal data spanning at most $T$. If the sum of the time span $t$ of the newly arrived spatio-temporal data and the time span $t_0$ of the spatio-temporal data already stored in the current index segment $k$ is greater than the preset duration $T$, the newly arrived spatio-temporal data are stored in the $(k+1)$-th index segment.
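The segment rollover rule can be illustrated with a short Python sketch. It applies the rule literally: a new index segment is opened when the span already stored plus the span of the arriving batch would exceed T. The class name `TimeAxis`, the use of a plain list as a stand-in for each segment's octree, and the requirement that a single batch span no more than T are assumptions made for this example.

```python
class TimeAxis:
    """Time axis split into index segments, each holding at most T hours of data."""

    def __init__(self, T: float):
        self.T = T
        self.spans = [0.0]     # spans[k-1] is the stored span t_0 of index segment k
        self.batches = [[]]    # stand-in for the octree attached to each segment

    def insert(self, batch, t: float) -> int:
        """Store a batch spanning t hours; return the 1-based index segment number."""
        if t > self.T:
            # ASSUMPTION: a single batch never spans more than one segment.
            raise ValueError("batch span exceeds the segment duration T")
        if self.spans[-1] + t > self.T:    # t + t_0 > T: open segment k + 1
            self.spans.append(0.0)
            self.batches.append([])
        self.spans[-1] += t
        self.batches[-1].append(batch)
        return len(self.spans)
```

For example, with T = 24, two batches of 10 hours each land in segment 1, and a third batch of 5 hours opens segment 2 because 20 + 5 exceeds 24.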
An octree is a recursive, axis-aligned, space-partitioning data structure that is commonly used in computational geometry to optimize collision detection, nearest-neighbor search and similar tasks. An octree recursively subdivides space into eight sub-cells until the size of the remaining space in each cell reaches a predetermined threshold or the tree reaches its maximum depth. Each cell is subdivided by axis-aligned planes, similar to a spatial coordinate system whose origin is generally at the center of the parent node. Therefore, each node has either 0 or 8 child nodes, as shown in FIG. 2, which makes it easier to store sparsely distributed structures than an ordinary grid. A node with 0 child nodes is a leaf node, and a node with 8 child nodes is a non-leaf node.
To greatly reduce memory and traversal overhead, a hash function is introduced to represent the nodes of the octree. This representation does not store pointers to parent and child nodes; instead, each node stores a unique index (its position code), every node is held in a hash map, and any node can be accessed directly by its position code. Because the position codes of a node's parent and children are easily derived from the node's own position code, each node can record with a bit mask (a 0/1 flag for each child position) whether a child node has been allocated, which avoids unnecessary hash lookups for non-existent child nodes. When all nodes of the octree are sorted by position code, the resulting sequence is consistent with the traversal order of the octree, as shown in FIG. 3; that is, child node 0000 of the octree is traversed first. This is equivalent to Morton coding (also known as the Z-order curve), which linearly encodes multi-dimensional spatio-temporal data while preserving the position of the data at multiple levels. The Z-order (i.e., the order of the nodes along the Z-order curve) describes each node of the octree and its traversal order.
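A minimal Python sketch of position codes held in a hash map follows. The encoding used here is one common convention and is an assumption, not necessarily the one in the patent: the root has code 1, and a child's code is its parent's code shifted left by three bits with the octant number appended, so the parent's code is recovered by shifting right. The names `morton3`, `child_code`, `parent_code`, `add_child` and `has_child` are hypothetical.

```python
def morton3(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def child_code(code: int, octant: int) -> int:
    """Position code of the child in the given octant (0..7)."""
    return (code << 3) | octant


def parent_code(code: int) -> int:
    """Position code of the parent; the root (code 1) has no parent."""
    return code >> 3


def add_child(nodes: dict, code: int, octant: int) -> int:
    """Allocate a child in the hash map and set its bit in the parent's mask."""
    nodes[code]["mask"] |= 1 << octant
    c = child_code(code, octant)
    nodes[c] = {"mask": 0, "data": []}
    return c


def has_child(nodes: dict, code: int, octant: int) -> bool:
    """Check the bit mask instead of probing the hash map for a missing child."""
    return bool(nodes[code]["mask"] & (1 << octant))


# Hash map keyed by position code; the root carries the sentinel code 1.
nodes = {1: {"mask": 0, "data": []}}
leaf = add_child(nodes, 1, 5)
```

Under this convention, sorting the codes of nodes at the same depth yields their Z-order, and the leading 1 bit keeps codes at different depths distinct.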
The octree adopts an inverted index, i.e., it is an inverted octree. Keywords of the spatio-temporal data are recorded at the octree nodes, and the keyword list formed by the keywords of all nodes of the octree is called the inverted file.
The data record of a non-leaf node is expressed as < Oc, r, t, p >, where Oc represents the child node addresses, r represents the spatial region covered by the current non-leaf node, t represents the timestamp aggregated from the spatio-temporal data of the child nodes, and p represents a pointer to a keyword list whose keywords correspond to the respective child nodes;
the data record of a leaf node is expressed as <O,r,t,q*>, where O represents the spatio-temporal data of the current leaf node, r represents the spatial region covered by the current leaf node, t represents the timestamp formed from the spatio-temporal data of the current leaf node, and q* represents a pointer to the index keyword list whose keywords correspond to the node.
In the spatio-temporal data retrieval problem, for a given query request $q$, we can set a constraint model to discard discrete nodes that are unrelated to the query content. A mixed model of a normal distribution and a mean distribution is used to constrain $f(q,o_i)$ and reduce the influence of irrelevant object data. The constraint model is specifically:
Figure GDA0003212532690000071
$S=f(q,o_i)=\omega_1 f_s(q,o_i)+\omega_2 f_t(q,o_i)+\omega_3 f_v(q,o_i)$;
$\mathrm{Man}(q,o_i)=|q-o_i|$,
$\mathrm{Max}(q,O)=\max(\{\mathrm{Man}(q,o_i)\mid o_i\in O\})$;
Figure GDA0003212532690000072
Figure GDA0003212532690000073
In the formula, $\mu$ represents the row value and the column value of the set $O$ of spatio-temporal data objects to be queried; $f(q,o_i)$ represents the relevance ranking score between the spatio-temporal data $o_i$ of node $i$ and the query request $q$; $f_s()$ is the spatial similarity function, $f_t()$ is the time similarity function, and $f_v()$ is the content similarity function; $\omega_1$, $\omega_2$, $\omega_3$ are the weight coefficients of spatial, temporal and content similarity respectively, with $\omega_1>0$, $\omega_2>0$, $\omega_3>0$ and $\omega_1+\omega_2+\omega_3=1$; $O$ represents the set of spatio-temporal data objects to be queried; $q.t$ and $q.v$ represent the time value and content value of the query request $q$, and $o_i.t$ and $o_i.v$ represent the time value and content value of the spatio-temporal data $o_i$.
If the spatio-temporal data $o_i$ satisfies:
Figure GDA0003212532690000081
or
Figure GDA0003212532690000082
then the spatio-temporal data $o_i$ is deleted from the set of spatio-temporal data objects to be retrieved; that is, the node where $o_i$ is located is regarded as a discrete node.
For the above constraint model
Figure GDA0003212532690000083
the derivation is described below.
First, the goal of the spatio-temporal data query algorithm is to find the data object with the highest $f(q,o)$, so the invention maximizes the following objective function to obtain the object with the highest spatio-temporal content score:
Figure GDA0003212532690000084
Taking the negative logarithm transforms the original problem into a minimization problem:
Figure GDA0003212532690000085
In the invention, the spatio-temporal data are assumed to follow a normal distribution. Because only some local characteristics of the data need to be described during a query, the method is robust to discrete, isolated points. We can therefore use the following mixed model of a normal distribution and a mean distribution for optimization and filtering. Let $f(q,o_i)=S_i$ and $o_i=\{o_{i1},o_{i2},\dots,o_{in}\}$.
Figure GDA0003212532690000086
Figure GDA0003212532690000087
By analyzing the data attributes, we can use a Gaussian function of the form
Figure GDA0003212532690000088
to approximate the filter model. Evaluating at $o_i=0$, $o_i=\sigma$ and $o_i=\infty$ gives $d_j$, $j=1,2,3$, as follows ($c_1$ and $c_2$ can be obtained by setting $p(o_i)=1$, which is not discussed further here):
$d_1=-\log(c_1+c_2)-d_3$
Figure GDA0003212532690000089
$d_3=-\log(c_2)$;
The final relevance ranking function for each target point is (the constant term $d_3$ is omitted):
Figure GDA0003212532690000091
In the formula, $\mu$ represents the row value and the column value of the matrix formed by the set $O$ of spatio-temporal data objects to be retrieved. The formula also requires the values of the feature matrix Σ of the spatio-temporal data. If the feature matrix Σ is nearly singular, a slight metric transformation should be applied to prevent numerical problems; that is, if the largest eigenvalue $\lambda_1$ is more than 100 times larger than $\lambda_2$ or $\lambda_3$, then
Figure GDA0003212532690000092
is used in place of the smaller eigenvalue $\lambda_j$, and ∑ Α' Α is used in place of the original feature matrix Σ, where Α is the eigenvector matrix of the feature matrix Σ,
Figure GDA0003212532690000093
Example:
step 1, establishing a time axis with time as the reference;
step 2, when newly arrived spatio-temporal data are received, storing them in an octree based on a Z-order curve; the octrees are positioned in sequence on the corresponding time periods of the time axis according to the time attribute of the stored spatio-temporal data;
step 3, when a query request is received, preprocessing the data to be queried with a preset constraint model; then finding on the time axis the octree root nodes that match the time attribute of the query request;
step 4, calculating the relevance ranking score of each node in the octree to which the root node belongs, and outputting as the query result the nodes whose relevance ranking score is smaller than a preset relevance ranking score standard value; the specific process is as follows:
step 4.1, setting the number of results to be returned for the query request to $l$, initializing the query result $R$ to an empty set, initializing an intermediate heap $H$ to an empty heap, and presetting the relevance ranking score standard value $\lambda$ as $\lambda=\infty$;
step 4.2, taking the octree root node obtained in step 3, calculating the relevance ranking score of the root node, and storing the root node in the intermediate heap $H$;
step 4.3, taking out of the intermediate heap $H$ the node with the minimum relevance ranking score;
when the extracted node is a leaf node: let the leaf node be $e_1$ and its relevance ranking score be $S_{e_1}$; compare $S_{e_1}$ with the relevance ranking score standard value $\lambda$; if $S_{e_1}\le\lambda$, add the spatio-temporal data of leaf node $e_1$ to the query result $R$, update the standard value to $\lambda=S_{e_1}$, and delete leaf node $e_1$ from the intermediate heap $H$;
when the extracted node is a non-leaf node: let the non-leaf node be $e_2$ and its relevance ranking score be $S_{e_2}$; take each child node $e'_j$, $j=1,2,\dots,8$, of the non-leaf node $e_2$, and calculate the relevance ranking score of each child node $e'_j$
Figure GDA0003212532690000101
The relevance ranking scores of all child nodes $e'_j$
Figure GDA0003212532690000102
are compared with the relevance ranking score standard value $\lambda$, and all child nodes $e'_j$ that satisfy
Figure GDA0003212532690000103
are added to the intermediate heap $H$;
step 4.4, outputting the query result $R$ when it contains $l$ spatio-temporal data items or the intermediate heap $H$ is empty; otherwise, returning to step 4.3.
The above embodiments are preferred embodiments of the present application, and those skilled in the art can make various changes or modifications without departing from the general concept of the present application, and such changes or modifications should fall within the scope of the claims of the present application.

Claims (6)

1. A spatio-temporal data fast retrieval method based on MF-Octree is characterized by comprising the following steps:
step 1, establishing a time axis with time as a reference;
step 2, when newly arrived spatio-temporal data are received, storing the newly arrived spatio-temporal data in an octree based on a Z-order curve; the octree is sequentially positioned on the corresponding time period of the time shaft according to the time attribute of the stored space-time data;
step 3, when receiving the query request, finding out octree root nodes which accord with corresponding time attributes on a time axis according to the time attributes of the query request;
step 4, calculating the relevance ranking score of each node in the octree to which the root node belongs, and taking the node of which the relevance ranking score value is smaller than a preset relevance ranking score standard value as a query result and outputting the query result;
the relevancy ranking score of the node refers to the relevancy ranking score between the spatio-temporal data of the node and the query request;
the specific process of the step 4 is as follows:
step 4.1, setting the number of results to be returned for the query request to l, initializing the query result R as an empty set, initializing the intermediate heap H as an empty heap, and presetting the standard value λ of the relevance ranking score as λ = λ_0;
step 4.2, taking the octree root node obtained in step 3, calculating the relevance ranking score of the root node, and storing the root node in the intermediate heap H;
step 4.3, taking out the node with the minimum relevance ranking score from the intermediate heap H;
when the extracted node is a leaf node: let the leaf node be e1 and let its relevance ranking score be S_e1; comparing S_e1 with the standard value λ of the relevance ranking score, and if S_e1 ≤ λ is satisfied, adding the spatio-temporal data of the leaf node e1 to the query result R and deleting the leaf node e1 from the intermediate heap H;
when the extracted node is a non-leaf node: let the non-leaf node be e2; calculating the relevance ranking score S_e2 of the non-leaf node e2; selecting each child node e'_j (j = 1, 2, …, 8) of the non-leaf node e2 and calculating its relevance ranking score S_e'_j; comparing the relevance ranking scores S_e'_j of all child nodes with the standard value λ, and adding every child node e'_j that satisfies S_e'_j ≤ λ to the intermediate heap H;
step 4.4, outputting the query result R when the query result R contains l spatio-temporal data items or the intermediate heap H is an empty heap; otherwise, returning to step 4.3.
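The traversal of steps 4.1 to 4.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Node` class, its precomputed `score` field, and the function name `top_l_query` are hypothetical, and each node's relevance ranking score is assumed to be available already, with a smaller score meaning a closer match.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical octree node; `score` stands in for the relevance ranking score."""
    score: float
    data: Optional[Any] = None                              # spatio-temporal data (leaf only)
    children: List["Node"] = field(default_factory=list)    # empty for a leaf node

    @property
    def is_leaf(self) -> bool:
        return not self.children


def top_l_query(root: Node, l: int, lam: float) -> List[Any]:
    """Best-first traversal following steps 4.1 to 4.4 (an assumed reading)."""
    result: List[Any] = []                     # step 4.1: R starts as an empty set
    tie = count()                              # tie-breaker so Node objects are never compared
    heap = [(root.score, next(tie), root)]     # step 4.2: the root goes into H

    # Step 4.4: stop once R holds l items or H is empty.
    while heap and len(result) < l:
        score, _, node = heapq.heappop(heap)   # step 4.3: node with the minimum score
        if node.is_leaf:
            if score <= lam:                   # leaf within the threshold: report its data
                result.append(node.data)
        else:
            for child in node.children:
                if child.score <= lam:         # only children within the threshold enter H
                    heapq.heappush(heap, (child.score, next(tie), child))
    return result
```

Because H is a min-heap, results are reported in ascending order of score, so the first l reported items are the l best matches among the nodes that pass the threshold λ.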
2. The method of claim 1, wherein the relevance ranking score is calculated by the formulas:
f(q, o_i) = ω_1 * f_s(q, o_i) + ω_2 * f_t(q, o_i) + ω_3 * f_v(q, o_i);
Figure FDA0003212532680000021
Man(q, o_i) = |q − o_i|,
Max(q, O) = max({Man(q, o_i) | o_i ∈ O});
Figure FDA0003212532680000022
Figure FDA0003212532680000023
in the formulas, f(q, o_i) represents the relevance ranking score between the spatio-temporal data o_i of node i and the query request q; f_s(), f_t() and f_v() are the spatial, temporal and content similarity functions, respectively; ω_1, ω_2 and ω_3 are the weight coefficients of the spatial, temporal and content similarity, respectively, with ω_1 > 0, ω_2 > 0, ω_3 > 0 and ω_1 + ω_2 + ω_3 = 1; O represents the set of spatio-temporal data objects to be queried; q.t and q.v represent the time value and the content value of the query request q, respectively; and o_i.t and o_i.v represent the time value and the content value of the spatio-temporal data o_i, respectively.
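The parts of claim 2 that survive as text (the weighted sum, Man and Max) can be illustrated with a short Python sketch. It is hedged in two ways: the three similarity values are passed in as plain numbers because their defining formulas are only available as figures, and reading Man(q, o_i) = |q − o_i| as the Manhattan distance between coordinate tuples is an assumption. The function names are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def man(q: Point, o: Point) -> float:
    """Man(q, o_i) = |q - o_i|, read here (an assumption) as the Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(q, o))


def max_man(q: Point, objects: Sequence[Point]) -> float:
    """Max(q, O) = max({Man(q, o_i) | o_i in O})."""
    return max(man(q, o) for o in objects)


def relevance_score(fs: float, ft: float, fv: float,
                    w1: float, w2: float, w3: float) -> float:
    """f = w1*fs + w2*ft + w3*fv, with positive weights that sum to 1."""
    if min(w1, w2, w3) <= 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return w1 * fs + w2 * ft + w3 * fv
```

Rejecting weight triples that violate the constraints of claim 2 keeps scores from different queries on a comparable scale.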
3. The method according to claim 2, wherein a constraint model is preset before step 3, and when the query request is received in step 3, the preset constraint model is used for query data preprocessing, and discrete nodes irrelevant to the content value of the query request are discarded;
the constraint model is a mixed model related to normal distribution and mean distribution, and specifically includes:
Figure FDA0003212532680000024
S_i = f(q, o_i);
in the formula, μ represents the row value and the column value of the set O of spatio-temporal data objects to be queried; a discrete node irrelevant to the content value of the query request is a node whose spatio-temporal data satisfies either of the following conditions:
Figure FDA0003212532680000025
or
Figure FDA0003212532680000026
4. The method according to claim 1, wherein the time axis is divided into index segments according to a preset time length T, and in step 2, with the current index segment denoted as the k-th index segment, if the sum of the time span t of the newly arrived spatio-temporal data and the time span t_0 of the spatio-temporal data already stored in the current index segment k is greater than the preset time length T, the newly arrived spatio-temporal data are stored in the (k + 1)-th index segment.
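The segmentation rule of claim 4 can be sketched as follows. This is a minimal sketch under assumptions: the class name `TimeAxis` and its fields are hypothetical, each index segment is modeled as a plain list where the method would keep an octree, and every incoming item is assumed to carry the time span it covers.

```python
from typing import Any, List


class TimeAxis:
    """Hypothetical time axis split into index segments of at most T time units."""

    def __init__(self, T: float) -> None:
        self.T = T
        self.segments: List[List[Any]] = [[]]   # segment k is self.segments[k]
        self.stored_span = 0.0                  # t_0: span already held by the current segment

    def insert(self, item: Any, span: float) -> int:
        """Store `item`, which covers `span` time units, and return its segment index."""
        if self.stored_span + span > self.T:    # t + t_0 > T: open segment k + 1
            self.segments.append([])
            self.stored_span = 0.0
        self.segments[-1].append(item)
        self.stored_span += span
        return len(self.segments) - 1
```

With T = 10, items with spans 4 and 5 share segment 0, and a following item with span 2 opens segment 1 because 9 + 2 exceeds T.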
5. The method of claim 1, wherein every node of the octree is assigned a unique position code, all nodes are stored in a hash map, and any node can be accessed directly through its position code; and the sequence obtained by sorting all nodes of the octree by position code is a Z-order curve.
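One common way to obtain such position codes is bit interleaving (Morton coding) with a leading sentinel bit; the claim does not state that this exact encoding is used, so the sketch below is an assumption, and the function names are hypothetical. A Python `dict` plays the role of the hash map.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]


def location_code(x: int, y: int, z: int, depth: int) -> int:
    """Interleave the low `depth` bits of x, y, z into one Morton (Z-order) code."""
    code = 1                                    # sentinel bit: codes of different depths never collide
    for bit in range(depth - 1, -1, -1):        # most significant bit first
        octant = (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)
        code = (code << 3) | octant
    return code


def parent_code(code: int) -> int:
    """Dropping the last three bits yields the position code of the parent node."""
    return code >> 3


def build_node_map(cells: List[Cell], depth: int) -> Dict[int, Cell]:
    """Hash map from position code to node payload (here simply the cell itself)."""
    return {location_code(x, y, z, depth): (x, y, z) for x, y, z in cells}


def z_order(node_map: Dict[int, Cell]) -> List[Cell]:
    """Visiting nodes of one depth in ascending code order traces a Z-order curve."""
    return [node_map[code] for code in sorted(node_map)]
```

Under this encoding the root has code 1, and a lookup such as `node_map[location_code(x, y, z, depth)]` reaches a node without walking down from the root.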
6. The method of claim 1, wherein each node of the octree has either 0 child nodes or 8 child nodes; a node with 0 child nodes is a leaf node of the octree, and a node with 8 child nodes is a non-leaf node of the octree;
the octree adopts an inverted index, the keywords of the spatio-temporal data are recorded at the nodes of the octree, and the keyword list formed by the keywords of all nodes of the octree is the inverted file;
the data file of a non-leaf node is expressed as <Oc, r, t, p>, wherein Oc represents the child node addresses, r represents the spatial region covered by the current non-leaf node, t represents a timestamp formed by aggregating the spatio-temporal data of the child nodes, and p represents a pointer segment pointing to a keyword list, the keywords in the keyword list corresponding to the respective child nodes;
the data file of a leaf node is expressed as <O, r, t, q>, wherein O represents the spatio-temporal data of the current leaf node, r represents the spatial region covered by the current leaf node, t represents a timestamp formed from the spatio-temporal data of the current leaf node, and q represents a pointer segment pointing to an index keyword list, the keywords in the keyword list corresponding to the respective nodes.
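The two record layouts of claim 6 and the inverted file built from them can be sketched as Python data classes. The field types are assumptions chosen for illustration: regions are modeled as axis-aligned boxes, timestamps as single numbers, node addresses as integers, and the pointer segments p and q as in-memory keyword lists. The helper `build_inverted_file` is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

# Assumed region model: (minimum corner, maximum corner) of an axis-aligned box.
Region = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


@dataclass
class NonLeafRecord:
    """<Oc, r, t, p> of a non-leaf node."""
    Oc: List[int]     # addresses of the 8 child nodes
    r: Region         # spatial region covered by this node
    t: float          # timestamp aggregated from the children
    p: List[str]      # stands in for the pointer segment to the keyword list

    def __post_init__(self) -> None:
        if len(self.Oc) != 8:
            raise ValueError("a non-leaf node must have exactly 8 child nodes")


@dataclass
class LeafRecord:
    """<O, r, t, q> of a leaf node."""
    O: Any            # spatio-temporal data held by the leaf
    r: Region
    t: float
    q: List[str]      # stands in for the pointer segment to the index keyword list


def build_inverted_file(
    nodes: Dict[int, Union[NonLeafRecord, LeafRecord]]
) -> Dict[str, List[int]]:
    """Map each keyword to the ascending addresses of the nodes that carry it."""
    inverted: Dict[str, List[int]] = {}
    for address in sorted(nodes):
        record = nodes[address]
        keywords = record.p if isinstance(record, NonLeafRecord) else record.q
        for keyword in dict.fromkeys(keywords):   # drop repeats, keep order
            inverted.setdefault(keyword, []).append(address)
    return inverted
```

A keyword lookup in the resulting dictionary returns the candidate node addresses directly, which is the purpose of keeping an inverted file alongside the tree.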
CN201910576241.3A 2019-06-28 2019-06-28 MF-Octree-based spatio-temporal data rapid retrieval method Active CN110334290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576241.3A CN110334290B (en) 2019-06-28 2019-06-28 MF-Octree-based spatio-temporal data rapid retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910576241.3A CN110334290B (en) 2019-06-28 2019-06-28 MF-Octree-based spatio-temporal data rapid retrieval method

Publications (2)

Publication Number Publication Date
CN110334290A CN110334290A (en) 2019-10-15
CN110334290B true CN110334290B (en) 2021-12-03

Family

ID=68143639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576241.3A Active CN110334290B (en) 2019-06-28 2019-06-28 MF-Octree-based spatio-temporal data rapid retrieval method

Country Status (1)

Country Link
CN (1) CN110334290B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343050B (en) * 2021-05-25 2022-11-29 中南民族大学 Method and system for solving why-not problem based on time perception object
CN114385906A (en) * 2021-10-29 2022-04-22 北京达佳互联信息技术有限公司 Prediction method, recommendation method, device, equipment and storage medium
CN115809360B (en) * 2023-02-08 2023-05-05 深圳大学 Real-time space connection query method for large-scale space-time data and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070119853A (en) * 2006-06-16 2007-12-21 김효원 Real-time three-dimensional painting image rendering using octree texture
CN102521863A (en) * 2011-12-01 2012-06-27 武汉大学 Three-dimensional fluid scalar vector uniform dynamic showing method based on particle system
CN103440350A (en) * 2013-09-22 2013-12-11 吉林大学 Three-dimensional data search method and device based on octree
CN105261063A (en) * 2015-09-29 2016-01-20 北京三维易达科技有限公司 Three-dimensional particle system large scale sea climate simulation method based on octree
CN105426491A (en) * 2015-11-23 2016-03-23 武汉大学 Space-time geographic big data retrieval method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A hybrid index model for efficient spatio-temporal search in HBase;Zhangy C;《Trends and Applications in Knowledge Discovery and Data Mining》;20180519;全文 *
Efficient Interactive Search for Geo-tagged Multimedia Data;Long J;《Multimedia Tools & Applications》;20181231;全文 *
高效八叉树octree:基于hash函数的数据结构;战战兢兢小码农;《战战兢兢小码农》;20180521;正文第1-4页 *

Also Published As

Publication number Publication date
CN110334290A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
WO2017012491A1 (en) Similarity comparison method and apparatus for high-dimensional image features
Wei et al. Analyticdb-v: A hybrid analytical engine towards query fusion for structured and unstructured data
US8548969B2 (en) System and method for clustering content according to similarity
CN104991959B (en) A kind of method and system of the same or similar image of information retrieval based on contents
CN110334290B (en) MF-Octree-based spatio-temporal data rapid retrieval method
CN104424258B (en) Multidimensional data query method, query server, column storage server and system
CN106503223B (en) online house source searching method and device combining position and keyword information
CN109166615B (en) Medical CT image storage and retrieval method based on random forest hash
CN104834693A (en) Depth-search-based visual image searching method and system thereof
CN108763295B (en) Video approximate copy retrieval algorithm based on deep learning
CN107145519B (en) Image retrieval and annotation method based on hypergraph
CN106874425B (en) Storm-based real-time keyword approximate search algorithm
CN109408578A (en) One kind being directed to isomerous environment monitoring data fusion method
US11567952B2 (en) Systems and methods for accelerating exploratory statistical analysis
Tang et al. Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce.
Gao et al. Real-time social media retrieval with spatial, temporal and social constraints
US20230315727A1 (en) Cost-based query optimization for untyped fields in database systems
CN104933143A (en) Method and device for acquiring recommended object
Tian et al. A survey of spatio-temporal big data indexing methods in distributed environment
CN109446293B (en) Parallel high-dimensional neighbor query method
Abbasifard et al. Efficient indexing for past and current position of moving objects on road networks
Dam et al. Efficient top-k recently-frequent term querying over spatio-temporal textual streams
Gothwal et al. The survey on skyline query processing for data-specific applications
CN107291875B (en) Metadata organization management method and system based on metadata graph
CN113448994B (en) Continuous regrettage minimization query method based on core set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant