CN110347676B

CN110347676B - Uncertain temporal data management and query method based on a relational R-tree

Info

Publication number: CN110347676B
Application number: CN201910504660.6A
Authority: CN
Inventors: 许建秋; 韦建华
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2019-06-11
Filing date: 2019-06-11
Publication date: 2021-07-27
Anticipated expiration: 2039-06-11
Also published as: CN110347676A

Abstract

The invention discloses an uncertain temporal data management and query method based on a relational R-tree. The method belongs to the field of databases and implements the management and querying of uncertain temporal data in the extensible moving objects database SECONDO. The method manages a large number of given intervals whose start and end points are uncertain but whose lengths are fixed, and builds a relational R-tree over these intervals. It manages both the uncertainty of the time attribute and the weight attribute. With the relational R-tree, the query process can take into account the influence of the weights on the query result, which improves query efficiency. The method also calculates the intersection probability and finally returns the k data items with the largest weights that may intersect the query data.

Description

Uncertain temporal data management and query method based on a relational R-tree

Technical Field

The invention belongs to the field of data processing technology, and particularly relates to an uncertain temporal data management and query method based on a relational R-tree.

Background

As applications evolve, the data to be stored become more complex: they include not only deterministic data but also non-deterministic data. In project planning, for example, the expected completion date is often loosely defined, e.g., "a project will complete within three to six months thereafter". Describing such temporal variables as uncertain temporal information brings the generated data closer to human intuition and makes them more consistent with the real world. How to manage these uncertain temporal data is crucial to whether they can be used efficiently.

Uncertain temporal data consist of an uncertain start point and an uncertain end point, and are expressed in the form:

<<x1, x2>, <y1, y2>>.

Therefore, when a temporal data index is built, a spatial index algorithm is usually used, and <x1, x2> and <y1, y2> are mapped to the 4 fixed points of a spatial rectangle for processing. At present, the most common approach is to combine the R-tree with uncertain temporal indexing. However, the spatial representation of valid time contains an invalid region, which affects query results to some extent, and existing approaches manage only the time attribute and not the weights. An uncertain temporal data management and query method based on a relational R-tree is therefore proposed, which manages the weights together with the uncertain temporal data.
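The representation above can be illustrated with a minimal Python sketch. It assumes that <x1, x2> bounds the start point and <y1, y2> bounds the end point; the class name `UncertainInterval` and the method `corners` are hypothetical names chosen for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainInterval:
    """<<x1, x2>, <y1, y2>>: start point in [x1, x2], end point in [y1, y2] (assumed reading)."""
    x1: float  # earliest possible start
    x2: float  # latest possible start
    y1: float  # earliest possible end
    y2: float  # latest possible end

    def __post_init__(self):
        # Reject bounds that cannot describe a valid interval.
        if not (self.x1 <= self.x2 and self.y1 <= self.y2
                and self.x1 <= self.y1 and self.x2 <= self.y2):
            raise ValueError("inconsistent bounds")

    def corners(self):
        """Four corner points of the rectangle [x1, x2] x [y1, y2] in (start, end) space."""
        return [(self.x1, self.y1), (self.x1, self.y2),
                (self.x2, self.y1), (self.x2, self.y2)]


u = UncertainInterval(3, 6, 8, 11)
corner_points = u.corners()
```

A spatial index such as an R-tree can then store the rectangle spanned by these four points.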

Disclosure of Invention

Purpose of the invention: in order to eliminate the influence of invalid regions when uncertain temporal data are managed, the invention provides an uncertain temporal data management and query method based on a relational R-tree.

Technical solution: in an uncertain temporal data management and query method based on a relational R-tree, temporal data with uncertain start and end points are first managed on the basis of a determined interval length, a relational R-tree is then constructed, and queries are carried out in order of the weights of the temporal data. The method comprises the following steps:

(1) generating uncertain temporal data and building a relational R-tree: interval parameters are given, the original uncertain temporal data intervals are generated, and the rectangles built from the data intervals according to their time attributes and weights are then constructed into an R-tree;

(2) relation management of the relational R-tree: the R-tree constructed in step (1) is traversed to obtain, for each node, the relation of the indexes of its entries sorted by weight in descending order, and the relation is stored using an auxiliary structure;

(3) top-k query over uncertain temporal data: the relational R-tree obtained in step (2) and the value to be queried are taken as input, and the query value is compared with each node starting from the root node; when a child node of the current node is to be accessed, the next access target is selected according to the relation between the nodeid and the weights stored in the relational R-tree.

Further, in step (1), according to the interval range and the weight of the uncertain interval data, the range is taken as the x-axis and the weight as the y-axis, the weight is expanded upward and downward so that the uncertain interval data form a rectangle, and an R-tree is then constructed from the resulting rectangles.

Furthermore, when the weight of the uncertain interval data is expanded upward and downward, a minimum value that does not affect the query result is selected for the expansion.
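The rectangle construction described above can be sketched in Python as follows. The names `Rect` and `to_rect` and the default value of `eps` are assumptions made for illustration; the source only states that the expansion uses a minimum value that does not affect the query result.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_min: float  # start of the uncertain range
    x_max: float  # end of the uncertain range
    y_min: float  # weight - eps
    y_max: float  # weight + eps


def to_rect(range_start, range_end, weight, eps=1e-6):
    """Turn a one-dimensional uncertain range with a weight into a thin rectangle.

    eps is a hypothetical small expansion; it should be smaller than the
    smallest difference between two distinct weights so that rectangles of
    different weights do not overlap on the y-axis.
    """
    if range_start > range_end:
        raise ValueError("range_start must not exceed range_end")
    if eps <= 0:
        raise ValueError("eps must be positive")
    return Rect(range_start, range_end, weight - eps, weight + eps)


example_rect = to_rect(10, 40, 5.0, eps=0.5)
```

The resulting rectangles can be inserted into any two-dimensional R-tree implementation.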

Further, the auxiliary structures in step (2) are a B-tree and an array;

in step (2), the relation between the current node and the index numbers of the entries in the current node is obtained by traversing the constructed R-tree, and is managed so that nodes can be accessed in descending order of the weights of the entries in the current node; the weights and the index numbers are stored in the relation. When a B-tree is used to manage the relation, the tree is built with the nodeid in the relation table obtained by traversing the R-tree as the key; the position of a node in the B-tree is found from the id number of the current R-tree node, and the index numbers and weights of the child nodes corresponding to the current node are obtained. When an array is used to manage the relation, an array of the same size as the number of R-tree nodes is created, the nodeid of a node is used as the array subscript, and the tupleid of the tuple in the relation is used as the array content; the mapping rule is that the nodeid of a node corresponds to the order in which the node is accessed when the R-tree is traversed.

Furthermore, when the nodes of the R-tree are accessed in step (3), nodes with larger uncertain data weights are accessed first.

Preferably, the auxiliary structures in step (2) are a B-tree and an array. In step (2), the relation between the current node and the index numbers of the entries in the current node is obtained by traversing the constructed R-tree and is managed so that nodes are visited in descending order of the weights of the entries in the current node. The relation is expressed as follows:

rel(tuple(int: tupleid, int: nodeid, list: entries)), list = <(w1, index1), ..., (wn, indexn)>;

where a node of the B-tree is denoted n(nodeid, L), with L = <(w1, index1), ..., (wn, indexn)>;

when the relation is managed with an array, an array of the same size as the number of R-tree nodes is created, the nodeid of a node is used as the array subscript, and the tupleid of the tuple in the relation is used as the array content; the mapping rule is that the nodeid of a node corresponds to its access order when the R-tree is traversed, expressed as follows:

R-Array[nodeid] = tupleid.
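The relation rel and the array mapping R-Array can be sketched in Python over a toy tree. This is an illustration under stated assumptions, not the SECONDO implementation: the `Node` class, the functions `build_relation` and `build_r_array`, the depth-first traversal order, and the convention that an inner entry carries the largest weight of its subtree are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    nodeid: int
    # Each entry is (weight, child); child is None for a leaf entry.
    # Assumption: an inner entry's weight is the largest weight in its subtree.
    entries: list = field(default_factory=list)


def build_relation(root):
    """Traverse the tree depth-first and return tuples (tupleid, nodeid, entries),
    where entries is the list <(w, index), ...> sorted by descending weight."""
    rel = []
    stack = [root]
    while stack:
        node = stack.pop()
        ranked = sorted(
            ((w, idx) for idx, (w, _child) in enumerate(node.entries)),
            key=lambda pair: -pair[0],
        )
        rel.append((len(rel), node.nodeid, ranked))  # tupleid = visiting order
        for _w, child in reversed(node.entries):
            if child is not None:
                stack.append(child)
    return rel


def build_r_array(rel, node_count):
    """R-Array[nodeid] = tupleid; assumes nodeids are 0 .. node_count - 1."""
    r_array = [None] * node_count
    for tupleid, nodeid, _entries in rel:
        r_array[nodeid] = tupleid
    return r_array


leaf_a = Node(2, [(3.0, None), (7.0, None)])
leaf_b = Node(1, [(9.0, None), (2.0, None)])
root = Node(0, [(7.0, leaf_a), (9.0, leaf_b)])

rel = build_relation(root)
r_array = build_r_array(rel, node_count=3)
```

Looking up `rel[r_array[nodeid]]` then yields the weight-ordered entry list of a node in constant time.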

When the nodes of the R-tree are accessed in step (3), nodes with larger uncertain data weights are accessed first.

In summary, the uncertain temporal data management and query method based on the relational R-tree is implemented as follows: the uncertain temporal data are first built into rectangles according to their time attributes and weights, and an R-tree is constructed from them; the R-tree is then traversed to obtain, for each node, the relation of the indexes of its entries sorted by weight in descending order, and the relation is stored using an auxiliary structure (a B-tree and an array). In this way, when a query checks whether given data intersect the uncertain temporal data, the nodes can be accessed in descending order of weight.

Advantageous effects: compared with the prior art, the uncertain temporal data management and query method based on the relational R-tree manages the weight attribute of the data in addition to their uncertainty, and eliminates the invalid region. During a top-k query, the child nodes of the node currently being accessed can be visited in descending order of the weights of the data, and the probability that the query data intersect the current data can be calculated.

Drawings

FIG. 1 is a data representation of uncertain temporal data according to the present invention;

FIG. 2 is a two-dimensional representation of the uncertain temporal data in the embodiment;

FIG. 3 illustrates three cases in which two data intervals intersect in the embodiment;

FIG. 4 is a diagram of the basic structure of a relational R-tree according to the present invention;

FIG. 5 is a diagram of the relation that the relational R-tree of the present invention needs to maintain;

FIG. 6 is a diagram of maintaining the relation using a B-tree in the embodiment;

FIG. 7 is a diagram of maintaining the relation using an array in the embodiment.

Detailed Description

In order to explain the technical solution disclosed by the invention in detail, a further description is given below with reference to the accompanying drawings and specific embodiments.

The invention discloses an uncertain temporal data management and query method based on a relational R-tree, which implements the management of uncertain temporal data in the extensible moving objects database SECONDO. The method first builds the uncertain temporal data into rectangles according to their time attributes and weights and constructs an R-tree from them. It then traverses the R-tree to obtain, for each node, the relation of the indexes of its entries sorted by weight in descending order, and stores the relation using an auxiliary structure (a B-tree and an array), so that the nodes can be accessed in descending order of weight when a query checks whether given data intersect the uncertain temporal data.

(1) Generating uncertain temporal data and building the tree;

Given the data space, the invention needs to generate uncertain temporal data in advance for experimental and practical purposes. Temporal data are represented by intervals, which are generated automatically by the system from the following given interval parameters:

1) the interval minimum and maximum, which keep the interval values within a controllable range; in this invention the minimum is 1 and the maximum is 100000;

2) the interval length and the interval weight; the length specifies the exact length of each interval, and in this invention it is a random value within the interval range;

3) the number of intervals, which is 2000000 in this experiment.

The above values are only experimental settings and can be adjusted at any time according to the experimental conditions. An R-tree is then built from the generated temporal data.
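The data generation step can be sketched in Python as follows. The function name `generate_intervals`, the tuple layout, the integer weights, and the weight bound `max_weight` are assumptions for illustration; the source fixes only the value range 1 to 100000 and the experimental count of 2000000.

```python
import random


def generate_intervals(count, lo=1, hi=100_000, max_weight=100, seed=None):
    """Generate `count` uncertain intervals as (range_start, range_end, length, weight).

    The uncertain range [range_start, range_end] lies within [lo, hi]; the exact
    interval length is a random value no larger than the range. max_weight is a
    hypothetical bound, since the source does not give a weight range.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        range_start = rng.randint(lo, hi - 1)
        range_end = rng.randint(range_start + 1, hi)
        length = rng.randint(1, range_end - range_start)
        weight = rng.randint(1, max_weight)
        data.append((range_start, range_end, length, weight))
    return data


# A small sample; the experiment in the text uses 2000000 intervals.
sample = generate_intervals(1000, seed=42)
```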

(2) Relation management of the relational R-tree;

The generated R-tree is traversed from the root node, and the nodeid of each node and the ids and weights of its entries are recorded. After the traversal, a relation is obtained between each node and the indexes of its entries sorted by weight in descending order. Two methods are proposed to manage this relation. The first uses a B-tree, which is built with the nodeid in the relation table obtained by traversing the R-tree as the key. The second uses an array: an array of the same size as the number of R-tree nodes is created, and a one-to-one mapping is made with the nodeid of a node as the array subscript and the tupleid in the relation as the array content.

(3) Top-k query over uncertain temporal data;

For the given uncertain temporal data and an arbitrary query interval, the query searches for the k intervals with the largest weights that most probably intersect the query interval. The search uses the relational R-tree that has been built. Starting from the root node, it first judges whether the query interval intersects the root node. If it does, the child nodes are accessed next, not in the order of the node indexes in the R-tree, but in descending order of weight, using the child node information stored in the relation table.
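The weight-priority search described above can be sketched in Python over a toy tree. The names `Entry`, `Node` and `top_k`, the dictionary used for the relation, and the early-stop rule are assumptions of this sketch; in particular, the early stop relies on the assumption that an inner entry stores the largest weight found in its subtree, which the source does not state explicitly.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    x_min: float
    x_max: float
    weight: float                    # inner entry: largest weight in the subtree (assumed)
    child: Optional["Node"] = None   # None marks a leaf entry holding actual data
    label: str = ""


@dataclass
class Node:
    nodeid: int
    entries: list


def top_k(root, relation, a, b, k):
    """Return up to k (weight, label) pairs whose range may intersect [a, b].

    relation maps nodeid -> [(weight, index), ...] sorted by descending weight.
    """
    if k <= 0:
        return []
    best = []  # min-heap holding the k largest weights found so far

    def visit(node):
        for weight, idx in relation[node.nodeid]:
            if len(best) == k and weight <= best[0][0]:
                break  # the remaining entries cannot improve the result
            entry = node.entries[idx]
            if entry.x_max < a or entry.x_min > b:
                continue  # no possible intersection with the query interval
            if entry.child is not None:
                visit(entry.child)
            elif len(best) < k:
                heapq.heappush(best, (weight, entry.label))
            else:
                heapq.heapreplace(best, (weight, entry.label))

    visit(root)
    return sorted(best, reverse=True)


leaf1 = Node(1, [Entry(0, 10, 5, label="p"), Entry(20, 30, 8, label="q")])
leaf2 = Node(2, [Entry(5, 15, 9, label="r"), Entry(50, 60, 12, label="s")])
root = Node(0, [Entry(0, 30, 8, child=leaf1), Entry(5, 60, 12, child=leaf2)])
relation = {
    n.nodeid: sorted(((e.weight, i) for i, e in enumerate(n.entries)),
                     key=lambda pair: -pair[0])
    for n in (root, leaf1, leaf2)
}

result = top_k(root, relation, a=8, b=25, k=2)  # [(9, 'r'), (8, 'q')]
```

In this example the subtree with the larger weight is visited first, and the entry with weight 5 is never examined because two better results have already been found.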

Specifically, the uncertain temporal data management and query method based on the relational R-tree manages both time attributes and weights, and takes the influence of the weights on the query result into account. The weight relation among nodes must therefore be known at query time, so the relational R-tree manages the relation between the nodeid of a node and the ids and weights of its entries. The method mainly comprises the following steps:

(1) Generating uncertain temporal data and building the tree;

In the invention, a series of uncertain temporal data meeting the requirements needs to be constructed and stored in the extensible database system SECONDO. For the convenience of experiments, some basic parameters of the data are set; their types and meanings have been described in detail above.

FIG. 1 shows a series of generated uncertain temporal data. Rectangles are constructed as shown in FIG. 2, where the dotted line represents the range within which the uncertain data can move and the y-axis represents the weight. The weight is expanded upward and downward by a minimal amount to turn the one-dimensional line segment into a two-dimensional rectangle, and the rectangles are then built into an R-tree as shown in FIG. 4.

(2) Relation management of the relational R-tree;

The generated R-tree is traversed from the root node, and the nodeid of each node and the ids and weights of its entries are recorded. After the traversal, a relation is obtained between each node and the indexes of its entries sorted by weight in descending order, as shown in FIG. 5. Two methods are proposed to manage this relation. The first uses a B-tree: the tree is built with the nodeid in the relation table obtained by traversing the R-tree as the key, so that the position of a node in the B-tree can be found from the id number of the current R-tree node, and the index numbers and weights of the child nodes corresponding to the current node can be obtained, as shown in FIG. 6. The second uses an array: an array of the same size as the number of R-tree nodes is created, the nodeid of a node is used as the array subscript, and the tupleid in the relation is used as the array content; the mapping rule is that the nodeid of a node corresponds to its access order when the R-tree is traversed, as shown in FIG. 7.
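The keyed lookup that the B-tree provides can be sketched in Python with a sorted key list and binary search. This is a stand-in for a B-tree, not a B-tree implementation, and it does not use the SECONDO B-tree; the class name `SortedKeyIndex` and its method `lookup` are hypothetical.

```python
import bisect


class SortedKeyIndex:
    """Stand-in for a B-tree keyed on nodeid: sorted keys plus binary search."""

    def __init__(self, rel):
        # rel holds tuples (tupleid, nodeid, entries), as in the relation above.
        rows = sorted(rel, key=lambda row: row[1])
        self._keys = [nodeid for _tupleid, nodeid, _entries in rows]
        self._values = [entries for _tupleid, _nodeid, entries in rows]

    def lookup(self, nodeid):
        """Return the weight-ordered list <(w, index), ...> stored for nodeid."""
        pos = bisect.bisect_left(self._keys, nodeid)
        if pos == len(self._keys) or self._keys[pos] != nodeid:
            raise KeyError(nodeid)
        return self._values[pos]


rel = [
    (0, 7, [(9.0, 1), (7.0, 0)]),
    (1, 3, [(4.0, 0)]),
]
index = SortedKeyIndex(rel)
children_of_7 = index.lookup(7)
```

Compared with the array variant, a keyed structure does not require the nodeids to be dense, at the cost of a logarithmic rather than constant lookup.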

(3) Top-k query over uncertain temporal data;

Another important purpose of the relational R-tree is fast querying: finding the k intervals with the largest weights that most probably intersect the query interval. The invention provides a method for calculating the intersection probability of uncertain temporal data and a weight-priority search method. The intersection of two data intervals is divided into four cases, as shown in FIG. 3, and the probability function of the intersection of the two data intervals can be expressed as:

where (a, b) is the query interval, (s, e) is the uncertain interval, L is the uncertain data range, and L is the uncertain data length. A query starts from the root node and judges whether the node intersects the query interval. If it does, the child nodes are accessed next, not in the order of the node indexes in the R-tree, but in descending order of weight, using the child node information stored in the relation table. For the node with nodeid x in FIG. 4, a standard R-tree accesses the child nodes in the order E₁, E₂. The relational R-tree manages the relation between the node index numbers and the weights, so the nodes can instead be accessed in descending order of weight, using the information stored in the B-tree or the array constructed in step (2), namely in the order E₂, E₁.
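The probability function itself is not reproduced in this text, so the following Python sketch is not the formula of the invention. It is an illustration under a stated assumption: an interval of fixed length slides within the uncertain range (s, e), and its start point is uniformly distributed over all feasible positions. The function name `intersection_probability` and the parameter name `length` are hypothetical.

```python
def intersection_probability(a, b, s, e, length):
    """Probability that a fixed-length interval placed in [s, e] intersects [a, b].

    Assumption (not from the source): the start point t is uniform on
    [s, e - length]. The interval [t, t + length] meets [a, b] exactly when
    a - length <= t <= b.
    """
    if not (a <= b and s <= e and 0 < length <= e - s):
        raise ValueError("invalid interval parameters")
    slack = (e - s) - length   # width of the set of feasible start points
    lo = max(s, a - length)    # smallest start point that still reaches a
    hi = min(e - length, b)    # largest start point that does not pass b
    if slack == 0:
        # The interval is fully determined: it either intersects or it does not.
        return 1.0 if lo <= hi else 0.0
    return max(0.0, hi - lo) / slack


# Interval of length 4 inside the range (0, 10), query interval (8, 12):
# start points 4..6 out of 0..6 intersect, so the probability is 1/3.
p = intersection_probability(8, 12, 0, 10, 4)
```

Under this assumption the result is 0 when the range and the query interval are disjoint and 1 when every feasible placement intersects the query interval.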

Claims

1. An uncertain temporal data management and query method based on a relational R-tree, characterized in that: temporal data with uncertain start and end points are first managed on the basis of a determined interval length, a relational R-tree is then constructed, and queries are carried out in order of the weights of the temporal data; the method comprises the following steps:

(1) generating uncertain temporal data and building a relational R-tree: interval parameters are given, the original uncertain temporal data intervals are generated, and the rectangles built from the data intervals according to their time attributes and weights are then constructed into an R-tree, wherein step (1) specifically comprises:

(1.1) according to the interval range and the weight of the uncertain interval data, taking the range as the x-axis and the weight as the y-axis, expanding the weight upward and downward so that the uncertain interval data form a rectangle, and then constructing an R-tree from the resulting rectangles;

(1.2) the upward and downward expansion of the weight of the uncertain interval data is calculated according to the minimum value of the weight;

2. The uncertain temporal data management and query method based on a relational R-tree as claimed in claim 1, characterized in that: the auxiliary structures in step (2) are a B-tree and an array.

3. The uncertain temporal data management and query method based on a relational R-tree as claimed in claim 1, characterized in that: in step (2), the relation between the current node and the index numbers of the entries in the current node is obtained by traversing the constructed R-tree and is managed so that the nodes are visited in descending order of the weights of the entries in the current node, the weights and the index numbers being stored in the relation;

when a B-tree is used to manage the relation, the tree is built with the nodeid in the relation table obtained by traversing the R-tree as the key, the position of a node in the B-tree is found from the id number of the current R-tree node, and the index numbers and weights of the child nodes corresponding to the current node are obtained;

when an array is used to manage the relation, an array of the same size as the number of R-tree nodes is created, the nodeid of a node is used as the array subscript, and the tuple in the relation is used as the array content; the mapping rule is that the nodeid of a node corresponds to the order in which the node is accessed when the R-tree is traversed.

4. The uncertain temporal data management and query method based on a relational R-tree as claimed in claim 1, characterized in that: when the nodes of the R-tree are accessed in step (3), nodes with larger uncertain data weights are accessed first.