CN110347676B - Uncertainty tense data management and query method based on relation R tree - Google Patents

Publication number
CN110347676B
CN110347676B CN201910504660.6A CN201910504660A CN110347676B CN 110347676 B CN110347676 B CN 110347676B CN 201910504660 A CN201910504660 A CN 201910504660A CN 110347676 B CN110347676 B CN 110347676B
Authority
CN
China
Prior art keywords
tree
relation
weight
node
uncertain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910504660.6A
Other languages
Chinese (zh)
Other versions
CN110347676A (en)
Inventor
许建秋
韦建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201910504660.6A
Publication of CN110347676A
Application granted
Publication of CN110347676B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an uncertain temporal data management and query method based on a relation R tree. The method applies to the database field and implements the management and querying of uncertain temporal data in the extensible moving object database SECONDO. It manages a large number of given intervals whose start and end points are uncertain but whose lengths are determined, and builds a relation R tree over these intervals. The structure manages both the uncertainty of the time attribute and the weight attribute. During a query, the relation R tree allows nodes to be visited according to their weights, which improves query efficiency; the intersection probability is calculated, and the k data items with the largest weights that may intersect the query data are returned.

Description

Uncertainty tense data management and query method based on relation R tree
Technical Field
The invention belongs to the field of data processing technology, and particularly relates to an uncertain temporal data management and query method based on a relation R tree.
Background
As applications evolve, stored data becomes more complex: it includes not only deterministic data but also non-deterministic data. In project planning, for example, the expected completion date is often loosely defined, e.g., "the project will be completed within the next three to six months". Describing such temporal variables as uncertain temporal information brings the generated data closer to human intuition and to the real-world situation. Managing these uncertain temporal data well is a prerequisite for using them efficiently.
For uncertain temporal data, each item consists of an uncertain start point and an uncertain end point, expressed in the form:
<<x1,x2>,<y1,y2>>.
When an index over such temporal data is built, a spatial index algorithm is therefore typically used: <x1, x2> and <y1, y2> are mapped to the 4 corner points of a spatial rectangle for processing. The most common current approach combines the R-tree with uncertain temporal indexing. However, representing valid time in this spatial manner introduces an invalid region, which affects query results to some extent, and only the time attribute is managed, not the weight. An uncertain temporal data management and query method based on a relation R tree is therefore proposed, which manages the weights together with the uncertain temporal data.
Disclosure of Invention
Purpose of the invention: to eliminate the influence of invalid regions when managing uncertain temporal data, the invention provides an uncertain temporal data management and query method based on a relation R tree.
Technical scheme: in an uncertain temporal data management and query method based on a relation R tree, temporal data with uncertain start and end points are first managed on the basis of determined interval lengths, a relation R tree is then constructed, and queries are carried out in order of the weights of the temporal data. The method comprises the following steps:
(1) generating uncertain temporal data and building the R tree: given the interval parameters, generate the original uncertain temporal data intervals, then construct a rectangle for each data interval from its time attribute and weight, and build an R tree from the rectangles;
(2) relationship management of the relation R tree: traverse the R tree constructed in step (1) to obtain, for each node, the relationship listing the indexes of its entries sorted by weight in descending order, and store it using an auxiliary structure;
(3) top-k query over uncertain temporal data: take the relation R tree obtained in step (2) and the value to be queried as input, and compare the query with each node starting from the root; when a child of the current node is to be accessed, select the next access target according to the relationship between nodeid and weight stored in the relation R tree.
Further, step (1) comprises: according to the interval range and the weight of the uncertain interval data, taking the range as the x axis and the weight as the y axis, expanding the weight upward and downward so that each uncertain interval becomes a rectangle, and then building an R tree from the resulting rectangles.
Furthermore, when the weight of the uncertain interval data is expanded upward and downward, a minimal value that does not affect the query result is chosen for the expansion.
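The rectangle construction described above can be sketched as follows. This is a minimal illustration, not the SECONDO implementation: the names `UncertainInterval`, `Rect`, `to_rect`, and the value of `EPSILON` are assumptions introduced here, and the only requirement taken from the text is that the expansion be small enough not to change query results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainInterval:
    start: float   # left end of the range the interval may occupy
    end: float     # right end of that range
    length: float  # exact, known length of the interval
    weight: float


@dataclass(frozen=True)
class Rect:
    x_min: float
    x_max: float
    y_min: float
    y_max: float


# Assumed expansion value: it only has to be smaller than the gap
# between any two distinct weights so the ordering is unaffected.
EPSILON = 1e-6


def to_rect(iv: UncertainInterval, eps: float = EPSILON) -> Rect:
    """Range on the x axis, weight on the y axis, widened by eps up and down."""
    return Rect(iv.start, iv.end, iv.weight - eps, iv.weight + eps)


rect = to_rect(UncertainInterval(start=10, end=50, length=20, weight=3.0))
```

The resulting rectangles would then be handed to whatever R-tree bulk-loading or insertion routine the host system provides.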
Further, the auxiliary structures in step (2) are a B-tree and an array.
In step (2), the relationship between each node and the index numbers of its entries is obtained by traversing the constructed R tree and is managed so that nodes can be accessed in descending order of the weights of the entries in the current node; the weight and the index number are stored in the relationship. When a B-tree is used to manage the relationship, the tree is built with the nodeid from the relationship table obtained by traversing the R tree as the key; the position in the B-tree is found from the id of the current R-tree node, which yields the index numbers and weights of the child nodes of the current node. When an array is used to manage the relationship, an array whose size equals the number of R-tree nodes is created, the nodeid serves as the array subscript, and the tupleid of the tuple in the relationship is stored as the array content; the mapping rule is that a node's nodeid corresponds to its access order during traversal of the R tree.
Furthermore, when nodes in the R tree are accessed in step (3), nodes whose uncertain data have larger weights are accessed first.
Preferably, the auxiliary structures in step (2) are a B-tree and an array. In step (2), the relationship between each node and the index numbers of its entries is obtained by traversing the constructed R tree and managed so that nodes are visited in descending order of the weights of the entries in the current node. The relationship is expressed as follows:
rel(tuple(int:tupleid, int:nodeid, list:entries)), list = <(w1, index1), ..., (wn, indexn)>;
where a node of the B-tree is denoted n(nodeid, L), L = <(w1, index1), ..., (wn, indexn)>;
for array-based management, an array whose size equals the number of R-tree nodes is created, the nodeid serves as the array subscript, and the tupleid of the tuple in the relationship is stored as the array content; the mapping rule is that a node's nodeid corresponds to its access order during traversal of the R tree, expressed as:
R-Array[nodeid] = tupleid.
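As a hedged illustration of the relationship tuple and the array mapping R-Array[nodeid] = tupleid, the sketch below uses plain Python containers. `RelTuple`, `make_tuple`, `entries_for`, and the sample weights are hypothetical; the sketch also assumes that a tuple's position in the list `rel` equals its tupleid.

```python
from typing import NamedTuple


class RelTuple(NamedTuple):
    tupleid: int
    nodeid: int
    entries: list  # [(weight, index), ...], largest weight first


def make_tuple(tupleid, nodeid, weighted_entries):
    """Store a node's entries sorted by weight in descending order."""
    ordered = sorted(weighted_entries, key=lambda e: e[0], reverse=True)
    return RelTuple(tupleid, nodeid, ordered)


# Toy relation; list position is assumed to equal tupleid.
rel = [
    make_tuple(0, 2, [(0.9, 0), (0.7, 1)]),
    make_tuple(1, 0, [(0.4, 0), (0.9, 1)]),
    make_tuple(2, 1, [(0.4, 0), (0.1, 1), (0.3, 2)]),
]

# R-Array[nodeid] = tupleid: one slot per R-tree node.
r_array = [None] * len(rel)
for t in rel:
    r_array[t.nodeid] = t.tupleid


def entries_for(nodeid):
    """Constant-time lookup of a node's weight-ordered entries."""
    return rel[r_array[nodeid]].entries
```

With this layout a lookup costs two index operations, whereas a B-tree keyed by nodeid would cost a logarithmic search but does not require nodeids to be dense.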
When nodes in the R tree are accessed in step (3), nodes whose uncertain data have larger weights are accessed first.
The method is implemented as follows: uncertain temporal data are first turned into rectangles according to their time attributes and weights, from which an R tree is built; the R tree is then traversed to obtain, for each node, the relationship of its entry indexes sorted by weight in descending order; this relationship is stored using an auxiliary structure (a B-tree and an array), so that when querying which uncertain temporal data intersect the given data, nodes can be accessed in descending order of weight.
Advantageous effects: compared with the prior art, the method manages the weight attribute of the data in addition to its uncertainty, and eliminates the invalid region. In a top-k query, the child nodes of the node currently being accessed can be visited in descending order of data weight, and the probability that the query data intersects the current data can be calculated.
Drawings
FIG. 1 is a data representation of uncertain temporal data according to the present invention;
FIG. 2 is a two-dimensional representation of uncertain temporal data in the embodiment;
FIG. 3 illustrates three cases where two interval data intersect in the embodiment;
FIG. 4 is a diagram of the basic structure of a relation R tree according to the present invention;
FIG. 5 is a diagram of the relationships that the relation R tree of the present invention needs to maintain;
FIG. 6 is a diagram of an embodiment that maintains the relationships using a B-tree;
FIG. 7 is a diagram of an embodiment that maintains the relationships using an array.
Detailed Description
To explain the technical solution disclosed in the present invention in detail, it is further described below with reference to the accompanying drawings and specific embodiments.
The invention discloses an uncertain temporal data management and query method based on a relation R tree, used to manage uncertain temporal data in the extensible moving object database SECONDO. The method first turns uncertain temporal data into rectangles according to their time attributes and weights and builds an R tree from them. It then traverses the R tree to obtain, for each node, the relationship of its entry indexes sorted by weight in descending order, and stores this relationship using an auxiliary structure (a B-tree and an array), so that nodes can be accessed in descending order of weight when querying which uncertain temporal data intersect the given data.
(1) Generating uncertain temporal data and building the tree;
The invention takes the given data space into account, and uncertain temporal data must be generated in advance for experimental and practical needs. Temporal data are represented as intervals, which the system generates automatically from the following given parameters:
1) the interval minimum and maximum, which keep the interval values within a controllable range; in this invention the minimum is 1 and the maximum is 100000;
2) the interval length and the interval weight; the length specifies the exact length of each interval, and in this invention it is a random value within the interval range;
3) the number of intervals, set to 2000000 in this experiment.
These values are only what the experiments require and can be adjusted at any time according to experimental conditions. An R tree is then built from the generated temporal data.
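A small generator following the parameters above might look like the sketch below. The function name `generate_intervals`, the tuple layout, the use of integer endpoints, the uniform weight in [0, 1), and the fixed seed are all assumptions made for illustration; the source only fixes the bounds 1 and 100000, a random length within the range, and the interval count.

```python
import random


def generate_intervals(count, lo=1, hi=100_000, seed=0):
    """Return `count` tuples (start, end, length, weight).

    (start, end) is the range the interval may occupy and `length`
    is its exact length, drawn at random so that it fits the range.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    data = []
    for _ in range(count):
        start = rng.randint(lo, hi - 1)
        end = rng.randint(start + 1, hi)
        length = rng.randint(1, end - start)
        weight = rng.random()  # assumed weight domain: [0, 1)
        data.append((start, end, length, weight))
    return data


# The experiment in the text uses 2000000 intervals; a smaller
# count keeps this example quick to run.
sample = generate_intervals(1000)
```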
(2) Relationship management of the relation R tree;
The generated R tree is traversed from the root node, recording the nodeid of each node together with the ids and weights of its entries. After the traversal, a relationship is obtained between each node and the indexes of its entries sorted by weight in descending order. Two methods are proposed to manage this relationship. The first uses a B-tree, built with the nodeid from the relationship table obtained by traversing the R tree as the key. The second uses an array whose size equals the number of R-tree nodes, with the nodeid as the array subscript and the tupleid in the relationship as the array content, mapped one to one.
(3) Top-k query over uncertain temporal data;
Given the uncertain temporal data and an arbitrary query interval, the k intervals with the largest weights that are most likely to intersect the query interval are searched for. The search uses the relation R tree that has been built. Starting from the root node, it checks whether the query interval intersects the node; if it does, the child nodes are visited next. At this point the children are not visited in the node index order of the R tree, but in descending order of weight, using the child-node information stored in the relationship table.
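The weight-first traversal just described can be sketched as a depth-first search that visits children in descending weight order and stops scanning a node's children once none of the remaining ones can improve the current top k. This is a simplified sketch under stated assumptions: `Node`, `leaf`, `inner`, and `top_k` are hypothetical names, an inner node's weight is taken to be the largest weight in its subtree, an interval counts as "possibly intersecting" when its uncertain range overlaps the query, and `order_by_weight` stands in for the lookup in the stored relationship.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    x_min: float
    x_max: float
    max_weight: float
    children: list = field(default_factory=list)  # empty for a data entry


def leaf(x_min, x_max, weight):
    return Node(x_min, x_max, weight)


def inner(children):
    return Node(min(c.x_min for c in children),
                max(c.x_max for c in children),
                max(c.max_weight for c in children),
                children)


def order_by_weight(node):
    # Stand-in for reading the precomputed order from the B-tree or array.
    return sorted(node.children, key=lambda c: c.max_weight, reverse=True)


def top_k(root, a, b, k):
    """k largest-weight entries whose range overlaps the query [a, b]."""
    if k <= 0:
        return []
    best = []  # min-heap of (weight, x_min, x_max)

    def visit(node):
        if node.x_max < a or node.x_min > b:
            return  # no overlap with the query interval
        if not node.children:
            item = (node.max_weight, node.x_min, node.x_max)
            if len(best) < k:
                heapq.heappush(best, item)
            elif node.max_weight > best[0][0]:
                heapq.heapreplace(best, item)
            return
        for child in order_by_weight(node):
            # Children are sorted, so the first hopeless one ends the scan.
            if len(best) == k and child.max_weight <= best[0][0]:
                break
            visit(child)

    visit(root)
    return sorted(best, reverse=True)


root = inner([
    inner([leaf(1, 5, 0.4), leaf(4, 9, 0.9)]),
    inner([leaf(20, 30, 0.8), leaf(25, 40, 0.2)]),
    inner([leaf(6, 12, 0.7), leaf(8, 15, 0.3)]),
])
result = top_k(root, 5, 10, 2)
```

Because siblings are already ordered by weight, the early `break` is what turns the stored ordering into saved node visits; with index-order traversal every overlapping child would have to be examined.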
Specifically, the uncertain temporal data management and query method based on the relation R tree manages both time attributes and weights, and takes the influence of the weights on query results into account. The weight relationships among nodes must therefore be known at query time, so the relation R tree manages, for each node, the relationship between its nodeid and the ids and weights of its entries. The method mainly comprises the following steps:
(1) Generating uncertain temporal data and building the tree;
A series of uncertain temporal data meeting the requirements is constructed and stored in the extensible database system SECONDO. For convenience of experimentation, some basic parameters of the data are set; their types and meanings were described in detail above.
Fig. 1 shows a series of generated uncertain temporal data. Rectangles are constructed from them as shown in fig. 2, where the dotted line represents the range within which the uncertain data may move and the y axis represents the weight. The weight is expanded upward and downward by a minimal amount so that each one-dimensional line segment becomes a two-dimensional rectangle, and the rectangles are then built into an R tree as shown in fig. 4.
(2) Relationship management of the relation R tree;
The generated R tree is traversed from the root node, recording the nodeid of each node together with the ids and weights of its entries. After the traversal, a relationship is obtained between each node and the indexes of its entries sorted by weight in descending order, as shown in fig. 5. Two methods are proposed to manage this relationship. The first is B-tree management: the tree is built with the nodeid from the relationship table obtained by traversing the R tree as the key, so that the position in the B-tree can be found from the id of the current R-tree node, yielding the index numbers and weights of the child nodes of the current node, as shown in fig. 6. The second is array management: an array whose size equals the number of R-tree nodes is created, with the nodeid as the array subscript and the tupleid in the relationship as the array content; the mapping rule is that a node's nodeid corresponds to its access order during traversal of the R tree, as shown in fig. 7.
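The traversal that produces the relationship, and a key-based lookup by nodeid, can be sketched as below. This is an assumption-laden illustration: the tree is modelled as nested Python lists whose leaves are bare weights, nodeids are assigned in pre-order visit order, an entry's weight is the largest weight beneath it, and a binary search over the sorted rows stands in for a real B-tree. `build_relation`, `subtree_weight`, and `lookup` are hypothetical names.

```python
from bisect import bisect_left


def subtree_weight(node):
    """Largest weight below an entry (a bare float is a data entry)."""
    if isinstance(node, list):
        return max(subtree_weight(child) for child in node)
    return node


def build_relation(root):
    """Rows (nodeid, [(weight, index), ...]) with nodeid = visit order."""
    relation = []

    def walk(node):
        nodeid = len(relation)
        relation.append(None)  # reserve the slot before descending
        entries = [(subtree_weight(c), i) for i, c in enumerate(node)]
        entries.sort(key=lambda e: e[0], reverse=True)
        relation[nodeid] = (nodeid, entries)
        for child in node:
            if isinstance(child, list):
                walk(child)

    walk(root)
    return relation


def lookup(relation, nodeid):
    """Ordered-key search by nodeid, standing in for a B-tree probe."""
    # (nodeid,) sorts immediately before the row (nodeid, entries).
    pos = bisect_left(relation, (nodeid,))
    if pos == len(relation) or relation[pos][0] != nodeid:
        raise KeyError(nodeid)
    return relation[pos][1]


tree = [[0.4, 0.9], [0.8, 0.2], [0.7, 0.3]]
rel = build_relation(tree)
```

Since nodeids here are dense and equal to the row position, the same rows could equally be addressed by direct array indexing, which is the second management method described above.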
(3) Querying uncertain temporal data Top-k;
Another important purpose of the relation R tree is fast querying: finding the k intervals with the largest weights that are most likely to intersect the query interval. The invention provides a method for calculating the intersection probability of uncertain temporal data and a weight-first search method. The intersection of two interval data is divided into four cases, as shown in fig. 3, and the probability function of the intersection of the two interval data can be expressed as:
[Formula image: probability function for the intersection of the two interval data]
where (a, b) is the query interval, (s, e) is the uncertain interval, L is the uncertain data range, and L is the uncertain data length. A query starts at the root node and checks whether the node intersects the query interval; if it does, the child nodes are visited next. At this point the children are not visited in the node index order of the R tree, but in descending order of weight, using the child-node information stored in the relationship table. For the node with nodeid x in FIG. 4, a standard R tree accesses its child nodes in the order E1, E2. The relation R tree manages the relationship between node index numbers and weights, so the nodes can instead be accessed in descending order of weight using the information stored in the B-tree or array constructed in step (2), namely E2, E1.
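One plausible way to compute an intersection probability for an interval of known length inside an uncertain range is sketched below. It is explicitly not the formula from the patent's figure, which is not reproduced in this text. It assumes that the start position of the interval is uniformly distributed over all positions that keep the interval inside (s, e); the function name and the parameter `length` are introduced here for illustration.

```python
def intersection_probability(a, b, s, e, length):
    """Probability that an interval of exact `length`, placed uniformly
    at random inside the range [s, e], intersects the query [a, b].

    Assumption: the start t is uniform on [s, e - length]. The interval
    [t, t + length] meets [a, b] exactly when a - length <= t <= b.
    """
    if not (a <= b and s <= e and 0 < length <= e - s):
        raise ValueError("invalid interval parameters")
    slack = (e - length) - s          # room the start point has to move
    lo = max(s, a - length)           # earliest start that still hits
    hi = min(e - length, b)           # latest start that still hits
    if slack == 0:                    # position fully determined
        return 1.0 if lo <= hi else 0.0
    return max(0.0, hi - lo) / slack


# Range [0, 10], length 4: the start may lie anywhere in [0, 6].
p_partial = intersection_probability(8, 9, 0, 10, 4)    # starts in [4, 6] hit
p_none = intersection_probability(20, 30, 0, 10, 4)     # query lies outside
p_sure = intersection_probability(0, 10, 0, 10, 4)      # query covers the range
```

Under a different placement distribution the numerator and denominator would become integrals of the corresponding density, but the admissible start window [a - length, b] stays the same.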

Claims (4)

1. An uncertain temporal data management and query method based on a relation R tree, characterized in that: temporal data with uncertain start and end points are first managed based on a determined interval length, a relation R tree is then constructed, and queries are carried out in order of the weights of the temporal data; the method comprises the following steps:
(1) generating uncertain temporal data and establishing a relation R tree: giving interval parameters, generating original uncertain temporal data intervals, and then building rectangular frames of the data intervals according to time attributes and weights into an R tree, wherein the step (1) specifically comprises the following steps:
(1.1) according to the interval range and the weight of the uncertain interval data, taking the range as an x axis and the weight as a y axis, expanding the weight up and down, constructing the uncertain interval data into a rectangular frame, and then constructing an R tree according to the obtained rectangular frame;
(1.2) the upward and downward expansion of the weight of the uncertain interval data is calculated according to the minimum value of the weight;
(2) relationship management of the relationship R Tree: traversing the R tree constructed in the step (1) to obtain the relation of the item indexes ordered from big to small according to the weight in the current node, and storing by combining an auxiliary structure;
(3) querying uncertain temporal data top-k: and (3) taking the relation R tree obtained in the step (2) and the value to be inquired as input, comparing the relation R tree with each node from the root point, and when the child node of the current node is to be accessed, selecting the next access target according to the relation between the node nodeid stored in the relation R tree and the weight value.
2. The uncertain temporal data management and query method based on the relation R tree as claimed in claim 1, wherein: the auxiliary structures in step (2) are a B-tree and an array.
3. The uncertain temporal data management and query method based on the relation R tree as claimed in claim 1, wherein: in step (2), the relationship between a current node and the index numbers of the entries in the current node is obtained from the constructed R tree and managed, and the nodes are visited in descending order of the weights of the entries in the current node, wherein the weight and the index number are stored in the relationship;
when a B-tree is used to manage the relationship, the tree is built with the nodeid in the relationship table obtained by traversing the R tree as the key, the position of the node in the B-tree is found from the id number of the current R-tree node, and the index numbers and weights of the child nodes corresponding to the current node are obtained;
when an array is used to manage the relationship, an array whose size equals the number of nodes of the R tree is created, the nodeid of the node is used as the array subscript, and the tuple in the relationship is used as the array content for mapping, the mapping rule being that the nodeid of a node corresponds to the access order of the node when the R tree is traversed.
4. The uncertain temporal data management and query method based on the relation R tree as claimed in claim 1, wherein: when nodes in the R tree are accessed in step (3), nodes whose uncertain data have larger weights are accessed preferentially.
CN201910504660.6A 2019-06-11 2019-06-11 Uncertainty tense data management and query method based on relation R tree Active CN110347676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910504660.6A CN110347676B (en) 2019-06-11 2019-06-11 Uncertainty tense data management and query method based on relation R tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910504660.6A CN110347676B (en) 2019-06-11 2019-06-11 Uncertainty tense data management and query method based on relation R tree

Publications (2)

Publication Number Publication Date
CN110347676A CN110347676A (en) 2019-10-18
CN110347676B true CN110347676B (en) 2021-07-27

Family

ID=68181813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910504660.6A Active CN110347676B (en) 2019-06-11 2019-06-11 Uncertainty tense data management and query method based on relation R tree

Country Status (1)

Country Link
CN (1) CN110347676B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723093A (en) * 2020-06-17 2020-09-29 江苏海平面数据科技有限公司 Uncertain interval data query method based on data division
CN115098616B (en) * 2022-07-25 2022-12-02 北京国科恒通科技股份有限公司 Multi-temporal spatial data storage and query methods, devices and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102810118A (en) * 2012-07-05 2012-12-05 上海电力学院 K nearest neighbor search method for variable weight network
CN103455531A (en) * 2013-02-01 2013-12-18 深圳信息职业技术学院 Parallel indexing method supporting real-time biased query of high dimensional data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100036865A1 (en) * 2008-08-07 2010-02-11 Yahoo! Inc. Method For Generating Score-Optimal R-Trees
US11573989B2 (en) * 2017-02-24 2023-02-07 Microsoft Technology Licensing, Llc Corpus specific generative query completion assistant
CN108829804A (en) * 2018-06-05 2018-11-16 洛阳师范学院 Based on the high dimensional data similarity join querying method and device apart from partition tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"路网环境下的最近邻查询技术";鲍金玲等;《软件学报》;20171206;全文 *

Also Published As

Publication number Publication date
CN110347676A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
Li et al. LISA: A learned index structure for spatial data
Rocha-Junior et al. Top-k spatial keyword queries on road networks
US8688723B2 (en) Methods and apparatus using range queries for multi-dimensional data in a database
CN102270232B (en) Semantic data query system with optimized storage
Huang et al. Continuous distance-based skyline queries in road networks
CN111639075B (en) Non-relational database vector data management method based on flattened R tree
JP2005267612A (en) Improved query optimizer using implied predicates
JPH07191891A (en) Computer method and storage structure for storage of, and access to, multidimensional data
CN106294772A (en) The buffer memory management method of distributed memory columnar database
CN110347676B (en) Uncertainty tense data management and query method based on relation R tree
CN109656798B (en) Vertex reordering-based big data processing capability test method for supercomputer
Du et al. Spatio-temporal data index model of moving objects on fixed networks using hbase
CN105138674B (en) A kind of data bank access method
Shao et al. Efficiently processing spatial and keyword queries in indoor venues
KR101255639B1 (en) Column-oriented database system and join process method using join index thereof
CN113704248B (en) Block chain query optimization method based on external index
WO2020215437A1 (en) Approximate search method for spatial keyword query in electronic map
CN108241709A (en) A kind of data integrating method, device and system
Wong et al. Online skyline analysis with dynamic preferences on nominal attributes
CN110263108B (en) Keyword Skyline fuzzy query method and system based on road network
JP2011170461A (en) Information accumulation retrieval method and information accumulation retrieval program
CN106484863A (en) Increase algorithm based on attribute structure concept lattice
Krommyda et al. Spatial Data Management in IoT systems: A study of available storage and indexing solutions
CN111159175B (en) Incomplete database Skyline query method based on index
CN112148830A (en) Semantic data storage and retrieval method and device based on maximum area grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant