CN111639075A - Non-relational database vector data management method based on flattened R tree - Google Patents
- Publication number
- CN111639075A (application CN202010387252.XA)
- Authority
- CN
- China
- Prior art keywords
- index
- vector data
- node
- tree
- vector
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a vector data management method for non-relational databases. Targeting distributed non-relational databases, it designs an index structure for vector data based on an R-tree flattening strategy; establishes a base table structure containing the vector data and the index structure and associates the two; encodes and stores the vector data while constructing a flattened R-tree index at the same time; provides a spatial query processing algorithm for vector data based on the flattened R-tree; and maintains the vector data in the non-relational database, including updating and deletion. By building an R-tree index based on the flattening strategy, the invention gives non-relational databases R-tree-supported query processing for vector data and supports the organization and management of large-scale vector data, so that vector data can benefit from the mass storage, parallel computation, high availability, high reliability and other capabilities of non-relational databases.
Description
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a vector data management method in a non-relational database.
Background
Over 85% of real-world data is geographically related. According to the McKinsey Global Institute, the total volume of geospatial data exceeded 6000 PB in 2016 and continues to grow by petabytes every year. Compared with the simple structure of raster data, vector data has a complex structure and carries the main burden of spatial analysis and spatial data query. Heterogeneous, unstructured vector data is difficult to manage with traditional relational databases, and in the face of massive, continuously growing data sets, relational databases suffer from scalability problems that are hard to overcome.
Non-relational databases follow the CAP theorem and the BASE principle: they relax transactional guarantees in favor of schema freedom, read/write efficiency and horizontal scalability, and provide efficient random access, multi-format data storage and highly concurrent reads and writes. Their strong scalability and computing capability offer a new approach to this problem. A non-relational database system usually stores data in a key-value model and guarantees efficient queries by automatically indexing the keys; secondary indexes can further enrich the database's query capability.
Spatial indexes exist to improve query efficiency, but traditional spatial indexes were not designed for distributed environments: when storing and managing massive vector data, they make storage organization difficult and struggle to meet real-time query requirements. At the same time, the native spatial indexes of non-relational databases support vector data poorly. Taking MongoDB as an example, it natively supports two spatial indexes, 2d and 2dsphere. The 2d index only supports point features, while the 2dsphere index does not support planar coordinate data and adapts poorly to the data, so neither can adequately support query processing of vector data.
Therefore, when a non-relational database is used to manage massive vector data, no suitable index exists: with only its native spatial indexes, the database cannot organize and manage the data efficiently, and its high-concurrency advantage is hard to exploit.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a vector data management method in a non-relational database that gives distributed non-relational databases the capability to organize and manage massive vector data.
The technical solution adopted by the invention is a vector data management method in a non-relational database, characterized by comprising the following steps:
S1. In a non-relational database environment, design an auxiliary index structure for vector data based on an R-tree flattening strategy;
S2. Establish a base table structure containing the vector data and the index structure, in which all data tables are associated through explicit association records and implicit naming rules;
S3. Encode and store the vector data, organizing geometric information as GeoJSON and attribute information as JSON, and construct a flattened R-tree index;
S4. When a query request is received, determine the index metadata ID from the query conditions, obtain the R-tree root node ID from it, retrieve the vector data in parallel based on the R-tree index table, and return the query result;
S5. Maintain the vector data in the non-relational database, including updating and deletion.
According to the above method, in S1, the auxiliary index structure based on the R-tree flattening strategy is designed in the following steps:
1.1. Abstract each vector object as a minimum bounding rectangle (MBR), recursively merge spatially adjacent MBRs into higher-level MBRs, and finally form a hierarchical tree structure of MBRs;
1.2. Expand the R-tree index structure into a flattened set of index nodes: express each index node as a JSON structure, and use the node's unique identifier as the pointer from a parent index entry to the child index node;
1.3. Set the fan-out coefficient M of the R-tree; every node other than the root has a number of child nodes in the interval [2, M].
According to the above method, a leaf node record of the R-tree has the format <OID, MBR> and an intermediate node record has the format <OID, Pointer, MBR>, where OID is the unique identifier of the node, Pointer holds the OID of a child node, and MBR is the minimum bounding rectangle.
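Step 1.1 and the two record formats can be sketched in Python as follows. This is a minimal illustration assuming axis-aligned rectangles stored as (min_x, min_y, max_x, max_y); the names `MBR`, `LeafRecord`, `InternalRecord`, `parent_mbr` and `fanout_ok` are hypothetical, and letting the root hold between 1 and M children is an assumption, since the text only bounds non-root nodes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @staticmethod
    def of_points(points: Iterable[Tuple[float, float]]) -> "MBR":
        # Abstract a vector object (its vertex list) into its MBR.
        xs, ys = zip(*points)
        return MBR(min(xs), min(ys), max(xs), max(ys))

    def union(self, other: "MBR") -> "MBR":
        # Smallest rectangle covering both inputs.
        return MBR(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                   max(self.max_x, other.max_x), max(self.max_y, other.max_y))


@dataclass(frozen=True)
class LeafRecord:
    oid: str          # <OID, MBR>
    mbr: MBR


@dataclass(frozen=True)
class InternalRecord:
    oid: str          # <OID, Pointer, MBR>
    pointer: str      # OID of the child node
    mbr: MBR


def parent_mbr(children: List[MBR]) -> MBR:
    # Merging child MBRs yields the higher-level MBR of step 1.1.
    result = children[0]
    for m in children[1:]:
        result = result.union(m)
    return result


def fanout_ok(child_count: int, m: int, is_root: bool) -> bool:
    # Non-root nodes must hold between 2 and M children (step 1.3);
    # the root's lower bound of 1 is an assumption of this sketch.
    low = 1 if is_root else 2
    return low <= child_count <= m
```

For instance, `parent_mbr([MBR.of_points([(0, 0), (1, 1)]), MBR(2, 3, 4, 5)])` gives `MBR(min_x=0, min_y=0, max_x=4, max_y=5)`.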
According to the above method, in S2, the base table structure is designed as follows:
2.1. Manage multi-source heterogeneous vector data as data sets, each vector data set organizing logically related vector data of the same type;
2.2. Design a vector data table, an R-tree index table, a vector metadata table and an index metadata table, which store, respectively, the vector features of a vector data set, its index structure, the metadata of the vector features and the metadata of the index structure;
2.3. Establish the associations among the four tables: each vector data set corresponds to one vector data table and one R-tree index table, which are described in the vector metadata table and the index metadata table respectively.
According to the above method, S3 specifically comprises:
3.1. Taking the vector data set as the unit, encode all vector features in the data set and write them into the vector data table, organizing geometric information as GeoJSON and attribute information as JSON; GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON);
3.2. Query the vector metadata table for the geometric metadata of the spatial domain containing the vector features, and obtain the corresponding index metadata ID;
3.3. Obtain the R-tree index table and its root node ID from the index metadata table, navigate to the target leaf node according to the geometric relationship between the feature's MBR and the MBRs of the index entries, insert the index entry for the feature, and update the R-tree index table;
3.4. After the whole vector data set has been written, update the vector metadata table and the index metadata table.
According to the above method, in 3.3, node navigation by ID and updating of the R-tree index table proceed as follows:
3.3.1. Using ID-based node navigation, search for the best insertion node according to the geometric relationship between the feature's MBR and the MBRs of the index entries, and check whether the node's number of child nodes exceeds the set fan-out coefficient; if so, go to step 3.3.2, otherwise go to step 3.3.3;
3.3.2. Split the node into two new nodes of roughly equal size using an R-tree node splitting algorithm, then navigate again to find the best insertion node;
3.3.3. Insert the feature's index entry into the node and update the node;
3.3.4. If the root node was split, update the root node information in the index metadata table.
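Steps 3.3.1 to 3.3.4 can be sketched with an in-memory dict keyed by OID standing in for the R-tree index table. The sketch assumes that MBRs are stored as [min_x, min_y, max_x, max_y] lists rather than GeoJSON and that level 0 marks the leaves. It also makes two substitutions. Child selection uses least area enlargement (Guttman's ChooseLeaf heuristic), which automatically prefers a child that already contains the feature's MBR. The split sorts entries by x-center and halves them, standing in for whichever R-tree splitting algorithm an implementation adopts. It inserts first and then splits, whereas 3.3.1 checks the count before inserting; either order keeps every node within M children. All names are hypothetical.

```python
import itertools
from typing import Dict, List, Optional

_ids = itertools.count(1)  # hypothetical OID generator


def new_oid() -> str:
    return f"N{next(_ids)}"


def union(a: List[float], b: List[float]) -> List[float]:
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]


def area(m: List[float]) -> float:
    return (m[2] - m[0]) * (m[3] - m[1])


def node_mbr(node: dict) -> List[float]:
    m = list(node["D"][0]["M"])
    for entry in node["D"][1:]:
        m = union(m, entry["M"])
    return m


def choose_child(node: dict, mbr: List[float]) -> dict:
    # Least enlargement first, smaller area as tie-breaker; a child whose MBR
    # already contains the new MBR needs zero enlargement and wins.
    return min(node["D"], key=lambda e: (area(union(e["M"], mbr)) - area(e["M"]), area(e["M"])))


def split(nodes: Dict[str, dict], node: dict) -> dict:
    # Stand-in split: sort entries by x-center and cut the list in half.
    entries = sorted(node["D"], key=lambda e: e["M"][0] + e["M"][2])
    half = len(entries) // 2
    sibling = {"ID": new_oid(), "L": node["L"], "D": entries[half:]}
    node["D"] = entries[:half]
    for n in (node, sibling):
        n["C"] = len(n["D"])
    nodes[sibling["ID"]] = sibling
    return sibling


def insert(nodes: Dict[str, dict], meta: dict, feature_id: str,
           mbr: List[float], m: int) -> None:
    # Navigate by node ID from the root down to a leaf, remembering the path.
    path = []
    node = nodes[meta["root"]]
    while node["L"] > 0:
        entry = choose_child(node, mbr)
        path.append((node, entry))
        node = nodes[entry["P"]]
    node["D"].append({"P": feature_id, "M": list(mbr)})
    node["C"] = len(node["D"])
    sibling: Optional[dict] = split(nodes, node) if node["C"] > m else None
    # Walk back up: refresh parent MBRs and propagate any split.
    for parent, entry in reversed(path):
        entry["M"] = node_mbr(node)
        if sibling is not None:
            parent["D"].append({"P": sibling["ID"], "M": node_mbr(sibling)})
            parent["C"] = len(parent["D"])
        sibling = split(nodes, parent) if parent["C"] > m else None
        node = parent
    if sibling is not None:
        # The root split: grow a new root and record it (step 3.3.4).
        root = {"ID": new_oid(), "L": node["L"] + 1, "C": 2,
                "D": [{"P": node["ID"], "M": node_mbr(node)},
                      {"P": sibling["ID"], "M": node_mbr(sibling)}]}
        nodes[root["ID"]] = root
        meta["root"] = root["ID"]
```

Starting from `nodes = {"N0": {"ID": "N0", "L": 0, "C": 0, "D": []}}` and `meta = {"root": "N0"}`, repeated calls to `insert(..., m=3)` grow the tree while `meta["root"]` tracks the current root, as the index metadata table would.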
According to the above method, the JSON structure is {ID, L, C, D}, where ID is the unique identifier (OID) of the index node; L (Level) is the level of the tree at which the node sits; C (Count) is the number of child nodes the node owns; and D (Descendants) is a nested JSON structure recording the unique identifier and the minimum bounding rectangle of each child node. D has the detailed structure D{{P, M}, …, {P, M}}, where P (Pointer) holds the OID of a child node and M (MBR) is the child node's minimum bounding rectangle, organized as GeoJSON.
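The {ID, L, C, D} node document can be serialized and deserialized as in the sketch below. The text writes D in set-like notation; this sketch assumes it is stored as a JSON array of {P, M} objects with M as a closed GeoJSON Polygon ring. The function names are hypothetical.

```python
import json
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def box_to_geojson(box: Box) -> dict:
    # A rectangle as a closed GeoJSON Polygon ring.
    x0, y0, x1, y1 = box
    return {"type": "Polygon",
            "coordinates": [[[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]]}


def geojson_to_box(geom: dict) -> Box:
    ring = geom["coordinates"][0]
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def encode_node(oid: str, level: int, children: List[Tuple[str, Box]]) -> str:
    """Serialize one index node as {ID, L, C, D}; D holds {P, M} pairs."""
    doc = {"ID": oid, "L": level, "C": len(children),
           "D": [{"P": p, "M": box_to_geojson(m)} for p, m in children]}
    return json.dumps(doc)


def decode_node(text: str) -> dict:
    # Deserialize and turn each GeoJSON MBR back into a plain box.
    doc = json.loads(text)
    doc["D"] = [{"P": d["P"], "M": geojson_to_box(d["M"])} for d in doc["D"]]
    return doc
```

Keeping C alongside D is redundant with `len(D)`, but it lets an overflow check in 3.3.1 read a single field.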
According to the above method, S4 specifically comprises:
4.1. The user supplies query conditions such as the data set name and the query range;
4.2. According to the given query conditions, query the vector metadata table for the geometric metadata of the spatial domain and obtain the corresponding index metadata ID;
4.3. Obtain the R-tree index table and the ID of its root node from the index metadata table;
4.4. Query the R-tree index table and retrieve the index entries of the vector features that satisfy the query conditions;
4.5. Retrieve the vector data from the vector data table and apply fine filtering to obtain the final query result.
According to the above method, in 4.4, the R-tree index table is queried as follows:
4.4.1. Use the root node ID to fetch the corresponding key-value pair from the R-tree index table and deserialize it;
4.4.2. Using the geometry recorded in GeoJSON, test the relationship between each child node's MBR and the query range; for every child whose MBR intersects or lies within the query range, follow the parent entry's pointer to the child index node, then fetch and deserialize the child's key-value pair;
4.4.3. Repeat step 4.4.2 until the leaf nodes of the R-tree are reached.
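Steps 4.4.1 to 4.4.3 amount to a level-by-level traversal in which each level's key-value pairs are fetched and deserialized in parallel, matching the multithreaded retrieval in the worked example of step 104. In this sketch a dict of JSON strings stands in for the R-tree index table. It also assumes that MBRs are plain [min_x, min_y, max_x, max_y] arrays rather than GeoJSON, to keep it short, and that level 0 marks the leaves. `fetch_node` and `range_query` are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    # True when the rectangles overlap or one lies inside the other.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fetch_node(table: Dict[str, str], oid: str) -> dict:
    # Stand-in for a key-value GET by row key followed by deserialization.
    return json.loads(table[oid])


def range_query(table: Dict[str, str], root_id: str,
                query: Sequence[float], workers: int = 4) -> List[str]:
    """Return the leaf entries whose MBR intersects or lies inside the query box."""
    hits: List[str] = []
    frontier = [root_id]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch and deserialize one whole level of nodes in parallel.
            level = list(pool.map(lambda oid: fetch_node(table, oid), frontier))
            frontier = []
            for node in level:
                matched = [d["P"] for d in node["D"] if intersects(d["M"], query)]
                if node["L"] == 0:
                    hits.extend(matched)      # leaf entries reference vector rows
                else:
                    frontier.extend(matched)  # follow pointers to child nodes
    return hits
```

Against a real store, `fetch_node` would issue a GET on the index table; the candidates it returns still go through the fine filtering of step 4.5.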
According to the above method, in S5, deleting vector data comprises:
5.1. Query the vector metadata table for the geometric metadata of the spatial domain containing the data to be deleted, and obtain the corresponding index metadata;
5.2. Obtain the R-tree index table and its root node from the index metadata table, and locate the index entry associated with the vector feature to be deleted according to the geometric relationship between the MBRs of the R-tree nodes and the query box;
5.3. Delete the corresponding vector data from the vector data table;
5.4. Delete the index entry associated with the vector data from the R-tree index table.
In this scheme, the index information for a piece of data is updated only after the data insertion has completed, which guarantees the integrity of each entry. In a non-relational database, failures are a normal condition: if the index entry were inserted first and the system crashed right afterwards, then after a restart the index would indicate that the data had been stored even though it never was, and the data would effectively be lost.
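The write ordering can be illustrated with two dicts standing in for the vector data table and the index, plus a simulated failure between the two writes. All names here are hypothetical; the point is only that writing the row before its index entry means a crash can never leave an index entry that points at a missing row.

```python
from typing import Dict, List


class SimulatedCrash(Exception):
    """Stands in for a node failure between the two writes."""


def store_feature(data_table: Dict[str, dict], index_entries: Dict[str, List[float]],
                  key: str, row: dict, mbr: List[float], crash_between: bool = False) -> None:
    data_table[key] = row            # 1. persist the vector row first
    if crash_between:
        raise SimulatedCrash(key)
    index_entries[key] = mbr         # 2. only then publish its index entry


def dangling_index_entries(data_table: Dict[str, dict],
                           index_entries: Dict[str, List[float]]) -> List[str]:
    # Index entries that point at rows which were never stored.
    return [k for k in index_entries if k not in data_table]
```

With this order, the worst case after a crash is an unindexed row that a later re-indexing pass could pick up, rather than an index entry for data that does not exist.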
The beneficial effects of the invention are as follows: the geometric information of vector data is stored as GeoJSON, and the data are encoded and stored as key-value pairs; the flattened R-tree index structure ensures that spatially adjacent entities are stored on the same or adjacent storage nodes; R-tree-supported vector query processing is provided for distributed non-relational databases, and by exploiting their distributed storage, R-tree queries can be executed in parallel on multiple nodes, making full use of the distributed, highly concurrent nature of non-relational databases and greatly improving query efficiency; and updating or deleting vector data does not corrupt the index, meeting the requirement for real-time data access.
Drawings
FIG. 1 is a flowchart of a method according to an embodiment of the present invention.
FIG. 2 shows the spatial distribution of the MBRs of vector features according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the R-tree structure corresponding to FIG. 2.
FIG. 4 shows an exemplary storage structure of vector data according to an embodiment of the present invention.
FIG. 5 shows the structure and associations of the base tables containing vector data and index structures according to an embodiment of the present invention.
FIG. 6 is a flowchart of vector data insertion and R-tree index construction.
FIG. 7 is a flowchart of vector data query.
FIG. 8 is a flowchart of vector data deletion.
Detailed Description
The invention is further illustrated by the following specific examples and figures.
As shown in FIG. 1, the flattened-R-tree-based vector data management method for non-relational databases provided by the present invention specifically comprises:
101. In a non-relational database environment, design an auxiliary index structure for vector data based on an R-tree flattening strategy.
Specifically, the embodiment designs a flattened R-tree index storage scheme for the HBase database. The unique identifier of each R-tree node is used as the row key, and the node information is stored in nested JSON form as columns of column family E; the spatial information of the vector data is organized in GeoJSON format. Each row of the table represents one node, and all nodes of the same index are stored in the same R-tree index table, whose detailed structure is shown in Table 1-1.
Table 1-1 R-tree index table structure
The fields and types of the R-tree index table are described in Table 1-2.
Table 1-2 Description of the R-tree index table structure
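Tables 1-1 and 1-2 are not reproduced here, so the exact column qualifiers are unknown. As an illustration only, the sketch below assumes the node fields are stored as three qualifiers L, C and D under column family E, with D holding the nested JSON child entries. Keys and values are bytes, the shape HBase clients such as happybase accept in `Table.put(row, data)`. The function names are hypothetical.

```python
import json
from typing import Dict, Tuple

FAMILY = b"E"  # column family named in the text


def node_to_row(node: dict) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Map a flattened node {ID, L, C, D} to (row key, {family:qualifier: value})."""
    row_key = node["ID"].encode()          # the node OID is the row key
    columns = {
        FAMILY + b":L": str(node["L"]).encode(),
        FAMILY + b":C": str(node["C"]).encode(),
        FAMILY + b":D": json.dumps(node["D"]).encode(),  # nested child entries
    }
    return row_key, columns


def row_to_node(row_key: bytes, columns: Dict[bytes, bytes]) -> dict:
    # Inverse mapping, as a query would apply after a GET on the row key.
    return {"ID": row_key.decode(),
            "L": int(columns[FAMILY + b":L"]),
            "C": int(columns[FAMILY + b":C"]),
            "D": json.loads(columns[FAMILY + b":D"])}
```

Because the row key is the OID, following a pointer P during navigation is a single point lookup on the index table.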
For example, in this embodiment the vector data in a preset region lie in the same spatial domain and are distributed as shown in FIG. 2. The fan-out coefficient M of the R-tree is set to 3, the boundary of each node is represented by its minimum bounding rectangle (MBR), and the index entries of the vector features are stored in leaf nodes; the corresponding R-tree is shown in FIG. 3. Based on the R-tree index table structure in Table 1-2, the R-tree of FIG. 3 is expanded into the flattened document set shown in Table 1-3, so that tree queries can be carried out by navigating node IDs.
Table 1-3 Flattened storage of R-tree nodes
The detailed JSON structure is {ID, L, C, D}, where ID (OID) is the unique identifier of the index node, L (Level) is the level of the tree at which the node sits, C (Count) is the number of child nodes the node owns, and D (Descendants) is a nested JSON structure recording the unique identifier and the minimum bounding rectangle of each child node.
D has the detailed structure D{{P, M}, …, {P, M}}, where P (Pointer) holds the OID of a child node and M (MBR) is the child node's minimum bounding rectangle, organized as GeoJSON.
102. Establish a base table structure containing the vector data and the index structure, in which all data tables are associated through explicit association records and implicit naming rules.
For example, storing, organizing and managing vector data in the non-relational database HBase in this embodiment involves several kinds of base tables. Besides the R-tree index table designed above, the other base tables are designed as follows:
The vector metadata table stores vector metadata: it describes each vector data set in the database in detail and helps the indexing system filter out meaningless requests. The vector metadata table is named "VO_METADATA", and its row key is the data set name (DatasetName). It contains two column families, a required column family (E) and an optional column family (F); user-defined fields are placed under column family F. The structure of the table is shown in Table 2-1.
Table 2-1 Vector metadata table structure
The fields and types of the vector metadata table are described in Table 2-2.
Table 2-2 Description of the vector metadata table structure
The index metadata table stores the description of the R-tree spatial index. This information is referenced by the vector metadata and associates the vector data table with the R-tree index table; it also records the R-tree parameters, which determine the internal structure of the R-tree nodes and the starting node of the algorithms. The index metadata table is named "IDX_METADATA", its row key is the index table name (IndexTableName), and its detailed structure is shown in Table 3-1.
Table 3-1 Index metadata table structure
The fields and types of the index metadata table are described in Table 3-2.
Table 3-2 Description of the index metadata table structure
The vector data table stores the original vector data. FIG. 4 shows how one vector feature is stored in the vector data table in this embodiment; its geometric information is organized in GeoJSON format. Specifically, the geometry is stored in the "GEOINFO" field, in which the "type" member identifies the geometry type of the feature and the "coordinates" member holds the array of vertex coordinates of the geometric object. Non-geometric information is stored in separate fields; for example, the "NAME" field holds the name of the vector feature. The detailed structure of the vector data table is shown in Table 4-1.
Table 4-1 Vector data table structure
The fields and types of the vector data table are described in Table 4-2.
Table 4-2 Description of the vector data table structure
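A feature can be encoded into a vector data table row along the lines of FIG. 4, as in the sketch below. It assumes that the geometry goes into GEOINFO as a GeoJSON geometry object and that each attribute (such as NAME) becomes its own column. The row key, the use of family E for this table and the helper names are assumptions. The feature's MBR, needed for the index entry in step 3.3, is derived from the GeoJSON coordinates.

```python
import json
from typing import Dict, Iterator, List, Sequence, Tuple


def _positions(coords) -> Iterator[Sequence[float]]:
    # Walk nested GeoJSON coordinate arrays down to [x, y] positions.
    if isinstance(coords[0], (int, float)):
        yield coords
    else:
        for c in coords:
            yield from _positions(c)


def feature_mbr(geometry: dict) -> List[float]:
    pts = list(_positions(geometry["coordinates"]))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [min(xs), min(ys), max(xs), max(ys)]


def encode_feature(feature_id: str, geometry: dict,
                   attributes: Dict[str, object]) -> Tuple[bytes, Dict[bytes, bytes]]:
    # Geometry as GeoJSON in GEOINFO; each attribute as a JSON-encoded column.
    columns = {b"E:GEOINFO": json.dumps(geometry).encode()}
    for name, value in attributes.items():
        columns[f"E:{name}".encode()] = json.dumps(value).encode()
    return feature_id.encode(), columns
```

The same `_positions` walk works for Point, LineString and Polygon because it simply descends nested arrays until it reaches a coordinate pair.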
Specifically, a vector database schema supported by the R-tree index is designed for the characteristics of HBase, as shown in FIG. 5. The vector data table, the R-tree index table, the vector metadata table and the index metadata table are associated by the following rules:
The vector metadata table records the spatial domain name of the vector data table and its corresponding namespace; when a vector data set spans more than one spatial domain, several records are stored in the vector metadata table.
The R-tree index table of each spatial domain is bound to the corresponding vector data table by a naming convention: the index table is named "Rtree_<spatial domain name>_<namespace>" and the data table "<spatial domain name>_<namespace>".
The index metadata is associated with the vector metadata table by a recorded ID, and the root node in the R-tree index table is likewise associated with the index metadata table by a recorded ID.
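The implicit naming rule and the explicit ID chain can be sketched with plain dicts standing in for VO_METADATA and IDX_METADATA, assuming a data set with a single spatial domain. The field names `Domain`, `Namespace`, `IndexMetaID` and `RootID` are hypothetical stand-ins for the fields listed in Tables 2-2 and 3-2.

```python
from typing import Dict, Tuple


def data_table_name(domain: str, namespace: str) -> str:
    return f"{domain}_{namespace}"


def index_table_name(domain: str, namespace: str) -> str:
    return f"Rtree_{domain}_{namespace}"


def resolve(vo_metadata: Dict[str, dict], idx_metadata: Dict[str, dict],
            dataset: str) -> Tuple[str, str, str]:
    """Follow VO_METADATA -> IDX_METADATA -> root node ID for one data set."""
    vo = vo_metadata[dataset]                  # row key: DatasetName
    idx = idx_metadata[vo["IndexMetaID"]]      # row key: IndexTableName
    data_table = data_table_name(vo["Domain"], vo["Namespace"])
    index_table = index_table_name(vo["Domain"], vo["Namespace"])
    if idx["IndexTableName"] != index_table:
        raise ValueError("naming rule and index metadata disagree")
    return data_table, index_table, idx["RootID"]
```

The naming rule lets the data table be derived from the index table name without an extra lookup, while the recorded IDs carry the information that changes at run time, such as the current root node.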
103. Encode and store the vector data, organizing geometric information as GeoJSON and attribute information as JSON, and construct the flattened R-tree index at the same time.
For example, the flow of vector data insertion and R-tree index construction in this embodiment is shown in FIG. 6. First, taking the vector data set as the unit, all vector features in the data set are encoded and written into the vector data table. Then the vector metadata table is checked for geometric metadata of the spatial domain containing the features; if none exists, it is stored in the vector metadata table and the index metadata is updated at the same time.
Next, the index metadata ID is obtained, the R-tree index table and its root node ID are obtained from the index metadata table, and navigation proceeds to find the best insertion node; it must then be determined whether the number of child nodes of that node exceeds the preset fan-out coefficient.
Specifically, starting from the root node, check whether the current node's MBR contains the MBR of the feature to be inserted; if not, check the next node, and so on until a node whose MBR contains it is found, and then check whether that node's child MBRs contain the feature's MBR. The best insertion node is one whose own MBR contains the feature's MBR while none of its child MBRs do. After navigating to the destination leaf node, check whether its number of child nodes exceeds the preset fan-out coefficient: if not, insert the index entry for the feature; otherwise, split the node with the R-tree splitting strategy, navigate by node ID again, and insert the feature's index entry into the best node after the split. If the root node was split, update the root node information in the index metadata table. Finally, update the R-tree index entries and the metadata tables to complete the insertion.
The index information for a piece of data is updated only after the data insertion has completed, which guarantees the integrity of each entry. In a non-relational database, failures are a normal condition: if the index entry were inserted first and the system crashed right afterwards, then after a restart the index would indicate that the data had been stored even though it never was, and the data would effectively be lost.
104. Provide query support for vector data: when a query request is received, determine the index metadata ID from the query conditions, obtain the R-tree root node ID, retrieve the vector data in parallel based on the R-tree index table, and return the query result.
Illustratively, the vector data query flow of this embodiment is shown in FIG. 7 and comprises the following steps:
Step 1: The user supplies query conditions such as the data set name and the query range. In this embodiment the query range is the query box shown in FIG. 2, which gives the query polygon.
Step 2: Query the vector metadata table for the geometric metadata of the spatial domain. If it does not exist, the query ends, because the data set contains no vector features satisfying the query conditions; if it exists, obtain the corresponding index metadata ID.
Step 3: Query the index metadata table to obtain the R-tree index table corresponding to the query polygon and the ID of its root node.
Step 4: Query the R-tree index table, fetch the key-value pair of the root node and deserialize it. The geometry recorded in GeoJSON shows that the query box in FIG. 2 intersects the MBR of child node N1 and is contained in the MBR of child node N2. Following the pointers from the parent entries to the child index nodes, the key-value pairs of N1 and N2 are fetched from the index table in parallel by multiple threads and deserialized; the geometric test then shows that the query box intersects the MBRs of child nodes N4, N6 and N7. Recursively, the key-value pairs of N4, N6 and N7 are fetched and deserialized in parallel, and the geometric relationships are tested in parallel by multiple threads: the MBRs of child nodes L10, L16 and L19 fall inside the query box, and the MBRs of L12 and L18 intersect it. Since the leaf level has been reached, the set of qualifying leaf entries is {L10, L12, L16, L18, L19}.
Step 5: Fetch the vector data from the vector data table according to the index entries, and filter the geometry of the retrieved data by a fine query to obtain the vector features that satisfy the query conditions. The filtering may also be applied to attribute information, for example whether the floor area of a building exceeds 2500 m² or whether a name string contains a given shop name.
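The fine filtering of Step 5 can be sketched in pure Python, assuming the query range is an axis-aligned box and geometries are GeoJSON Point, LineString or Polygon. Only the exterior ring of a polygon is used; holes are ignored. The attribute field `AREA` is hypothetical. A production system would normally hand this test to a geometry library; the hand-written version just makes the refinement explicit.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def in_box(p: Sequence[float], b: Box) -> bool:
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]


def _orient(a, b, c) -> float:
    # Sign of the cross product (b - a) x (c - a).
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a, b, c) -> bool:
    # c is collinear with a-b; check that it lies within the segment's extent.
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_cross(p1, p2, q1, q2) -> bool:
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def segment_hits_box(a, b, box: Box) -> bool:
    if in_box(a, box) or in_box(b, box):
        return True
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return any(segments_cross(a, b, corners[i], corners[(i + 1) % 4]) for i in range(4))


def point_in_ring(p, ring) -> bool:
    # Even-odd ray casting against one closed GeoJSON ring.
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > p[1]) != (y2 > p[1]):
            if p[0] < x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def intersects_box(geom: dict, box: Box) -> bool:
    kind, c = geom["type"], geom["coordinates"]
    if kind == "Point":
        return in_box(c, box)
    if kind == "LineString":
        return any(segment_hits_box(a, b, box) for a, b in zip(c, c[1:]))
    if kind == "Polygon":
        ring = c[0]  # exterior ring only; holes are ignored in this sketch
        if any(segment_hits_box(a, b, box) for a, b in zip(ring, ring[1:])):
            return True
        return point_in_ring((box[0], box[1]), ring)  # box wholly inside polygon
    raise ValueError(f"unsupported geometry type: {kind}")


def fine_filter(rows: List[Dict], box: Box,
                predicate: Callable[[Dict], bool] = lambda r: True) -> List[Dict]:
    # Exact geometric test first, then the attribute condition.
    return [r for r in rows if intersects_box(r["GEOINFO"], box) and predicate(r)]
```

For the building example, `fine_filter(rows, box, lambda r: r["AREA"] > 2500)` keeps only candidates that both touch the query box and exceed the area threshold.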
105. Maintain the vector data, including updating and deletion.
Specifically, data can be updated in real time through the pre-built flattened R-tree index, and the update process is implemented in a similar way to R-tree index construction. When vector data stored in the database are modified, the index data of the nodes affected by the modification are updated at the same time, keeping the index valid.
For example, the vector data deletion flow of this embodiment is shown in FIG. 8; deleting the vector data corresponding to the index entry of leaf node L17 in the R-tree comprises the following steps:
Step 1: Query the vector metadata table for the geometric metadata of the spatial domain containing the data to be deleted, and obtain the corresponding index metadata;
Step 2: Query the index metadata table to obtain the R-tree index table and its root node, and locate the index entry associated with the vector feature to be deleted according to the geometric relationship between the MBRs of the R-tree nodes and the query box;
Step 3: Delete the corresponding vector data from the vector data table;
Step 4: Delete the index entry associated with the vector data from the R-tree index table;
Step 5: Update the R-tree index table and the index metadata table.
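Deletion Steps 2 to 5 can be sketched over a flattened dict store, assuming each data row carries its MBR in a hypothetical `MBR` field (a real row would derive it from GEOINFO) and that level 0 marks the leaves. Rebalancing an underfull node, for example with Guttman's CondenseTree, is not described in the text and is omitted, so a node may drop below two entries in this sketch. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def contains(outer: List[float], inner: List[float]) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def node_mbr(node: dict) -> List[float]:
    x0s, y0s, x1s, y1s = zip(*(e["M"] for e in node["D"]))
    return [min(x0s), min(y0s), max(x1s), max(y1s)]


def find_leaf(nodes: Dict[str, dict], oid: str, fid: str, mbr: List[float],
              path: tuple = ()) -> Optional[Tuple[dict, list]]:
    # Step 2: descend only into children whose MBR contains the feature's MBR.
    node = nodes[oid]
    for entry in node["D"]:
        if not contains(entry["M"], mbr):
            continue
        if node["L"] == 0:
            if entry["P"] == fid:
                return node, list(path)
        else:
            found = find_leaf(nodes, entry["P"], fid, mbr, path + ((node, entry),))
            if found:
                return found
    return None


def delete_feature(data_table: Dict[str, dict], nodes: Dict[str, dict],
                   root_id: str, fid: str) -> None:
    mbr = data_table[fid]["MBR"]
    found = find_leaf(nodes, root_id, fid, mbr)
    if found is None:
        raise KeyError(fid)
    leaf, path = found
    del data_table[fid]                                   # Step 3: data row
    leaf["D"] = [e for e in leaf["D"] if e["P"] != fid]   # Step 4: index entry
    leaf["C"] = len(leaf["D"])
    child = leaf                                          # Step 5: tighten ancestor MBRs
    for parent, entry in reversed(path):
        if child["D"]:
            entry["M"] = node_mbr(child)
        child = parent
```

Deleting the data row before the index entry mirrors the order of Steps 3 and 4; if the process stopped between them, a later query would find a stale index entry whose row is missing and could discard it during fine filtering.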
The invention provides a vector data management method of a non-relational database based on a flattened R tree, which is oriented to a novel non-relational database, provides vector data query processing supported by the R tree for a distributed non-relational database by establishing an R tree index based on a flattened strategy, and can utilize the distributed storage characteristic of the non-relational database to perform R tree query operation executed in parallel by multiple nodes, thereby supporting large-scale vector data organization and management oriented to the non-relational database, leading the mass storage and parallel calculation of the non-relational database, leading the technologies such as high availability, high reliability and the like to benefit vector data types.
The above embodiments only illustrate the design concept and features of the present invention. Their purpose is to enable those skilled in the art to understand and implement the invention, and they do not limit its scope of protection. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein fall within the scope of the present invention.
Claims (10)
1. A vector data management method for a non-relational database, characterized in that the method comprises the following steps:
S1, in a non-relational database environment, designing an auxiliary index structure for vector data based on an R-tree flattening strategy;
S2, establishing a database table structure covering the vector data and the index structure, in which all data tables are associated through explicit association records and implicit naming rules;
S3, encoding and storing the vector data, with geometry organized as GeoJSON and attributes as JSON, and constructing a flattened R-tree index;
S4, on receiving a query request, determining the index metadata ID from the query conditions and from it the R-tree root node ID, retrieving the vector data in parallel over the R-tree index table, and finally returning the query result;
S5, maintaining the vector data in the non-relational database, including updating and deleting it.
2. The vector data management method according to claim 1, characterized in that in S1 the auxiliary index structure based on the R-tree flattening strategy is designed as follows:
1.1, abstracting each vector object into a minimum bounding rectangle (MBR), recursively merging spatially adjacent MBRs into higher-level MBRs, and finally forming a hierarchical tree structure of MBRs;
1.2, expanding the R-tree index structure into a flattened set of index nodes, i.e. expressing each index node as a JSON structure and using the node's unique identifier as the pointer from a parent index entry to the child index node;
1.3, setting a fan-out coefficient M for the R-tree such that every node other than the root has between 2 and M child nodes, i.e. a child count in the interval [2, M].
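Steps 1.1 to 1.3 can be illustrated with a small bottom-up build. This sketch swaps in a simple bulk-loading heuristic (sort entries by MBR centre x, then cut them into evenly sized groups of at most M) as a stand-in for "recursively merging spatially adjacent MBRs"; it is not the patent's grouping rule. Function and field names are hypothetical, and it assumes M ≥ 3, which guarantees every non-root node receives between 2 and M children. Each node is emitted as a flat, OID-keyed record, so parents reference children only by ID, as step 1.2 describes.

```python
import math
from itertools import count
from typing import Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def union(mbrs: List[MBR]) -> MBR:
    return (min(m[0] for m in mbrs), min(m[1] for m in mbrs),
            max(m[2] for m in mbrs), max(m[3] for m in mbrs))


def even_groups(seq: list, n_groups: int) -> List[list]:
    """Cut seq into n_groups consecutive groups whose sizes differ by at most 1."""
    base, extra = divmod(len(seq), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(seq[start:start + size])
        start += size
    return groups


def build_flat_rtree(items: List[Tuple[str, MBR]], M: int = 4) -> Tuple[str, Dict[str, dict]]:
    """Return (root OID, flattened node table) for a non-empty list of (id, MBR)."""
    if M < 3 or not items:
        raise ValueError("need M >= 3 and at least one item")
    oids = count(1)
    table: Dict[str, dict] = {}
    entries, level = list(items), 0  # (pointer, MBR) pairs; level 0 = leaves
    while True:
        # 1.1: group spatially adjacent MBRs (here: neighbours along x)
        entries.sort(key=lambda e: (e[1][0] + e[1][2]) / 2)
        parents = []
        for group in even_groups(entries, math.ceil(len(entries) / M)):
            oid = f"N{next(oids)}"
            # 1.2: one flat record per node; children referenced by ID only
            table[oid] = {"ID": oid, "L": level, "C": len(group),
                          "D": [{"P": p, "M": m} for p, m in group]}
            parents.append((oid, union([m for _, m in group])))
        if len(parents) == 1:
            return parents[0][0], table
        entries, level = parents, level + 1
```

With 10 items and M = 4, for example, the leaves receive 4, 3 and 3 entries and a single root holds the three leaf MBRs.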
3. The vector data management method according to claim 2, characterized in that among the R-tree nodes, a leaf node record has the format <OID, MBR> and an intermediate node record has the format <OID, Pointer, MBR>, where OID is the unique identifier of the node, Pointer holds the OID of a child node, and MBR is the minimum bounding rectangle.
4. The vector data management method according to claim 1, characterized in that in S2 the database table structure is designed as follows:
2.1, managing multi-source heterogeneous vector data as datasets, each vector dataset grouping logically related vector data of the same type;
2.2, designing four tables, namely a vector data table, an R-tree index table, a vector metadata table and an index metadata table, which store, respectively, the vector elements of a vector dataset, its index structure, the metadata of the vector elements, and the metadata of the index structure;
2.3, establishing associations among the four table types: each vector dataset corresponds to one vector data table and one R-tree index table, which are described by metadata entries in the vector metadata table and the index metadata table respectively.
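One way to realize the associations of 2.3 (and the "explicit association records and implicit naming rules" of S2) is sketched below. The naming pattern `<dataset>_vector` / `<dataset>_rtree`, the `_idx` suffix, and all record fields are hypothetical; the point is that a dataset's tables can be reached either by following the explicit metadata records or by deriving their names from the dataset name, and both routes should agree.

```python
from typing import Dict, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def table_names(dataset: str) -> Tuple[str, str]:
    """Hypothetical implicit naming rule for the two per-dataset tables."""
    return f"{dataset}_vector", f"{dataset}_rtree"


def register_dataset(db: Dict[str, dict], dataset: str, extent: MBR) -> str:
    vector_table, rtree_table = table_names(dataset)
    index_meta_id = f"{dataset}_idx"
    db.setdefault("vector_meta", {})
    db.setdefault("index_meta", {})
    db[vector_table] = {}  # one vector data table per dataset
    db[rtree_table] = {}   # one R-tree index table per dataset
    # Explicit association records in the two metadata tables
    db["vector_meta"][dataset] = {"table": vector_table, "extent": extent,
                                  "index_meta_id": index_meta_id}
    db["index_meta"][index_meta_id] = {"rtree_table": rtree_table, "vector_table": vector_table,
                                       "root": None, "count": 0}
    return index_meta_id


def resolve_rtree_table(db: Dict[str, dict], dataset: str) -> str:
    """Follow the explicit records: vector metadata -> index metadata -> R-tree table."""
    index_meta_id = db["vector_meta"][dataset]["index_meta_id"]
    return db["index_meta"][index_meta_id]["rtree_table"]
```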
5. The vector data management method according to claim 1, characterized in that S3 specifically comprises:
3.1, taking the vector dataset as the unit, encoding all vector elements in the dataset and writing them into the vector data table, with geometry organized as GeoJSON and attributes as JSON; GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON);
3.2, querying the vector metadata table for the geometric metadata of the spatial domain in which the vector elements lie, and obtaining the corresponding index metadata ID;
3.3, obtaining the R-tree index table and its root node ID from the index metadata table, navigating to the target leaf node of the R-tree index table according to the geometric relationship between the vector element's minimum bounding rectangle and the bounding rectangles of the index entries, inserting an index entry for the vector element, and updating the R-tree index table;
3.4, after the vector dataset has been written, updating the vector metadata table and the index metadata table.
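Step 3.1 stores geometry as GeoJSON and attributes as JSON, and steps 3.2 and 3.3 then need the element's minimum bounding rectangle. A minimal sketch of that encoding follows, assuming a hypothetical row layout `{"geometry": ..., "attributes": ...}` keyed by feature ID. It walks any nested GeoJSON `coordinates` array, so it covers Point through MultiPolygon, but not GeometryCollection, which has no top-level `coordinates` member.

```python
import json
from typing import Any, Iterator, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def positions(coords: Any) -> Iterator[Tuple[float, float]]:
    """Yield every (x, y) position in a nested GeoJSON coordinates array."""
    if isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from positions(part)


def geometry_mbr(geometry: dict) -> MBR:
    xs, ys = zip(*positions(geometry["coordinates"]))
    return (min(xs), min(ys), max(xs), max(ys))


def encode_feature(fid: str, geometry: dict, attributes: dict) -> Tuple[str, str, MBR]:
    """Return the vector-table key, its JSON value, and the element's MBR."""
    value = json.dumps({"geometry": geometry, "attributes": attributes},
                       separators=(",", ":"), ensure_ascii=False)
    return fid, value, geometry_mbr(geometry)
```

`ensure_ascii=False` keeps non-ASCII attribute values (for example Chinese place names) readable in the stored JSON.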
6. The vector data management method according to claim 5, wherein in step 3.3 the ID-based node navigation and the update of the R-tree index table comprise the following steps:
3.3.1, according to the geometric relationship between the vector element's minimum bounding rectangle and the bounding rectangles of the index entries, finding the optimal insertion node by ID-based node navigation, and determining whether the number of child nodes of that node exceeds the configured fan-out coefficient; if so, performing step 3.3.2, otherwise performing step 3.3.3;
3.3.2, performing a node split that divides the node evenly into two new nodes using an R-tree node-splitting algorithm, and navigating again to find the optimal insertion node;
3.3.3, inserting the vector element's index entry into the node and updating the node;
3.3.4, if the root node was split, updating the root node information in the index metadata table.
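Steps 3.3.1 to 3.3.4 can be sketched over the flattened, OID-keyed node table. This is an illustration under stated assumptions, not the patent's algorithm: the insertion node is chosen with Guttman's least-area-enlargement heuristic; overflow is handled by inserting first and then splitting any node with more than M entries, rather than checking before insertion as 3.3.1 words it; and the "even division" of 3.3.2 is approximated by sorting entries on MBR centre x and cutting the list in half. Node fields follow claim 7 with level 0 as the leaf level; the OID generator and the `meta` dict are hypothetical.

```python
from itertools import count
from typing import Dict, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
_new_oid = count(1)  # hypothetical OID generator


def union(a: MBR, b: MBR) -> MBR:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(m: MBR) -> float:
    return (m[2] - m[0]) * (m[3] - m[1])


def node_mbr(node: dict) -> MBR:
    out = node["D"][0]["M"]
    for entry in node["D"][1:]:
        out = union(out, entry["M"])
    return out


def insert(table: Dict[str, dict], meta: dict, fid: str, mbr: MBR, M: int = 4) -> None:
    # 3.3.1: navigate by OID from the root to a leaf, choosing the child whose
    # MBR needs the least area enlargement (ties broken by smaller area).
    path = [meta["root"]]
    node = table[meta["root"]]
    while node["L"] > 0:
        best = min(node["D"], key=lambda e: (area(union(e["M"], mbr)) - area(e["M"]), area(e["M"])))
        best["M"] = union(best["M"], mbr)  # enlarge the entry on the way down
        path.append(best["P"])
        node = table[best["P"]]
    # 3.3.3: add the element's index entry to the leaf
    node["D"].append({"P": fid, "M": mbr})
    node["C"] = len(node["D"])
    # 3.3.2: split overflowing nodes, walking back up the path
    for depth in range(len(path) - 1, -1, -1):
        oid = path[depth]
        node = table[oid]
        if node["C"] <= M:
            break
        entries = sorted(node["D"], key=lambda e: e["M"][0] + e["M"][2])
        half = len(entries) // 2
        sib = f"N{next(_new_oid)}"
        node["D"], node["C"] = entries[:half], half
        table[sib] = {"ID": sib, "L": node["L"], "C": len(entries) - half, "D": entries[half:]}
        if depth == 0:
            # 3.3.4: the root split, so create a new root and record it in the metadata
            root = f"N{next(_new_oid)}"
            table[root] = {"ID": root, "L": node["L"] + 1, "C": 2,
                           "D": [{"P": oid, "M": node_mbr(node)},
                                 {"P": sib, "M": node_mbr(table[sib])}]}
            meta["root"] = root
        else:
            parent = table[path[depth - 1]]
            for entry in parent["D"]:
                if entry["P"] == oid:
                    entry["M"] = node_mbr(node)  # shrink to the half that stayed
            parent["D"].append({"P": sib, "M": node_mbr(table[sib])})
            parent["C"] = len(parent["D"])
```

With M ≥ 3, a split of M + 1 entries leaves at least two entries on each side, so the [2, M] bound of step 1.3 is preserved for non-root nodes.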
7. The vector data management method according to claim 5, wherein the JSON structure of an index node is {ID, L, C, D}, where ID is the unique identifier of the index node, i.e. its OID; L is the level of the node in the tree; C is the number of child nodes the node has; and D is a nested JSON structure recording the unique identifier and minimum bounding rectangle of each child node;
the detailed structure of D is D = {{P, M}, …, {P, M}}, where P, short for Pointer, is the OID of a child node, and M, short for MBR, is the child node's minimum bounding rectangle, organized as GeoJSON.
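A sketch of the node record of claims 3 and 7 follows. It assumes D is serialized as a JSON array of `{"P", "M"}` objects, that each MBR is written as a closed GeoJSON Polygon ring wound counter-clockwise (the RFC 7946 right-hand rule for exterior rings), and that in leaf nodes `P` holds a feature ID while in intermediate nodes it holds a child OID. The function names are hypothetical.

```python
import json
from typing import List, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def mbr_to_geojson(m: MBR) -> dict:
    """Closed, counter-clockwise Polygon ring for an MBR."""
    x0, y0, x1, y1 = m
    return {"type": "Polygon",
            "coordinates": [[[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]]}


def geojson_to_mbr(polygon: dict) -> MBR:
    ring = polygon["coordinates"][0]
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def serialize_node(oid: str, level: int, children: List[Tuple[str, MBR]]) -> str:
    """Encode one index node as {ID, L, C, D} with D = [{P, M}, ...]."""
    return json.dumps({
        "ID": oid,           # OID: the node's unique identifier (also its key)
        "L": level,          # level of the node in the tree
        "C": len(children),  # number of child entries
        "D": [{"P": p, "M": mbr_to_geojson(m)} for p, m in children],
    })


def deserialize_node(value: str) -> dict:
    """Decode a stored node and turn each GeoJSON M back into an MBR tuple."""
    node = json.loads(value)
    node["D"] = [{"P": e["P"], "M": geojson_to_mbr(e["M"])} for e in node["D"]]
    return node
```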
8. The vector data management method according to claim 1, characterized in that S4 specifically comprises:
4.1, a user supplying query conditions such as the dataset name and the query range;
4.2, querying the geometric metadata of the spatial domain in the vector metadata table according to the given query conditions, to obtain the ID of the corresponding index metadata;
4.3, obtaining the IDs of the R-tree index table and its root node from the index metadata table;
4.4, querying the R-tree index table and retrieving the index entries of the vector elements that satisfy the query conditions;
4.5, retrieving the vector data from the vector data table and applying fine filtering to obtain the final query result.
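The query of 4.1 to 4.5 can be sketched as a chain of key lookups followed by a coarse MBR filter and a fine geometric filter. Assumptions: dict-backed tables, hypothetical metadata field names, a sequential recursive index search, and a fine filter that keeps only features lying entirely within the query box. Because the box is convex, checking that every vertex lies inside it is an exact test for that "within" predicate; an "intersects" predicate would need a real geometry test instead.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: MBR, b: MBR) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def positions(coords: Any) -> Iterator[Tuple[float, float]]:
    if isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from positions(part)


def within_box(geometry: dict, box: MBR) -> bool:
    """Exact 'within' test: every vertex lies inside the (convex) query box."""
    return all(box[0] <= x <= box[2] and box[1] <= y <= box[3]
               for x, y in positions(geometry["coordinates"]))


def search(index_table: Dict[str, dict], oid: str, box: MBR) -> List[str]:
    """Coarse filter: collect leaf pointers whose MBRs meet the box."""
    node = index_table[oid]
    hits: List[str] = []
    for entry in node["D"]:
        if intersects(entry["M"], box):
            if node["L"] == 0:
                hits.append(entry["P"])
            else:
                hits.extend(search(index_table, entry["P"], box))
    return hits


def query(db: dict, dataset: str, box: MBR) -> List[str]:
    index_meta_id = db["vector_meta"][dataset]["index_meta_id"]      # 4.2
    meta = db["index_meta"][index_meta_id]                            # 4.3
    candidates = search(db[meta["rtree_table"]], meta["root"], box)   # 4.4
    vector_table = db[meta["vector_table"]]
    return sorted(fid for fid in candidates                           # 4.5
                  if within_box(json.loads(vector_table[fid])["geometry"], box))
```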
9. The vector data management method according to claim 8, wherein in 4.4 the R-tree index table is queried as follows:
4.4.1, retrieving the key-value pair for the root node from the R-tree index table using the root node ID, and deserializing it;
4.4.2, using the geometric information in the GeoJSON, determining how the MBR of each child node relates to the query range; selecting the child nodes whose MBRs intersect or lie within the query range; and, following the pointers from the parent index entries to the child index nodes, retrieving and deserializing the corresponding child-node key-value pairs;
4.4.3, repeating step 4.4.2 until the leaf nodes of the R-tree are reached.
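Steps 4.4.1 to 4.4.3 describe a top-down traversal in which each node is fetched by OID and deserialized. Because nodes are independent key-value pairs, all nodes on one level can be fetched concurrently; the sketch below does this level by level with `concurrent.futures.ThreadPoolExecutor`. It assumes node values are JSON strings in the claim 7 layout with GeoJSON Polygon MBRs, level 0 is the leaf level, and a Python dict stands in for the distributed store, where each lookup would be a network round trip.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def ring_mbr(polygon: dict) -> MBR:
    ring = polygon["coordinates"][0]
    xs, ys = [p[0] for p in ring], [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: MBR, b: MBR) -> bool:
    # True when the rectangles overlap, touch, or one contains the other
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fetch_node(kv: Dict[str, str], oid: str) -> dict:
    # 4.4.1: get the key-value pair by OID and deserialize it
    return json.loads(kv[oid])


def parallel_search(kv: Dict[str, str], root_oid: str, box: MBR, workers: int = 8) -> List[str]:
    frontier: List[str] = [root_oid]
    hits: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:  # 4.4.3: one R-tree level per iteration
            nodes = list(pool.map(lambda oid: fetch_node(kv, oid), frontier))
            next_frontier: List[str] = []
            for node in nodes:
                for entry in node["D"]:
                    # 4.4.2: keep children whose MBR meets the query range
                    if intersects(ring_mbr(entry["M"]), box):
                        if node["L"] == 0:
                            hits.append(entry["P"])
                        else:
                            next_frontier.append(entry["P"])
            frontier = next_frontier
    return sorted(hits)
```

Fetching a whole level per round bounds the number of sequential round trips by the tree height rather than by the number of visited nodes.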
10. The vector data management method according to claim 1, characterized in that in S5 the process of deleting vector data comprises:
5.1, querying the vector metadata table for the geometric metadata of the spatial domain containing the data to be deleted, and obtaining the corresponding index metadata;
5.2, obtaining the R-tree index table and its root node from the index metadata table, and locating the index entry associated with the vector element to be deleted according to the geometric relationship between the minimum bounding rectangles of the R-tree nodes and the query box;
5.3, deleting the corresponding vector data from the vector data table;
5.4, deleting the index entry associated with the vector data from the R-tree index table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010387252.XA CN111639075B (en) | 2020-05-09 | 2020-05-09 | Non-relational database vector data management method based on flattened R tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111639075A (en) | 2020-09-08 |
CN111639075B CN111639075B (en) | 2023-05-12 |
Family ID: 72333187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010387252.XA Active CN111639075B (en) | 2020-05-09 | 2020-05-09 | Non-relational database vector data management method based on flattened R tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111639075B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138560A (en) * | 2015-07-23 | 2015-12-09 | 北京天耀宏图科技有限公司 | Multilevel spatial index technology based distributed space vector data management method |
CN105488043A (en) * | 2014-09-15 | 2016-04-13 | 南京理工大学 | Data query method and system based on Key-Value data blocks |
CN107423368A (en) * | 2017-06-29 | 2017-12-01 | 中国测绘科学研究院 | A kind of space-time data indexing means in non-relational database |
US20190102389A1 (en) * | 2017-10-04 | 2019-04-04 | Dell Products Lp | Storing and processing json documents in a sql database table |
Non-Patent Citations (1)
Title |
---|
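Yang Chengyue (杨成月) et al.: "Research on the Organization and Management of Global Spatio-Temporal Big Data Based on Non-Relational Databases" * |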
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989079A (en) * | 2021-04-22 | 2021-06-18 | 北京电信易通信息技术股份有限公司 | Novel image data retrieval method and system |
CN113536041A (en) * | 2021-06-08 | 2021-10-22 | 中国铁路设计集团有限公司 | Method for rapidly acquiring railway engineering geographic information metadata in batches |
CN113384898A (en) * | 2021-06-10 | 2021-09-14 | 网易(杭州)网络有限公司 | Data processing method, device, equipment and storage medium |
CN113384898B (en) * | 2021-06-10 | 2024-01-30 | 网易(杭州)网络有限公司 | Data processing method, device, equipment and storage medium |
CN113946584A (en) * | 2021-10-26 | 2022-01-18 | 中国矿业大学 | QRB tree indexing method for massive vector data retrieval |
CN116756139A (en) * | 2023-05-12 | 2023-09-15 | 中国自然资源航空物探遥感中心 | Data indexing method, system, storage medium and electronic equipment |
CN116756139B (en) * | 2023-05-12 | 2024-04-23 | 中国自然资源航空物探遥感中心 | Data indexing method, system, storage medium and electronic equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |