CN114357062B

CN114357062B - Vector data overlapping or gap checking method based on space position segmentation

Info

Publication number: CN114357062B
Application number: CN202110437020.5A
Authority: CN
Inventors: 汪云龙; 燕瑜
Original assignee: Star Dipper Chengdu Information Technology Co ltd
Current assignee: Star Dipper Chengdu Information Technology Co ltd
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2022-08-02
Anticipated expiration: 2041-04-22
Also published as: CN114357062A

Abstract

The invention belongs to the technical field of geographic information and discloses a method for checking overlaps or gaps in vector data based on spatial-position segmentation. Because the data are cut by a spatial grid and redistributed to the storage nodes by spatial position, the computing resources of every node are fully used in parallel during the overall check, data broadcasting and the barrel effect of distributed computing (the slowest node limiting the whole job) are avoided, the computing bottleneck is removed, and inspection efficiency is greatly improved.

Description

Vector data overlapping or gap checking method based on space position segmentation

Technical Field

The invention belongs to the technical field of geographic information and relates in particular to a method for checking overlaps or gaps in vector data based on spatial-position segmentation.

Background

Spatial vector data in the geographic information industry often require topology checks, some of which test for overlaps and gaps between patches (spatial vector polygon data), i.e. the rule that patches must neither overlap nor leave gaps. For example, in the Third National Land Survey and the National Geographic Conditions Census, patches such as the land-type patches of the current land-use survey and administrative-region patches must not overlap or leave gaps, so overlap and gap checks on these spatial vector data are required in production and management. The data volumes are large: in Sichuan Province, for example, a single county contains tens of thousands to hundreds of thousands of patches and a city easily reaches the millions, so the data for a whole province, let alone the whole country, are enormous, and spatial data are also more complex than conventional data. At present, the industry's leading products (such as ArcGIS) are acceptable for overlap and gap checks on small datasets, but on very large datasets the check is time-consuming, and an ordinary machine may be unable to produce a result at all.

In the future, applications such as emergency response (for example, real-time checking of disaster-relief coverage) and traffic scheduling based on geographic information will inevitably demand more timely checks, so the inspection rate must increase. Existing GIS software checks inefficiently, especially when the source data are large (for example, ArcGIS needs more than 20 minutes for the overlap check alone on 720,000 patches), and struggles to meet efficiency requirements. Moreover, simply moving the data into a distributed computing and storage environment does not help: because every patch must be tested against other patches in an overlap or gap check, data broadcasting occurs, and the computation ends up slower than in a centralized single-machine environment.

In view of the above drawbacks of the prior art, a method for checking overlaps or gaps in vector data based on spatial-position segmentation is provided.

Disclosure of Invention

The invention aims to provide a method for checking overlaps or gaps in vector data based on spatial-position segmentation that solves at least one of the technical problems of the prior art described in the Background: existing GIS software checks large datasets inefficiently (for example, ArcGIS needs more than 20 minutes for the overlap check alone on 720,000 patches), and a plain distributed computing and storage environment suffers from data broadcasting, because every patch must be tested against other patches, which makes it slower than a centralized single machine.

To achieve this aim, the technical solution of the invention is as follows:

A method for checking overlaps or gaps in vector data based on spatial-position segmentation comprises the following steps:

Step S1: importing the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, as it does conventional data; extracting, from historical spatial vector data samples, the target data features of the corresponding spatial vector data and using them as the reference standard; extracting the actual data features from the spatial vector data actually being imported; and having a primary judging unit check whether the actual features match the target features. If they match, the actual spatial vector data are imported. If not, a secondary judging unit repeats the matching check: on a match the data are imported, otherwise they are discarded. The primary and secondary judging units have the same function and are independent of each other.

Step S2: generating spatial grids for cutting the spatial vector data, each spatial grid having a unique spatial grid id;

Step S3: cutting the spatial vector data with the spatial grids and recording, for each cut piece, its original data id and its spatial grid id; then redistributing the cut data to the nodes of the distributed database by spatial grid id, i.e. storing all spatial vector data in the same spatial grid on the same node;
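The cut-and-redistribute step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: features and grid cells are simplified to axis-aligned boxes (real patches are polygons and would need a geometry library), and the node for a cell is chosen by the hypothetical rule `grid_id % n_nodes` in place of the distributed database's own distribution on the grid id. The names `clip` and `cut_and_redistribute` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Illustrative assumption: features and grid cells are axis-aligned boxes
# (xmin, ymin, xmax, ymax); real patches are arbitrary polygons.
Box = Tuple[float, float, float, float]
Piece = Tuple[int, int, Box]  # (orig_id, grid_id, clipped geometry)


def clip(a: Box, b: Box) -> Optional[Box]:
    """Intersection of two boxes, or None if it has no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def cut_and_redistribute(
    features: Dict[int, Box], grid: Dict[int, Box], n_nodes: int
) -> Dict[int, List[Piece]]:
    """Cut every feature by every grid cell, tag each piece with its original
    id and grid id, and group pieces by target node so that all pieces of one
    grid cell end up on the same node."""
    nodes: Dict[int, List[Piece]] = defaultdict(list)
    for orig_id, fbox in features.items():
        for grid_id, gbox in grid.items():
            piece = clip(fbox, gbox)
            if piece is not None:
                # Hypothetical placement rule standing in for the database's
                # distribution on grid_id.
                nodes[grid_id % n_nodes].append((orig_id, grid_id, piece))
    return dict(nodes)
```

Cutting a box for patch 7 that straddles the centre of a 2 × 2 grid yields four pieces that all keep original id 7 but carry four different grid ids, mirroring FIG. 2. For brevity every feature is tested against every cell; a practical version would only visit the cells its bounding box touches.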

Step S4: checking the spatial vector data in each spatial grid for overlaps or gaps, wherein two pieces with the same original data id are not checked against each other, because they were produced by the spatial grid cutting one original feature.

Further, step S2 includes the following sub-steps:

Step S21: setting the number of spatial grids p to roughly half the total number m of original features, i.e. p ≈ m/2;

Step S22: generating p spatial grids from the extent of the spatial vector data so that they cover all of the data, and numbering each spatial grid so that each grid has a unique grid id.
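A minimal sketch of sub-steps S21 and S22 for the uniform-grid case, assuming the extent is given as (xmin, ymin, xmax, ymax) with positive width and height. The text fixes only p ≈ m/2; the nx × ny layout with roughly square cells and the function name `uniform_grid` are assumptions for illustration.

```python
import math


def uniform_grid(extent, m):
    """Generate about p = m/2 rectangular cells covering `extent`
    (xmin, ymin, xmax, ymax) and return {grid_id: cell_box}."""
    p = max(1, m // 2)                        # S21: p is roughly m/2
    xmin, ymin, xmax, ymax = extent
    w, h = xmax - xmin, ymax - ymin
    # Assumed layout: nx * ny >= p with cells close to square.
    nx = max(1, round(math.sqrt(p * w / h)))
    ny = math.ceil(p / nx)
    dx, dy = w / nx, h / ny
    grid = {}
    for j in range(ny):
        for i in range(nx):
            grid_id = j * nx + i               # S22: unique grid id
            grid[grid_id] = (xmin + i * dx, ymin + j * dy,
                             xmin + (i + 1) * dx, ymin + (j + 1) * dy)
    return grid
```

For example, `uniform_grid((0, 0, 100, 100), 720)` targets p = 360 and produces a 19 × 19 layout of 361 cells, slightly more than p because the row count is rounded up so the cells cover the whole extent.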

Further, step S2 is specifically as follows:

when the spatial vector data are evenly distributed in space, a uniform rectangular grid is generated over the extent of the data;

when the spatial vector data are unevenly distributed in space, the spatial grids are generated with an R+ tree data structure; because the R+ tree is a balanced tree, the number of vector features in each rectangular grid is kept roughly even.

Further, step S4 includes:

each piece of spatial vector data in a spatial grid needs to be compared only once with each of the other spatial vector data in the same grid, and all spatial vector data in the same grid reside on the same database child node.

Compared with the prior art, the invention has the following beneficial effects:

One innovation of this scheme is that it greatly improves the efficiency of checking massive vector data for overlaps or gaps. The original data are cut by the spatial grid (physical cutting produces new data while preserving the logical relationship to the originals, which are left intact), and the new data are redistributed to the storage nodes by spatial position. This avoids time-consuming data broadcasting during computation and greatly reduces the number of other features each feature must be tested against: each feature is tested only against the other features in the same grid, not against every other feature. Consequently, during the overall check the computing resources of every node are fully used in parallel, data broadcasting and the barrel effect of distributed computing are avoided, the computing bottleneck is removed, and inspection efficiency is greatly improved.

Drawings

FIG. 1 is a flow chart illustrating exemplary steps of an embodiment of the present invention.

FIG. 2 is a schematic diagram of grid-cut space vector data according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to FIGS. 1 and 2. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the invention.

Embodiment:

As shown in FIG. 1, a method for checking overlaps or gaps in vector data based on spatial-position segmentation is proposed, comprising the following steps:

Step S1: importing the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, as it does conventional data. (Hash distribution: a unique distribution key, typically the primary key, is designated, and a hashing algorithm applied to the distribution key assigns each row of data to a specific database child node. Keys with the same value always hash to the same child node, and the data are spread fairly evenly across the child nodes.)
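The hash distribution described above can be illustrated with a short Python sketch. It uses CRC32 from the standard `zlib` module as the hash function purely for illustration; an actual distributed database uses its own internal hash. CRC32 is chosen over Python's built-in `hash()` because string hashing in Python is randomized per process and would not give a stable placement. `assign_node` is a hypothetical name.

```python
import zlib


def assign_node(distribution_key, n_nodes):
    """Map a row's distribution key (e.g. its primary key) to a child node.
    Any deterministic hash gives the property the text relies on: equal
    keys always land on the same node."""
    digest = zlib.crc32(str(distribution_key).encode("utf-8"))
    return digest % n_nodes
```

Distributing keys 0 to 9999 over four nodes this way places rows on every node, and repeated calls with the same key always return the same node.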

Importing the spatial vector data correctly is critical: if the data imported at the start are erroneous or incomplete, the final result will be wrong even if every subsequent processing step executes correctly. Checking the import of spatial vector data is therefore essential.

Target data features of the corresponding spatial vector data are extracted from historical spatial vector data samples and used as the reference standard, and actual data features are extracted from the spatial vector data actually being imported. A primary judging unit checks whether the actual features match the target features; if they match, the actual spatial vector data are imported. If they do not match, a secondary judging unit repeats the matching check: on a match the data are imported, otherwise they are discarded. The primary and secondary judging units have the same function and are independent of each other.

With this scheme, the spatial vector data are identified and judged, and the actual data are imported only if they are the current target spatial vector data; otherwise they are discarded, which prevents erroneous downstream operations. Using two mutually independent judging units, primary and secondary, for a double check also prevents an occasional error in the primary unit from causing a misjudgment that would discard important target spatial vector data.
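The primary/secondary double check can be sketched as below. The text does not say which data features are compared, so the feature dictionaries (geometry type, coordinate reference system) and the exact-match rule in `features_match` are hypothetical placeholders; only the control flow (primary judgment, fallback to an independent secondary judgment, import on either match, discard otherwise) follows the text.

```python
from typing import Any, Callable, Dict

Features = Dict[str, Any]
Judge = Callable[[Features, Features], bool]


def features_match(actual: Features, target: Features) -> bool:
    """Hypothetical judging unit: every reference feature must be present
    in the actual data with the same value."""
    return all(actual.get(k) == v for k, v in target.items())


def should_import(actual: Features, target: Features,
                  primary: Judge, secondary: Judge) -> bool:
    """Primary unit judges first; only on a mismatch does the independent
    secondary unit judge again. Import if either unit reports a match,
    otherwise discard."""
    if primary(actual, target):
        return True
    return secondary(actual, target)
```

Passing a deliberately failing primary unit shows the intended safeguard: a correct dataset is still imported because the secondary unit's independent judgment succeeds, while a dataset that fails both judgments is discarded.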

Step S2: generating a plurality of spatial grids for cutting the spatial vector data, each spatial grid having a unique spatial grid id;

Step S3: cutting the spatial vector data with the spatial grids and recording, after cutting, the id of the original feature each piece belongs to and the id of the spatial grid it falls in; then redistributing the data to the nodes of the distributed database by spatial grid id, i.e. storing spatial vector data in the same spatial grid on the same node. As shown in FIG. 2, patch 7 is cut into 4 pieces by the grid; the 4 new features are assigned different grid ids but the same original feature id, 7.

In addition, the uniformity of the spatial distribution of the original spatial vector data is evaluated. A quadtree spatial index of depth k is generated over the spatial extent of the original data (k is usually chosen empirically between 4 and 7). Let T be the number of leaf nodes of the quadtree and $N_i$ the number of vector features under the i-th leaf node. The average number of features per leaf is $P = \frac{1}{T}\sum_{i=1}^{T} N_i$, the distribution deviation value of each leaf is $X_i = N_i / P$, and the evaluation value of the uniformity of the spatial distribution of the whole dataset is $E = \frac{1}{T}\sum_{i=1}^{T} X_i$. The value E provides a reference for choosing how the spatial grid is generated in the later steps.
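The uniformity evaluation can be sketched as follows, assuming each vector feature is represented by a single point (for example its centroid) and that the depth-k quadtree is complete, so it has T = 4^k leaves arranged as a 2^k × 2^k grid over the extent. The centroid representation and the function name `uniformity` are assumptions. The formulas are implemented literally as stated; because P is the mean of the $N_i$, the $X_i$ always sum to T, so E as written evaluates to 1 for any non-empty dataset, and the unevenness shows up in how far the individual $X_i$ depart from 1.

```python
def uniformity(points, extent, k=5):
    """Count points in each leaf of a complete depth-k quadtree over
    `extent` (xmin, ymin, xmax, ymax), then compute P, X_i = N_i / P and
    E = sum(X_i) / T as described. Assumes at least one point."""
    xmin, ymin, xmax, ymax = extent
    side = 2 ** k                 # a complete depth-k quadtree: side * side leaves
    T = side * side
    counts = [0] * T              # N_i for each leaf
    for x, y in points:
        # Clamp so points on the max edge fall into the last leaf.
        i = min(int((x - xmin) / (xmax - xmin) * side), side - 1)
        j = min(int((y - ymin) / (ymax - ymin) * side), side - 1)
        counts[j * side + i] += 1
    P = sum(counts) / T           # average number of features per leaf
    X = [n / P for n in counts]   # per-leaf deviation values
    E = sum(X) / T
    return counts, X, E
```

With k = 1 over (0, 0, 2, 2), one point per quadrant gives all $X_i$ = 1, while four points in one quadrant give $X$ = [4, 0, 0, 0]; E is 1 in both cases.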

Step S4: checking the spatial vector data in each spatial grid for overlaps or gaps, wherein two pieces with the same original feature id are not checked against each other, because they were produced by the spatial grid cutting one original feature. Checking for overlaps or gaps among the features within each spatial grid is sufficient to determine whether any overlap or gap exists among all the patches.
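The overlap half of step S4 can be sketched as below, again simplifying pieces to axis-aligned boxes. This is an assumption: real patches need polygon intersection tests, and gap detection, which needs polygon union or difference operations, is not shown. Pieces sharing an original feature id are skipped, and because one overlapping pair can appear in several cells, the per-cell results are merged into a set. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]
CellPiece = Tuple[int, Box]  # (orig_id, clipped box)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return (max(a[0], b[0]) < min(a[2], b[2])
            and max(a[1], b[1]) < min(a[3], b[3]))


def overlaps_in_cell(pieces: List[CellPiece]) -> Set[Tuple[int, int]]:
    """Test each pair of pieces in one grid cell once; return overlapping
    pairs of original ids."""
    found = set()
    for (id_a, a), (id_b, b) in combinations(pieces, 2):
        if id_a == id_b:
            continue  # both pieces were cut from the same original feature
        if boxes_overlap(a, b):
            found.add((min(id_a, id_b), max(id_a, id_b)))
    return found


def check_all(cells: Dict[int, List[CellPiece]]) -> Set[Tuple[int, int]]:
    """Cells are independent, so each could run on the node that stores it;
    here they are checked in a loop and the results merged."""
    result: Set[Tuple[int, int]] = set()
    for pieces in cells.values():
        result |= overlaps_in_cell(pieces)
    return result
```

Pieces that only touch along an edge or at a corner are not reported, since their shared region has zero area.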

Analysis of the main steps affecting the time complexity of the algorithm:

From the above, the number of grids should be neither too large nor too small. In spatial data, however, a rectangular grid cell is a simple polygon (5 points, with identical start and end points), whereas the original data are real spatial vector data, such as a land-type patch (e.g. a lake surface), which are often described by dozens, hundreds or even thousands of points. Tests between original features are therefore the main computational bottleneck, so the number of features in each grid should be kept as small as possible. Ideally each grid contains only two features, so the number of spatial grids p is set to roughly half the total number m of original features, i.e. p ≈ m/2; p spatial grids are then generated from the extent of the spatial vector data so that they cover it, and each spatial grid is numbered with a unique grid id.
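A back-of-the-envelope count supports the choice p ≈ m/2. Assuming the m features are spread evenly over p grids and ignoring features that are cut into pieces in several grids, the numbers of pairwise feature tests for a full traversal and for the grid method are:

$$C_{\text{full}} = \binom{m}{2} = \frac{m(m-1)}{2}, \qquad C_{\text{grid}} \approx p\binom{m/p}{2} = \frac{m}{2}\left(\frac{m}{p} - 1\right)$$

$$p = \frac{m}{2} \;\Rightarrow\; C_{\text{grid}} \approx \frac{m}{2}$$

For m = 720,000 this is about 2.6 × 10^11 tests for the full traversal versus about 360,000 with the grid, under the stated assumptions.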

There are other ways to generate the spatial grid. Generating a uniform rectangular grid over the spatial extent of the original data is the simplest, but when the original data are very unevenly distributed in space it produces many empty grids, or grids whose feature counts differ greatly. In that case the grid can be generated with an R+ tree data structure, which guarantees that there are no empty grids and balances the number of features per grid. This takes longer than generating a uniform rectangular grid, but the subsequent computation uses the node resources more fully and avoids the barrel effect as far as possible.
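The text names the R+ tree for uneven data. A full R+ tree is beyond a short example, so the sketch below swaps in a simpler technique: a recursive median split (k-d tree style) over feature centroids. It is not an R+ tree, but it shares the two properties the text relies on: the cells do not overlap, and each cell holds a balanced number of features (at most `cap`, never zero). The use of centroids and the name `balanced_grid` are assumptions; after cutting, a feature that spans several cells is still split into several pieces.

```python
def balanced_grid(points, box, cap=2):
    """Recursively split `box` at the median of the points it contains,
    along its longer side, until each cell holds at most `cap` points.
    Returns {grid_id: (cell_box, points_in_cell)}. A k-d-style stand-in,
    not an R+ tree implementation."""
    if cap < 1:
        raise ValueError("cap must be at least 1")
    cells = []

    def split(pts, b):
        if len(pts) <= cap:
            cells.append((b, pts))
            return
        xmin, ymin, xmax, ymax = b
        axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2            # both halves are non-empty
        cut = (pts[mid - 1][axis] + pts[mid][axis]) / 2
        if axis == 0:
            lo, hi = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
        else:
            lo, hi = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
        split(pts[:mid], lo)
        split(pts[mid:], hi)

    split(list(points), box)
    return {grid_id: cell for grid_id, cell in enumerate(cells)}
```

With eight points in two tight clusters at opposite corners of a 10 × 10 extent and `cap=2`, it returns four cells of two points each, whereas a uniform 2 × 2 grid over the same extent would leave two cells empty and put four points in each of the other two.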

In the traditional method, each patch must be compared once with every other patch in the spatial database, i.e. a full traversal in which each patch is tested against all other patch data. Under a centralized database this traversal cannot be parallelized and is very time-consuming; under a distributed database it causes data broadcasting and takes even longer. With the method of the invention, each patch is compared only with the other patches in the same spatial grid, which greatly reduces the number of comparisons per patch; and because the patches in one grid reside on the same database child node, no data broadcasting occurs and the computing power of every child node of the distributed database is fully used. Since the overlap or gap test is a time-consuming spatial computation, reducing the total number of spatial computations while fully exploiting distributed parallelism greatly improves the efficiency of the computation.

Further, step S4 includes: each piece of spatial vector data in a spatial grid needs to be compared only once with each of the other spatial vector data in the same grid, and all spatial vector data in the same grid reside on the same database child node.

A concrete case follows:

Overlap-check results for nationwide building patch data (more than 720,000 polygon features):

Phase	Time
Spatial grid generation	9.3 s
Grid cutting + redistribution of new data	23.5 s
Overlap check	82 s
Total	114.8 s

For comparison, checking the same data for overlapping patches with ArcGIS took 23 minutes for the check alone, not counting the time needed to build the topology.
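As a rough check on these figures (the ArcGIS time excludes topology building, while the 114.8 s total includes grid generation and redistribution, 9.3 + 23.5 + 82 = 114.8 s), the end-to-end speed-up works out to:

$$\frac{23 \times 60\ \text{s}}{114.8\ \text{s}} = \frac{1380}{114.8} \approx 12.0$$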

The above are preferred embodiments of the invention. Any change made according to the technical solution of the invention whose functional effect does not go beyond the scope of that technical solution falls within the protection scope of the invention.

Claims

1. A method for checking overlaps or gaps in vector data based on spatial-position segmentation, characterized by comprising the following steps:

step S1: importing the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, as it does conventional data;

step S3: cutting the spatial vector data with the spatial grid and, after cutting, recording the original element id and the spatial grid id to which each piece belongs; redistributing the cut data to the nodes of the distributed database by the spatial grid id to which they belong, i.e. storing the vector data in the same spatial grid on the same node;

step S4: checking whether the spatial vector data in each spatial grid have overlaps or gaps, wherein two vector data with the same original element id are not checked, because they were formed by the spatial grid cutting one original datum.

2. The method for checking vector data overlap or gap based on spatial position segmentation as claimed in claim 1, wherein step S2 comprises the following sub-steps:

step S21: the number p of spatial grids is roughly half the total number m of original spatial vector data, i.e. p ≈ m/2;

step S22: generating p spatial grids from the spatial extent of the spatial vector data so that they cover all of the original spatial vector data, and numbering each spatial grid so that each grid has a unique grid id.

3. The method for checking vector data overlap or gap based on spatial position segmentation as claimed in claim 1, wherein step S2 is as follows:

4. The method for checking vector data overlap or gap based on spatial position segmentation as claimed in claim 1, wherein step S4 further comprises: