CN114357062B - Vector data overlapping or gap checking method based on space position segmentation - Google Patents


Info

Publication number
CN114357062B
Authority
CN
China
Prior art keywords
vector data
space
grid
spatial
data
Prior art date
Legal status
Active
Application number
CN202110437020.5A
Other languages
Chinese (zh)
Other versions
CN114357062A (en)
Inventor
汪云龙
燕瑜
Current Assignee
Star Dipper Chengdu Information Technology Co ltd
Original Assignee
Star Dipper Chengdu Information Technology Co ltd
Priority date
2021-04-22
Filing date
2021-04-22
Publication date
2022-08-02
Application filed by Star Dipper Chengdu Information Technology Co ltd
Priority to CN202110437020.5A
Publication of CN114357062A
Application granted
Publication of CN114357062B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of geographic information and discloses a vector data overlap or gap checking method based on spatial position segmentation. The spatial vector data are cut by a spatial grid and redistributed so that all pieces in the same grid cell are stored on the same computing node, and each piece is checked only against the other pieces in its own cell. As a result, during the overall check the computing resources of every node are used fully and simultaneously, data broadcasting is avoided, the "bucket effect" of distributed computing (the whole job waiting on the slowest node) is avoided, the computing bottleneck is removed, and checking efficiency is greatly improved.

Description

Vector data overlapping or gap checking method based on space position segmentation
Technical Field
The invention belongs to the technical field of geographic information, and particularly relates to a vector data overlapping or gap checking method based on spatial position segmentation.
Background
Spatial vector data in the geographic information industry often need topology checks, some of which check polygon patches (spatial vector polygon data) for overlaps and gaps, i.e. the patches must neither overlap nor leave gaps between one another. For example, in the Third National Land Survey and the National Geographic Conditions Census, patches such as land-use types and administrative regions must not overlap or leave gaps, so overlap and gap checks on these spatial vector data are required in production and management. The data volume, however, is often huge: in Sichuan Province, for instance, a single county has tens of thousands to hundreds of thousands of patches and a city easily reaches the millions, so a whole province, let alone the whole country, is enormous; spatial data are also more complex than conventional data, which makes them larger still. Leading industry products (such as ArcGIS) are currently used for overlap and gap checking. Their performance is acceptable for small data volumes, but with huge volumes the check is time-consuming, and an ordinary machine may be unable to compute the result at all.
In future, applications such as emergency response (e.g., real-time checking of disaster-relief coverage) and traffic scheduling based on geographic information will inevitably place higher demands on the timeliness of these checks, so checking speed must increase. Existing GIS software checks slowly, especially when the original data volume is large (for example, ArcGIS needs more than 20 minutes for the overlap check alone on data of 720,000 patches), and struggles to meet the efficiency requirement. Moreover, simply moving the work into a distributed computing and storage environment does not help: because each patch must be judged or computed against the other patches in an overlap or gap check, data broadcasting occurs across the nodes, and the computation ends up slower than in a centralized single-machine environment.
In view of the above-mentioned drawbacks of the prior art, a method for inspecting vector data overlap or gap based on spatial position segmentation is provided.
Disclosure of Invention
The invention aims to provide a vector data overlap or gap checking method based on spatial position segmentation, to solve at least one of the technical problems of the prior art: existing GIS software checks slowly, especially for large original data volumes (for example, ArcGIS needs more than 20 minutes for the overlap check alone on data of 720,000 patches), and struggles to meet the efficiency requirement; moreover, when the check is simply run in a distributed computing and storage environment, each patch must be judged or computed against the other patches, which causes data broadcasting across the nodes and makes the computation slower than in a centralized single-machine environment.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A vector data overlap or gap checking method based on spatial position segmentation comprises the following steps:
Step S1: import the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, in the same way as conventional (non-spatial) data. Target data features for the corresponding spatial vector data are extracted from historical spatial vector data samples and used as the reference standard, and actual data features are extracted from the spatial vector data actually being imported. A main judging unit matches the actual features against the target features; if they match, the data are imported. If they do not match, a secondary judging unit repeats the matching: the data are imported if it finds a match and discarded otherwise. The main and secondary judging units perform the same function and operate independently of each other.
Step S2: generate spatial grids for cutting the spatial vector data, each with a unique spatial grid id;
Step S3: cut the spatial vector data with the spatial grids and, for each cut piece, record the id of the original feature it came from and the id of the spatial grid it lies in; then redistribute the cut data to the nodes of the distributed database by spatial grid id, so that all vector data in the same spatial grid are stored on the same node;
Step S4: check the spatial vector data within each spatial grid for overlaps or gaps. Two pieces with the same original data id are not checked against each other, because they were produced by the spatial grid cutting a single original feature.
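The cutting and redistribution in Step S3 can be illustrated with a minimal Python sketch. It assumes each feature is a simple polygon given as a vertex list, each grid cell is an axis-aligned rectangle keyed by an integer grid id, and a node is chosen as `grid_id % n_nodes`; these choices and the names `cut_into_grid` and `redistribute` are hypothetical, not taken from the source. Clipping uses the Sutherland-Hodgman algorithm, which is exact for convex clip windows such as rectangular cells (a concave feature may come back as one ring with zero-width connecting edges rather than separate parts).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    orig_id: int   # id of the original feature the piece was cut from
    grid_id: int   # id of the spatial grid cell that contains the piece
    ring: tuple    # clipped polygon vertices ((x, y), ...), not closed


def _clip_edge(points, inside, intersect):
    """One Sutherland-Hodgman pass against a single boundary line."""
    out = []
    prev = points[-1] if points else None
    for cur in points:
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
        prev = cur
    return out


def clip_to_cell(ring, cell):
    """Clip a polygon ring to an axis-aligned cell (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = cell

    def at_x(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(ring)
    pts = _clip_edge(pts, lambda p: p[0] >= xmin, at_x(xmin))
    pts = _clip_edge(pts, lambda p: p[0] <= xmax, at_x(xmax))
    pts = _clip_edge(pts, lambda p: p[1] >= ymin, at_y(ymin))
    pts = _clip_edge(pts, lambda p: p[1] <= ymax, at_y(ymax))
    return pts


def area(ring):
    """Shoelace formula for the area of a simple polygon."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def cut_into_grid(features, cells):
    """Step S3 (cut): features {orig_id: ring}, cells {grid_id: box}."""
    pieces = []
    for orig_id, ring in features.items():
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        for grid_id, (x0, y0, x1, y1) in cells.items():
            # Cheap bounding-box rejection before the real clip.
            if max(xs) <= x0 or min(xs) >= x1 or max(ys) <= y0 or min(ys) >= y1:
                continue
            clipped = clip_to_cell(ring, (x0, y0, x1, y1))
            if len(clipped) >= 3 and area(clipped) > 0:
                pieces.append(Piece(orig_id, grid_id, tuple(clipped)))
    return pieces


def redistribute(pieces, n_nodes):
    """Step S3 (redistribute): every piece of one grid cell goes to one node."""
    nodes = defaultdict(list)
    for piece in pieces:
        nodes[piece.grid_id % n_nodes].append(piece)
    return dict(nodes)
```

For example, a square patch with id 7 centred on the shared corner of four unit cells comes back as four pieces, each with `orig_id == 7` and a different `grid_id`, which mirrors the case shown in Fig. 2. Keying the node on the grid id rather than on the feature's primary key is what keeps every piece of a cell together.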
Further, step S2 includes the following sub-steps:
Step S21: set the number of spatial grids p to about half the total number m of original features, i.e. p ≈ m/2;
Step S22: generate p spatial grids from the extent of the spatial vector data and the grid count p so that they cover all of the data, and number each spatial grid so that each grid has a unique grid id.
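Sub-steps S21 and S22 can be sketched for the uniform-grid case as follows. This is a minimal illustration, assuming the data extent is a rectangle with positive width and height; the function name `uniform_grid` and the choice of roughly square cells are assumptions, not details from the source. Because rows and columns must be whole numbers, the sketch may produce slightly more than p cells.

```python
import math


def uniform_grid(extent, m):
    """Steps S21-S22: about p = m/2 rectangular cells covering `extent`.

    extent = (xmin, ymin, xmax, ymax) with positive width and height;
    m is the number of original features. Returns {grid_id: cell_box}.
    """
    xmin, ymin, xmax, ymax = extent
    width, height = xmax - xmin, ymax - ymin
    p = max(1, m // 2)  # S21: p is about m/2 (at least one cell)
    # Choose column/row counts so cells are roughly square and nx * ny >= p.
    nx = max(1, round(math.sqrt(p * width / height)))
    ny = math.ceil(p / nx)
    # Use the exact extent for the last edge so the grid covers all the data.
    xs = [xmin + i * width / nx for i in range(nx)] + [xmax]
    ys = [ymin + j * height / ny for j in range(ny)] + [ymax]
    cells = {}
    for j in range(ny):
        for i in range(nx):
            cells[j * nx + i] = (xs[i], ys[j], xs[i + 1], ys[j + 1])  # S22: unique id
    return cells
```

With m = 100 features over a 10 x 10 extent, p = 50 and the sketch returns a 7 x 8 grid of 56 cells with ids 0 to 55.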
Further, step S2 is specifically as follows:
when the spatial vector data are uniformly distributed in space, a uniform rectangular grid is generated from the extent of the spatial vector data;
when the spatial vector data are unevenly distributed in space, the spatial grid is generated using an R+ tree data structure; because the R+ tree is a balanced tree, the number of vector features in each rectangular grid cell is kept even.
Further, step S4 includes:
the spatial vector data in a spatial grid need to be compared only once with the other spatial vector data in the same spatial grid, and all spatial vector data in the same spatial grid reside on the same database child node.
Compared with the prior art, the invention has the following beneficial effects:
One innovation of this scheme is that it greatly improves the efficiency of overlap and gap checking for massive vector data. The original data are cut with a spatial grid (the cutting physically generates new data while preserving their logical link to the original features, so the original data are not damaged), and the new data are redistributed to the storage nodes according to their spatial position. This avoids time-consuming data broadcasting during computation and greatly reduces the number of other features each feature must be checked against: each feature is compared only with the other features in the same grid cell, not with every other feature. As a result, during the overall check the computing resources of all nodes are used fully and simultaneously, data broadcasting is avoided, the "bucket effect" of distributed computing (the whole job waiting on the slowest node) is avoided, the computing bottleneck is removed, and checking efficiency is greatly improved.
Drawings
FIG. 1 is a flow chart illustrating exemplary steps of an embodiment of the present invention.
FIG. 2 is a schematic diagram of grid-cut space vector data according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to fig. 1-2 of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment:
As shown in Fig. 1, a vector data overlap or gap checking method based on spatial position segmentation is proposed, comprising the following steps:
Step S1: import the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, as it does for conventional data. (Hash distribution: a unique distribution key, typically the primary key, is designated; a hashing algorithm uses the key to assign each row of data to a specific database child node, and rows with the same key value always hash to the same child node, so the data are spread fairly evenly across the child nodes.)
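The hash distribution described above can be sketched in a few lines. The source does not name the hash function; SHA-256 and the key field name `fid` are assumptions made for illustration. Note that placement depends only on the key, not on location, so spatially adjacent patches usually land on different nodes.

```python
import hashlib


def node_for_key(key, n_nodes):
    """Map a distribution key to a node with a stable hash.

    Python's built-in hash() of str is salted per process, so a digest is
    used instead; equal keys always land on the same node.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def hash_distribute(rows, key_field, n_nodes):
    """Assign each row (a dict) to a node by hashing its distribution key."""
    nodes = {i: [] for i in range(n_nodes)}
    for row in rows:
        nodes[node_for_key(row[key_field], n_nodes)].append(row)
    return nodes
```

Distributing 10,000 rows keyed by `fid` over 4 nodes gives each node roughly 2,500 rows, and a given key always maps to the same node.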
The import of spatial vector data is critical: if the data imported at the start are erroneous or incomplete, the final result will be wrong even if every subsequent processing step executes correctly. Checking the imported spatial vector data is therefore essential.
Target data features for the corresponding spatial vector data are extracted from historical spatial vector data samples and used as the reference standard, and actual data features are extracted from the spatial vector data actually being imported. A main judging unit matches the actual features against the target features; if they match, the data are imported. If they do not match, a secondary judging unit repeats the matching: the data are imported if it finds a match and discarded otherwise. The main and secondary judging units perform the same function and operate independently of each other.
With this scheme, the spatial vector data are identified and judged, and are imported only when the actual data are the current target spatial vector data; otherwise they are discarded, which prevents erroneous downstream operations. In addition, the double judgment by two mutually independent units prevents an accidental error in the main judging unit from causing a misjudgment and thereby the loss of important target spatial vector data.
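The two-stage admission logic can be sketched as below. The source does not say what the "data features" are; this sketch assumes they are the geometry type, the coordinate reference system and the set of required attribute fields, and the class and function names, the field names and the CRS value are hypothetical. The two units share one matching function but are passed in separately, so a faulty main unit can be replaced or fail without affecting the secondary one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataFeatures:
    geometry_type: str   # e.g. "Polygon"
    crs: str             # coordinate reference system code
    fields: frozenset    # attribute field names


Judge = Callable[[DataFeatures, DataFeatures], bool]


def schema_match(actual: DataFeatures, target: DataFeatures) -> bool:
    """A judging unit: geometry type and CRS equal, required fields present."""
    return (actual.geometry_type == target.geometry_type
            and actual.crs == target.crs
            and target.fields <= actual.fields)


def admit(actual: DataFeatures, target: DataFeatures,
          primary: Judge, secondary: Judge) -> bool:
    """Import if the main unit matches; otherwise defer to the secondary unit."""
    if primary(actual, target):
        return True
    return secondary(actual, target)  # False means the data are discarded
```

In use, a polygon layer with the expected CRS and fields is admitted, a line layer is discarded, and a main unit that wrongly rejects everything is caught by the independent secondary unit.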
Step S2: generate a plurality of spatial grids for cutting the spatial vector data, each with a unique spatial grid id;
Step S3: cut the spatial vector data with the spatial grids and, after cutting, record for each piece the id of the original feature it belongs to and the id of the spatial grid it falls in; then redistribute the data to the nodes of the distributed database according to spatial grid id, i.e. store all spatial vector data in the same spatial grid on the same node. As shown in Fig. 2, patch 7 is cut by the grid into 4 pieces; the 4 new features are assigned different grid ids but the same original feature id, 7. The uniformity of the spatial distribution of the original spatial vector data is evaluated as follows: build a quadtree spatial index of depth k over the spatial extent of the original data (k is usually chosen empirically between 4 and 7); let T be the number of leaf nodes of the quadtree and N_i the number of vector features under the i-th leaf node; compute the average number of features per leaf, P = (N_1 + ... + N_T)/T; compute the distribution deviation of each leaf, X_i = N_i/P; and finally obtain the evaluation value of the uniformity of the whole data set, E = (X_1 + ... + X_T)/T. The value E serves as a reference for choosing which of the grid-generation methods described below to use.
Step S4: check whether the spatial vector data in each spatial grid overlap or leave gaps. Two pieces with the same original feature id are not checked against each other, because both were produced by the spatial grid cutting one original feature. Checking only the features within each spatial grid is sufficient to determine whether any overlaps or gaps exist among all the patches.
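The uniformity evaluation can be sketched as follows, assuming each feature is represented by a single point (for example its centroid) and that the depth-k quadtree is full, so it has T = 4^k equal leaves. Read literally, E is always 1 whenever P > 0, because X_1 + ... + X_T = (N_1 + ... + N_T)/P = T. The sketch therefore also returns D, the mean of |X_i - 1|, a hypothetical dispersion measure that is not in the source: it is 0 for perfectly even data and grows with clustering. The function names are hypothetical.

```python
def leaf_counts(points, extent, k):
    """N_i for each leaf of a full quadtree of depth k over `extent`.

    A full quadtree of depth k has T = 4**k leaves, i.e. a 2**k x 2**k grid.
    Each feature is represented by one point (e.g. its centroid).
    """
    xmin, ymin, xmax, ymax = extent
    n = 2 ** k
    counts = [0] * (n * n)
    for x, y in points:
        i = min(int((x - xmin) / (xmax - xmin) * n), n - 1)
        j = min(int((y - ymin) / (ymax - ymin) * n), n - 1)
        counts[j * n + i] += 1
    return counts


def uniformity(counts):
    """Return (E, D) for leaf counts N_1..N_T.

    E follows the text: P = sum(N_i)/T, X_i = N_i/P, E = sum(X_i)/T.
    D = mean |X_i - 1| is a hypothetical extra measure, not from the source.
    """
    T = len(counts)
    P = sum(counts) / T
    X = [n_i / P for n_i in counts]
    E = sum(X) / T
    D = sum(abs(x - 1) for x in X) / T
    return E, D
```

With k = 2 (smaller than the text's 4 to 7, for brevity), one point per leaf gives E = 1 and D = 0, while 16 points piled into one leaf still give E = 1 but D = 1.875.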
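The per-cell check in Step S4 can be sketched as below. To stay self-contained, the sketch represents each clipped piece as an axis-aligned rectangle; real patches would need a polygon intersection routine from a geometry library. The gap test assumes the cell lies entirely inside the area the patches are supposed to cover, which does not hold for cells on the edge of the data. The function names and the tolerance are hypothetical.

```python
from itertools import combinations


def overlap_area(a, b):
    """Intersection area of two axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def check_cell(cell, pieces, tol=1e-9):
    """Step S4 for one grid cell.

    pieces: list of (orig_id, rect), already clipped to `cell`.
    Returns (overlapping id pairs, uncovered area or None if overlaps exist).
    """
    overlaps = set()
    for (id_a, ra), (id_b, rb) in combinations(pieces, 2):
        if id_a == id_b:
            continue  # fragments of one original feature are never compared
        if overlap_area(ra, rb) > tol:
            overlaps.add((min(id_a, id_b), max(id_a, id_b)))
    if overlaps:
        return sorted(overlaps), None
    # With no overlaps the pieces are disjoint, so any shortfall is a gap.
    covered = sum((r[2] - r[0]) * (r[3] - r[1]) for _, r in pieces)
    cell_area = (cell[2] - cell[0]) * (cell[3] - cell[1])
    return [], max(0.0, cell_area - covered)
```

Each call touches only the pieces of one cell, so with the cells already co-located by Step S3 the calls can run on every node in parallel without fetching data from other nodes.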
Analysis of the main steps that affect the time complexity of the algorithm:
[Figure BDA0003033551360000051: image not reproduced]
From the above, the number of grid cells should be neither too large nor too small. In spatial data, however, a rectangular grid cell is a very simple polygon (5 points, with identical start and end points), whereas the original data are real spatial vector features, such as a land-type patch (e.g., a lake surface), typically described by dozens, hundreds or even thousands of points. The judgments and computations between pairs of original features are therefore the main computational bottleneck, so the number of features in each grid cell should be kept as small as possible. In theory the optimum is two features per grid cell, so the number of spatial grids p is set to about half the total number m of original features, p ≈ m/2; p spatial grids are then generated from the extent of the spatial vector data and the grid count p so as to cover the whole extent, and each grid is numbered so that it has a unique grid id.
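The trade-off can be made concrete with an idealized count of pairwise comparisons. This is an illustrative estimate, not a reproduction of the original complexity figure: it assumes the m features are spread evenly, with n̄ = m/p features per cell, and ignores the extra fragments created by cutting.

```latex
C_{\text{full}} = \binom{m}{2} = \frac{m(m-1)}{2}, \qquad
C_{\text{grid}} = p \cdot \frac{\bar n(\bar n - 1)}{2}
               = \frac{m}{2}\left(\frac{m}{p} - 1\right),
\qquad \bar n = \frac{m}{p}.

% With p = m/2, each cell holds \bar n = 2 features:
C_{\text{grid}} = \frac{m}{2}\,(2 - 1) = \frac{m}{2}.

% For m = 720\,000 patches:
C_{\text{full}} = \frac{720\,000 \times 719\,999}{2} \approx 2.59 \times 10^{11},
\qquad
C_{\text{grid}} = 3.6 \times 10^{5}.
```

Beyond p = m/2 the idealized count can fall by at most another m/2, while the cost of cutting (more cells and more fragments per feature) keeps rising, which is consistent with the grid count being neither too large nor too small.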
There are other ways to generate the spatial grid. Producing a uniform rectangular grid from the spatial extent of the original data is the simplest, but when the original data are very unevenly distributed in space it yields many empty cells, or cells whose feature counts differ greatly. In that case the grid can be generated with an R+ tree data structure, which ensures that there are no empty cells and keeps the number of features per cell balanced. This takes more time than generating a simple uniform rectangular grid, but the subsequent computation makes fuller use of the node resources and avoids the bucket effect as far as possible.
In the traditional checking method, each patch must be compared once with every other patch in the spatial database, i.e. a full traversal in which each patch is compared with all other patch data. Under a centralized database this cannot be parallelized and the traversal is very time-consuming; under a distributed database it causes data broadcasting and takes even longer. With the method of the invention, each patch is compared only once with the other patches in the same spatial grid, which greatly reduces the number of comparisons per patch; and because the patches in the same grid reside on the same database child node, no data broadcasting occurs and the computing power of every child node of the distributed database is fully used. Moreover, overlap or gap judgment is a time-consuming spatial computation, so reducing the total number of spatial computations while fully exploiting distributed parallel computing greatly improves the efficiency of solving this problem.
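The source uses an R+ tree for unevenly distributed data. A full R+ tree is too long for a short example, so the sketch below swaps in a simpler technique with the properties the text relies on here: a recursive median split along the longer side (k-d-tree style) that yields non-overlapping rectangles covering the extent, with no empty cells and at most `capacity` features per cell. It is not an R+ tree. Each feature is represented by a single center point lying inside the extent (an assumption; real features that straddle a cut would be clipped as in Step S3), and `capacity=2` mirrors the text's target of about two features per cell. The function name is hypothetical.

```python
def balanced_cells(extent, centers, capacity=2):
    """Split `extent` into rectangles holding at most `capacity` centers each.

    Recursive median split along the longer side. Cells never overlap and
    together cover the extent; given at least one center, no cell is empty.
    Returns {grid_id: (box, centers_in_box)}.
    """
    assert capacity >= 1, "capacity must be at least 1 for the split to terminate"
    leaves = []

    def split(box, pts):
        xmin, ymin, xmax, ymax = box
        if len(pts) <= capacity:
            leaves.append((box, pts))
            return
        axis = 0 if xmax - xmin >= ymax - ymin else 1
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        cut = (pts[mid - 1][axis] + pts[mid][axis]) / 2  # between the two medians
        if axis == 0:
            left, right = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
        else:
            left, right = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
        split(left, pts[:mid])   # both halves are non-empty because mid >= 1
        split(right, pts[mid:])

    split(tuple(extent), list(centers))
    return {grid_id: leaf for grid_id, leaf in enumerate(leaves)}
```

For eight centers with six clustered near one corner of a unit square, the sketch returns four cells of two centers each, whereas a uniform 2 x 2 grid would leave two cells empty and put six centers in one.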
Further, step S4 includes: the space vector data in one space grid only needs to be judged and compared with other space vector data in the same space grid once, and the space vector data in the same space grid is distributed on the same database sub-node.
A concrete case:
Overlap check results for nationwide building patch data (over 720,000 polygon features):
Phase                                          Time
Spatial grid generation                        9.3 s
Grid clipping + redistribution of new data     23.5 s
Overlap check                                  82 s
Total                                          114.8 s
For comparison, checking the same data for overlapping patches with ArcGIS took 23 minutes for the check alone, excluding the time needed to build the topology.
The above are preferred embodiments of the invention. All changes made according to the technical scheme of the invention whose resulting functional effects do not go beyond the scope of that technical scheme fall within the protection scope of the invention.
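For scale, the reported times imply the following ratios. This is simple arithmetic on the numbers in the text; the ArcGIS figure excludes topology construction, while 114.8 s covers all three phases, and the hardware for the two runs is not stated, so this is not a controlled benchmark.

```latex
\frac{23\ \text{min}}{114.8\ \text{s}} = \frac{1380\ \text{s}}{114.8\ \text{s}} \approx 12.0,
\qquad
\frac{1380\ \text{s}}{82\ \text{s}} \approx 16.8 \quad (\text{overlap-check phase only}).
```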

Claims (4)

1. A vector data overlap or gap checking method based on spatial position segmentation, characterized by comprising the following steps:
Step S1: importing the spatial vector data into a distributed computing and storage environment, where the distributed database distributes them across the nodes by hash, as for conventional data;
Step S2: generating a plurality of spatial grids for cutting the spatial vector data, each spatial grid having a unique spatial grid id;
Step S3: cutting the spatial vector data with the spatial grids and, after cutting, recording for each piece the id of the original feature it belongs to and the id of the spatial grid it falls in; and redistributing the cut spatial vector data to the nodes of the distributed database according to spatial grid id, i.e. storing the vector data in the same spatial grid on the same node;
Step S4: checking whether the spatial vector data in each spatial grid overlap or leave gaps, wherein two pieces with the same original feature id are not checked against each other, because they were produced by the spatial grid cutting one original feature.
2. The vector data overlap or gap checking method based on spatial position segmentation according to claim 1, wherein step S2 comprises the following sub-steps:
Step S21: setting the number of spatial grids p to about half the total number m of original spatial vector features, i.e. p ≈ m/2;
Step S22: generating p spatial grids from the spatial extent of the spatial vector data and the grid count p so as to cover all of the original spatial vector data, and numbering each spatial grid so that each grid has a unique grid id.
3. The vector data overlap or gap checking method based on spatial position segmentation according to claim 1, wherein step S2 is specifically as follows:
when the spatial vector data are uniformly distributed in space, a uniform rectangular grid is generated from the extent of the spatial vector data;
when the spatial vector data are unevenly distributed in space, the spatial grid is generated using an R+ tree data structure; because the R+ tree is a balanced tree, the number of vector features in each rectangular grid cell is kept even.
4. The vector data overlap or gap checking method based on spatial position segmentation according to claim 1, wherein step S4 further comprises:
the spatial vector data in a spatial grid are compared only once with the other spatial vector data in the same spatial grid, and the spatial vector data in the same spatial grid reside on the same database child node.
CN202110437020.5A 2021-04-22 2021-04-22 Vector data overlapping or gap checking method based on space position segmentation Active CN114357062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110437020.5A CN114357062B (en) 2021-04-22 2021-04-22 Vector data overlapping or gap checking method based on space position segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110437020.5A CN114357062B (en) 2021-04-22 2021-04-22 Vector data overlapping or gap checking method based on space position segmentation

Publications (2)

Publication Number Publication Date
CN114357062A (en) 2022-04-15
CN114357062B (en) 2022-08-02

Family

ID=81095429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110437020.5A Active CN114357062B (en) 2021-04-22 2021-04-22 Vector data overlapping or gap checking method based on space position segmentation

Country Status (1)

Country Link
CN (1) CN114357062B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01224881A (en) * 1988-03-04 1989-09-07 Toshiba Mach Co Ltd Pattern inspecting device
KR100209349B1 (en) * 1996-05-27 1999-07-15 이계철 Electronic map
CN107507117B (en) * 2017-08-16 2021-01-29 国家基础地理信息中心 Method and system for rapidly checking mass ground surface coverage gaps
CN108090150B (en) * 2017-12-11 2020-12-15 厦门亿力吉奥信息科技有限公司 GIS space object storage method and system
CN108629001A (en) * 2018-05-03 2018-10-09 成都瀚涛天图科技有限公司 Deduplication method for geographic information big data
CN112215864B (en) * 2020-11-05 2022-08-30 腾讯科技(深圳)有限公司 Contour processing method and device of electronic map and electronic equipment
CN112396128B (en) * 2020-12-08 2022-06-03 中国铁路设计集团有限公司 Automatic labeling method for railway external environment risk source sample

Also Published As

Publication number Publication date
CN114357062A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant